Unify your Data Strategy with Golden Records

In a previous article, we talked about the importance of good-quality data, and specifically the need for unambiguous and uniform data that can be used as a single source of truth. Various examples from multiple industries showed that low-quality data, and duplicate data in particular, can have a significant impact on both reporting and operations.

Golden Records and Data Unity

One way to solve this problem is through a concept called Golden Records: a consolidated and unified view of business entities. This concept takes shape in Eraneos Data Unity, a solution for unifying data from disparate sources using advanced machine learning techniques. Data Unity addresses the problem of low-quality, potentially fragmented data across multiple sources.

In addition to data cleansing, Data Unity enables organizations to gain better insight into their available data by providing AI-powered similarity search. Combined with domain or business-specific rules, Data Unity uses this similarity search to identify equivalent entities and present the user with Golden Records: a globally unique list of entities with correct, up-to-date and consistent attributes. All of this functionality is designed with end-to-end lineage of all transformations and decisions in mind, enabling full traceability and explainability.

Eraneos Data Unity has four components, each with its own goals and advantages:

  1. Quality Records
  2. Similar Records
  3. Equivalent Records
  4. Golden Records

1. Quality Records

The first component is the Quality Records component. While the need for trustworthy data is a subject deserving its own discussion, it is particularly important in light of Golden Records. On one hand, cleaning attributes that are used in identifying matching records can heavily improve results. On the other hand, cleaning (including standardizing, using reference data, inferring missing values, etc.) can make attribute survival and fusing records much more evident.
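To make the cleaning step concrete, here is a minimal sketch of attribute standardization in Python. It is an illustration only, not how Eraneos Data Unity implements it: the attribute names (`name`, `phone`, `country`), the `COUNTRY_REFERENCE` table and the `clean_record` function are all assumptions made for this example.

```python
import re

# Hypothetical reference data: maps common spellings onto one standard code.
COUNTRY_REFERENCE = {
    "netherlands": "NL",
    "the netherlands": "NL",
    "holland": "NL",
    "nl": "NL",
}


def clean_record(record):
    """Return a cleaned copy of a record with standardized attributes."""
    cleaned = dict(record)
    # Collapse repeated whitespace and normalize the casing of the name.
    name = re.sub(r"\s+", " ", record.get("name", "")).strip()
    cleaned["name"] = name.title()
    # Keep digits only, so differently formatted phone numbers compare equal.
    cleaned["phone"] = re.sub(r"\D", "", record.get("phone", ""))
    # Map free-text country values onto the reference code;
    # values not found in the reference data are kept unchanged.
    country = record.get("country", "").strip()
    cleaned["country"] = COUNTRY_REFERENCE.get(country.lower(), country)
    return cleaned


raw = {"name": "  acme   b.v. ", "phone": "+31 (0)20-123 4567", "country": "Holland"}
print(clean_record(raw))
```

After this kind of standardization, two records that differ only in formatting carry identical values, which helps both matching and the later choice of a surviving value.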

2. Similar Records

In the second component we enable similarity. Determining whether multiple records describe the same entity can be done in a variety of ways: it can be as simple as comparing (a combination of) IDs or key attributes, or it can be done using advanced AI-based techniques. Eraneos Data Unity harnesses the power of AI not only to include text-based attributes in determining similarity, but also to enable semantic search rather than a search based on exact equality of attributes.
Instead of having to specify exact attribute values, users can now focus on the functionality they are looking for, even if the attribute values differ. This allows business users to find entities that are similar in ways they can specify, rather than having to go through potentially tedious and time-consuming requests to IT departments.
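The idea of ranking records by similarity instead of exact equality can be sketched as follows. Note the substitution: the article describes AI-based semantic search, whereas this example uses character-trigram cosine similarity as a simple, dependency-free stand-in. It captures spelling similarity only, not meaning. The function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def trigram_vector(text):
    """Represent a string as counts of its character trigrams."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar(query, candidates, top_k=3):
    """Return the top_k candidates as (score, candidate), best first."""
    q = trigram_vector(query)
    scored = [(cosine_similarity(q, trigram_vector(c)), c) for c in candidates]
    return sorted(scored, reverse=True)[:top_k]


names = ["ACME Holdings B.V.", "Zenith Logistics", "Acme Hldg"]
for score, name in most_similar("Acme Holding BV", names):
    print(f"{score:.2f}  {name}")
```

In a semantic setup, `trigram_vector` would be replaced by an embedding model, while the ranking logic would stay largely the same.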

3. Equivalent Records

The results of our similarity efforts allow us to find entities that are similar enough, in some predefined way, to be considered equivalent. In this sense, we consider Equivalent Records to be a specific subset of Similar Records. What makes our solution unique is that this dynamic approach allows for business- or domain-specific configurations: what is considered equivalent in one domain is not necessarily equivalent in another. So rather than going through a lengthy process of defining and fine-tuning matching algorithms, it is the business that determines what makes one entity equivalent to another. What drives perceived similarity, and hence equivalence, is not so much the exact value of an attribute as the function that value serves in a particular business context. Again, this comes without putting the burden on your IT department.
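As a rough illustration of domain-specific equivalence, the sketch below groups records into clusters from pairwise similarity scores, using a threshold that each domain can set for itself. The scores, the thresholds and the `cluster_equivalent` function are assumptions for this example; the article does not describe the actual clustering method.

```python
def cluster_equivalent(record_ids, pair_scores, threshold):
    """Group records whose pairwise similarity meets the threshold.

    Uses a small union-find, so equivalence is transitive: if A matches B
    and B matches C, all three end up in the same cluster.
    """
    parent = {r: r for r in record_ids}

    def find(x):
        # Walk up to the cluster root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for r in record_ids:
        clusters.setdefault(find(r), []).append(r)
    return sorted(sorted(c) for c in clusters.values())


ids = ["A", "B", "C", "D"]
scores = {("A", "B"): 0.95, ("B", "C"): 0.80, ("C", "D"): 0.40}

# Hypothetical domains: one strict, one lenient, over the same similarity results.
print(cluster_equivalent(ids, scores, threshold=0.90))  # [['A', 'B'], ['C'], ['D']]
print(cluster_equivalent(ids, scores, threshold=0.75))  # [['A', 'B', 'C'], ['D']]
```

The same similarity scores are reused in both calls; only the domain's definition of "similar enough" changes, which is the point of separating similarity from equivalence.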

4. Golden Records

Once the clusters of records that can be confidently considered equivalent have been created, we can construct a Golden Record. To do this, each attribute must have its own attribute survival rule to determine which of the Equivalent Records will provide the resulting value. This can be done in a number of ways, such as taking the most recent value, the value from a specific source, or the most common value. Note that this process is heavily influenced by the results of the Quality component. Standardizing formats and using reference data are straightforward ways of reducing the number of different possibilities and making the choice easier. Once the surviving value has been determined for each of the attributes, the records are fused into the Golden Record.
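The three survival strategies mentioned above (most recent value, specific source, most common value) can be sketched in a few lines of Python. The record layout (`source`, `updated` as an ISO date string), the rule functions and `fuse` are hypothetical names chosen for this illustration, not the product's API.

```python
from collections import Counter


def most_recent(records, attr):
    """Value from the most recently updated record that has the attribute."""
    candidates = [r for r in records if r.get(attr)]
    return max(candidates, key=lambda r: r["updated"])[attr] if candidates else None


def most_common(records, attr):
    """Most frequently occurring non-empty value."""
    values = [r[attr] for r in records if r.get(attr)]
    return Counter(values).most_common(1)[0][0] if values else None


def from_source(source):
    """Build a rule that prefers one source, falling back to the most common value."""
    def rule(records, attr):
        for r in records:
            if r["source"] == source and r.get(attr):
                return r[attr]
        return most_common(records, attr)
    return rule


def fuse(records, survival_rules):
    """Apply one survival rule per attribute and fuse the results."""
    return {attr: rule(records, attr) for attr, rule in survival_rules.items()}


cluster = [
    {"source": "crm", "updated": "2024-01-10", "name": "Acme B.V.", "email": "info@acme.example", "city": "Amsterdam"},
    {"source": "erp", "updated": "2024-03-02", "name": "Acme BV", "email": "sales@acme.example", "city": "Amsterdam"},
    {"source": "web", "updated": "2023-11-20", "name": "Acme B.V.", "email": "", "city": "Rotterdam"},
]
rules = {"name": from_source("crm"), "email": most_recent, "city": most_common}
print(fuse(cluster, rules))
```

Because rules are assigned per attribute, a different domain can fuse the same cluster with a different rule set and arrive at its own Golden Record.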

Putting it all together

However, as with any analytics effort, be it basic BI or advanced AI/ML, organizations need to understand what information they are using and how it was generated in order to take full advantage of this newfound data asset. Understandability and explainability are key to both usage and adoption. Eraneos Data Unity has been designed with this in mind and incorporates full lineage across all components. In the Quality component, we provide insight into which columns do not meet the desired requirements, as well as full insight into any changes applied during attribute cleansing. In the Similarity and Equivalence components, we provide confidence scores for each of the clusters to indicate why records are considered similar based on one or more attributes. Finally, in the Golden Record component, we provide insight into the fusion lineage by providing an overview of the decisions made during attribute survival.
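One possible shape for fusion lineage is sketched below: alongside each surviving value, the fusion step records which rule was applied, which source won, and which values were rejected. The structure of the lineage entries and the names `newest` and `fuse_with_lineage` are assumptions for illustration; the actual lineage format of Eraneos Data Unity is not described in this article.

```python
def newest(candidates):
    """Pick the most recently updated record (ISO dates sort as strings)."""
    return max(candidates, key=lambda r: r["updated"])


def fuse_with_lineage(records, survival_rules):
    """Fuse records and log, per attribute, the decision that was made.

    survival_rules maps attribute -> (rule_name, pick_function), where
    pick_function chooses the winning record among the candidates.
    """
    golden, lineage = {}, []
    for attr, (rule_name, pick) in survival_rules.items():
        candidates = [r for r in records if r.get(attr)]
        if not candidates:
            golden[attr] = None
            lineage.append({"attribute": attr, "rule": rule_name, "source": None, "rejected": []})
            continue
        winner = pick(candidates)
        golden[attr] = winner[attr]
        lineage.append({
            "attribute": attr,
            "rule": rule_name,
            "source": winner["source"],
            # Keep the losing values so the decision can be explained later.
            "rejected": [(r["source"], r[attr]) for r in candidates if r is not winner],
        })
    return golden, lineage


records = [
    {"source": "crm", "updated": "2024-01-10", "email": "info@acme.example"},
    {"source": "erp", "updated": "2024-03-02", "email": "sales@acme.example"},
]
golden, lineage = fuse_with_lineage(records, {"email": ("most recent", newest)})
print(golden)
print(lineage)
```

Storing the rejected values next to the winner is a design choice that lets a user see not only what the Golden Record contains, but also what it could have contained.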

Once the Golden Records have been created, they can be incorporated into a variety of data-related use cases. They can serve as master data to be used at various stages: we can use Golden Records to improve reporting and analysis as (part of) certified datasets, or incorporate them into earlier stages of ETL processing. Golden Records can even be used as input for data quality control and remediation efforts. In addition to analytical use, there is an important operational aspect to Golden Records: they can be incorporated into operational data stores for use in business processes. Whether it is Customer Relationship Management, Supply Chain Management, Healthcare, or any other domain where a reliable and up-to-date unified view of data is essential to the success of business operations, Golden Records serve a purpose. Eraneos Data Unity is unique in that it allows users within the same organization to create their own domain-specific set of Golden Records. By reusing the results of similarity efforts and applying domain-specific equivalence and fusion rules, business users can create their own domain-specific Golden Records with minimal effort and delivery time.

Depending on the specific needs of the organization, each component of Eraneos Data Unity can make a difference. Whether the Quality component supports operational or analytical processes, trustworthy data is a must. Using the Similarity or Equivalence components enables a shift in the way data is used that was not possible with traditional methods of querying data and searching for information. Finally, our interpretation and implementation of Golden Records provides an agile and business-case-specific way to create the most reliable source of information while reducing IT involvement.

Eraneos Data Unity

Ready to unlock the full potential of your data and elevate your decision-making processes? Contact us today to learn more about how Eraneos Data Unity and Golden Records can revolutionize the way your organization utilizes data. Let’s embark on a journey towards data-driven success together. Reach out now to schedule a consultation with our experts!

By Jasper Stoop
Senior Data & AI Consultant

08 Apr 2024