This article from Big Data Magazine explains how to manage information quality through data governance. It draws on the DAMA framework and on the role that Aqtiva and Anjana Data play together in ensuring good data quality management.
Big Data Magazine
Lucía Engo Bermejo
10 March, 2022
According to the DAMA framework, a data quality management framework should include activities to:
Prioritize business needs
Identify critical data
Define rules and standards based on business requirements
Evaluate data against expectations
Share results with SMEs to detect problems
Prioritize and manage identified issues
Identify opportunities for improvement
Monitor quality and manage metadata
Anjana Data and Aqtiva work well in tandem: together, their functionality covers all of these activities and so makes it possible to manage data quality proactively.
Capabilities to manage information quality
Quality management is important, but one must not lose sight of the fact that quality "is not free", which is why DAMA says that the quality program must focus on critical data. To that end, Anjana Data's extensive configuration capabilities make it possible to define metadata attribute templates that identify critical data, either in the Datasets or in the Instances that process the information, according to taxonomies specific to each organization.
Once the data on which the quality program will work has been identified, the next step is to define what quality means according to the business needs. Once again, Anjana Data's configuration capabilities are a differentiator, since specific attributes can be added to the dataset templates or dataset fields to capture the quality requirements of the data they store (presence of values, lists of valid domains, data mapping, etc.).
Once the data quality requirements are known and defined, the next step is to define rules that objectively measure whether or not the data meets these requirements. These rules are defined in terms of quality dimensions (accuracy, uniqueness, completeness, availability, validity, etc.). Here Anjana Data is again a differentiator, since it is a very flexible tool in which a metamodel that supports quality management can be defined. This metamodel makes it possible to:
Create and maintain a glossary of business rules that provides the transparency any quality management program needs, because every user in the organization can look up how data compliance is assessed.
Define, through data governance approvals, the systems in which data quality controls are established (that is, where the quality rules are executed) and see the results of those controls. Do not forget that quality controls can penalize the performance of applications and systems, so they must be well governed (frequency of execution, execution window, number of assets affected, etc.) and approved beforehand.
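As an illustration of what measuring data against quality dimensions can look like, here is a minimal Python sketch. It is not the way Aqtiva or Anjana Data implement rules; the function names (completeness, uniqueness, validity), the sample records and the list of valid countries are all hypothetical and only show how a rule can turn a requirement into an objective score between 0 and 1.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def completeness(records: List[Record], field: str) -> float:
    """Share of records in which `field` is present and not None."""
    if not records:
        return 1.0
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def uniqueness(records: List[Record], field: str) -> float:
    """Share of non-null values of `field` that are distinct."""
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return 1.0
    return len(set(values)) / len(values)


def validity(records: List[Record], field: str, allowed: Iterable[Any]) -> float:
    """Share of non-null values of `field` that belong to the valid domain."""
    allowed_set = set(allowed)
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return 1.0
    return sum(1 for v in values if v in allowed_set) / len(values)


# Hypothetical sample data: one duplicated id, one invalid and one missing country.
customers = [
    {"id": 1, "country": "ES"},
    {"id": 2, "country": "FR"},
    {"id": 2, "country": "XX"},
    {"id": 4, "country": None},
]

scores = {
    "completeness(country)": completeness(customers, "country"),  # 3 of 4 filled
    "uniqueness(id)": uniqueness(customers, "id"),  # 3 distinct of 4
    "validity(country)": validity(customers, "country", {"ES", "FR", "PT"}),  # 2 of 3
}
```

Each score can then be compared with the requirement recorded for that field, which is what makes the assessment objective and repeatable.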
Imagine the power all of the above provides: the lineage graph shows not only the quality controls but also the rules applied to each control, and the user can always navigate to the details of both and be redirected straight to Aqtiva. This is possible because both the metadata attribute templates of the quality rules and the controls defined in Anjana Data can be configured and synchronized via API with the metadata of Aqtiva Management Platform. On that platform, the roles responsible for quality management implement all these rules and schedule the controls interactively and simply through Aqtiva's visual interface.
Going deeper into the definition of these rules and controls, one of Aqtiva's differentiators is that it makes rule definition accessible to non-technical users, while still letting more advanced users work on the fine-grained technical detail.
Once the rules have been defined and the controls have been scheduled, Aqtiva generates quality reports that, thanks to the synchronization mechanisms between the two tools, can be sliced by dimensions brought from Anjana Data. In the same way, links to these quality reports can be set up in Anjana Data, so that users can move between the two tools easily and hardly notice that they are changing tools.
To evaluate whether or not the data complies with the defined rules, Aqtiva Management Platform provides a visual interface for defining quality policies and for managing their design and implementation cycle. The technology on which Aqtiva is built integrates easily with data buses (such as Kafka), document databases and SQL databases, as well as blob storage technologies such as Azure Storage or S3, which allows quality controls on both data in transit and data at rest. Thanks to the native, API-based integration between Aqtiva and Anjana Data, the results of the checks are displayed in the metadata templates of Anjana entities.
Aqtiva also has an analytics engine that takes data quality automation a step further. Based on historical data, this engine recommends optimal quality settings and provides dynamic quality rules based on behavioral data patterns. Users do not have to analyze the data sources themselves: Aqtiva analyzes them and recommends the optimal set of quality rules. Rule execution also scales both horizontally and vertically when it runs on a Spark Big Data engine, which minimizes the performance impact of executing the rules.
Once controls have been executed to measure data compliance, it is essential to monitor them through dashboards in order to assess the impact of the quality program and act as soon as a problem occurs. For this purpose, Aqtiva includes a customized dashboard and quality KPIs that report in real time on the quality of the data ingested into the client's system. Customized quality ontologies can be defined and broken down, so detailed KPIs can be generated. Because the information arrives in real time, quality anomalies can be detected as they occur and countermeasures taken early. Output connectors can also be defined to export the information to clients' reporting systems, where they can generate their own dashboards or reports.
Anjana Data, for its part, follows a no vendor lock-in philosophy and offers access to its database so that the organization can develop its own dashboards from all the metadata stored in Anjana Data. In addition, for organizations that do not have dashboarding tools, Anjana Data includes Hue and Grafana in its deployment so that they can build their own custom dashboards.
Data Quality team journey
The previous section detailed the activities that relate purely to quality management, from identifying the data on which the program will focus, through defining quality, to executing and monitoring controls.
However, quality problems are often caused by a lack of understanding. It is very common to be in a meeting in which a manager asks for a piece of information and gets a different answer depending on whom they ask. This happens because there is no single definition of the data, which affects how the data is calculated, decision making, etc. For this reason, it is very useful to have a business glossary such as the one offered by Anjana Data, which provides the entire organization with a single vocabulary that carries the data governance stamp of approval. This improves understanding, data quality and confidence in the data.
As in data governance, it is also extremely important to take a preventive approach, because the goal of a quality program is not to measure and clean data but to improve quality and find more opportunities to derive value from the information. The following are fundamental to this:
Define data entry controls (use of reference metadata, on-the-fly quality checks, etc.).
Govern proactively and preventively through data governance approvals prior to production, with impact controls, etc.
Promote a cultural change through which those responsible for the business processes that create the data become responsible for the quality of the data generated. IT, for its part, must ensure quality and reserve budget in development projects so that they are accompanied by controls that monitor quality.
Find opportunities to improve quality and usability, for example, through data enrichment.
With Anjana Data, and Aqtiva as the control execution engine, the organization has all the tools it needs to manage quality properly and preventively. Within Anjana Data, the organization has a collaborative space where governance, business and IT can work together to define controls, govern their assets proactively and improve the usability of information. In addition, Anjana Data gives the rest of the users full transparency on this management through the workflows module, lineage and the data portal.
However, despite all of the above, quality issues still arise and must be addressed. The problem-solving model is a cyclical process that DAMA represents through the Shewhart/Deming cycle (Plan, Do, Check, Act).
In the Plan phase, the scope of the problems is analyzed (data profiling can be very useful for this purpose), and their impact is assessed so that they can be prioritized. Then, different alternatives for their resolution are evaluated.
In the Do phase, efforts are made to address the root causes (standardization, input controls...) and remedy the problems (data cleansing). In addition, controls for constant quality monitoring are implemented in this phase.
In the Check phase, the effects of the implemented measures are analyzed and the quality is monitored to ensure that it does not fall below the threshold again.
In the Act phase, action is taken again if an alarm is triggered because the quality falls below the threshold, if requirements change or if new data sets are analyzed.
Aqtiva is a great ally in all these activities. During the Plan phase, Aqtiva provides mechanisms for discovering tables and data schemas and for data profiling (data volume, number of nulls, mean, standard deviation, variance, out-of-range data, histogram, correlations, unique values or minimum/maximum/average values), with the added differentiator of intelligent recommendations based on data analysis.
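To make the profiling metrics listed above more concrete, the following Python sketch computes a few of them for a single numeric column using only the standard library. It is an illustrative assumption, not Aqtiva's profiling engine: the function profile_column, its low and high parameters and the sample ages are hypothetical.

```python
import statistics
from typing import Any, Dict, List, Optional


def profile_column(
    values: List[Optional[float]],
    low: Optional[float] = None,
    high: Optional[float] = None,
) -> Dict[str, Any]:
    """Basic profile of a numeric column; None is treated as a null."""
    present = [v for v in values if v is not None]
    profile: Dict[str, Any] = {
        "count": len(values),
        "nulls": len(values) - len(present),
        "unique": len(set(present)),
    }
    if present:
        profile["min"] = min(present)
        profile["max"] = max(present)
        profile["mean"] = statistics.mean(present)
        # Population variants: the column is profiled as a whole, not sampled.
        profile["variance"] = statistics.pvariance(present)
        profile["stdev"] = statistics.pstdev(present)
    if low is not None and high is not None:
        # Count values outside the expected [low, high] range.
        profile["out_of_range"] = sum(1 for v in present if not low <= v <= high)
    return profile


# Hypothetical column: one null and one implausible age.
ages = [34, 41, None, 29, 41, 250]
age_profile = profile_column(ages, low=0, high=120)
```

In this sample, the profile reports one null and one out-of-range value, which is the kind of evidence the Plan phase uses to scope a problem before deciding how to resolve it.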
During the following phases, Aqtiva makes it possible to schedule the execution of quality measures so that the periodic validation of all data and rules is automated. In addition, Aqtiva allows two-level alert thresholds to be defined and synchronized with Anjana Data, which produces two types of alert: error and warning. These thresholds can be defined either as a maximum number of erroneous records or as a percentage of the total data. Two actions can be attached to them: a notification via logs or email, or an exception that can stop the execution flow (for example, to stop an ingestion process when the quality is not adequate).
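The two-level threshold logic described above can be sketched in Python as follows. This is a minimal example under stated assumptions, not Aqtiva's actual API: the names Threshold, QualityError and evaluate are hypothetical, and the notification action is reduced to a standard logging call (an email would be sent at the same point).

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("quality")


class QualityError(Exception):
    """Raised to stop the execution flow when the error threshold is exceeded."""


@dataclass
class Threshold:
    max_errors: Optional[int] = None         # absolute number of erroneous records
    max_error_ratio: Optional[float] = None  # share of the total, from 0 to 1

    def exceeded(self, errors: int, total: int) -> bool:
        if self.max_errors is not None and errors > self.max_errors:
            return True
        if self.max_error_ratio is not None and total > 0:
            return errors / total > self.max_error_ratio
        return False


def evaluate(errors: int, total: int, warning: Threshold, error: Threshold,
             stop_on_error: bool = True) -> str:
    """Return 'ok', 'warning' or 'error'; optionally raise at the error level."""
    if error.exceeded(errors, total):
        message = f"quality error: {errors} of {total} records failed"
        if stop_on_error:
            raise QualityError(message)  # for example, to stop an ingestion process
        logger.error(message)
        return "error"
    if warning.exceeded(errors, total):
        logger.warning("quality warning: %d of %d records failed", errors, total)
        return "warning"
    return "ok"


# Hypothetical thresholds: warn above 1% of errors, fail above 5%.
warn_level = Threshold(max_error_ratio=0.01)
error_level = Threshold(max_error_ratio=0.05)
status = evaluate(errors=2, total=100, warning=warn_level, error=error_level)
```

Here 2 failed records out of 100 exceed the warning level but not the error level, so the run continues and only a warning is logged. With 10 failed records and stop_on_error left at its default, the same call would raise QualityError and halt the calling process.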
As a final point, the success of the quality program depends on data consumers having access to all the information related to incident tracking. Here, too, the flexibility of the templates and the native integration between Anjana Data and Aqtiva are a differentiator, since any user can find out whether there are incidents associated with a given piece of data, navigate to the incident ticket, identify those responsible, see how quality is evolving, etc.