Data Lineage: what it is, why you need it, and best practices
In this article, we provide definitions and practical recommendations to help you understand and implement modern data lineage.
What is Data Lineage?
Data lineage refers to the process of understanding and visualizing how data flows from its source to its current location, and tracking any alterations made to the data along the way. This lets you know where any specific piece of data comes from, when and where it was split from or merged with other data, and what transformations have been applied to it, from initial input to final application.
How Data Lineage Works
A modern data lineage tool gives you instant visibility into the source and journey of your data. You can see how the data lineage example below would give you confidence in your data and help you trace any errors back to the root cause.
Data lineage creates a data mapping framework by collecting and managing metadata from each step, and storing it in a metadata repository that can be used for lineage analysis. (Metadata is defined as “data describing other sets of data”.) For each process applied to data in its journey, the metadata is updated as shown in the simplified data lineage diagram below.
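The idea of updating metadata at every processing step can be illustrated with a short sketch. This is a minimal, in-memory illustration and not how any particular lineage tool works: the `LineageEvent` and `MetadataRepository` classes, their fields, and the dataset names are all hypothetical, and the sketch assumes that steps are recorded in the order they run.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LineageEvent:
    """One processing step applied to data (hypothetical schema)."""
    step: str              # name of the process, e.g. "extract" or "join"
    inputs: list           # names of the datasets the step read
    output: str            # name of the dataset the step wrote
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class MetadataRepository:
    """In-memory stand-in for a metadata repository (assumption)."""

    def __init__(self):
        self.events = []

    def record(self, step, inputs, output):
        # Called once per process applied to the data on its journey.
        self.events.append(LineageEvent(step, list(inputs), output))

    def history(self, dataset):
        """Return, in execution order, the steps that led to `dataset`."""
        needed = {dataset}
        found = []
        # Walk the log backwards, pulling in the inputs of each relevant step.
        for event in reversed(self.events):
            if event.output in needed:
                found.append(event)
                needed.update(event.inputs)
        return list(reversed(found))


repo = MetadataRepository()
repo.record("extract", ["crm.contacts"], "staging.contacts")
repo.record("extract", ["ads.clicks"], "staging.clicks")
repo.record("join", ["staging.contacts", "staging.clicks"], "mart.campaign_report")
repo.record("extract", ["finance.ledger"], "staging.ledger")

for event in repo.history("mart.campaign_report"):
    print(event.step, event.inputs, "->", event.output)
```

Here the unrelated `finance.ledger` extract is left out of the report's history, which is the kind of filtering a lineage analysis relies on when tracing an error back to its root cause.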
Benefits of Data Lineage
Your organization is likely flooded with large and complex datasets from many sources: financial systems, web analytics, ad platforms, CRM systems, marketing automation, partner data, and maybe even real-time sources and IoT. Knowing where your data comes from, and knowing that you can trust it, can therefore be a major challenge.
The primary benefits of a robust data lineage process are that it allows you to do the following:
- Discover, track, and correct data process anomalies.
- Confidently migrate systems.
- Lower the cost of new IT development and application maintenance.
- Combine new datasets and existing datasets with an agile data infrastructure.
- Meet data governance goals and lower the cost of regulatory compliance.
- Increase trust and reliance on data across your organization.
- Improve data analysis and thereby business performance.
Data lineage also provides “explainable BI”, which is one of the top 10 BI and data trends this year.
Data Lineage Tool Features
Modern data lineage tools should make all your data transparent, trustworthy, and ready for analysis.
Below are the key capabilities of the best data lineage tools.
Visualization. You should be able to easily visualize how the data travels throughout its full journey, from the data source to the end-user application.
Data Catalog. The best tools allow you to search and explore all your data with the help of an integrated data catalog.
Reports. Formal reports let you verify that your data is structured according to your guidelines.
Automated documentation. Your tool should generate system documentation automatically, collecting all the node comments, metadata, tables, fields, related files, and database statements for the chosen application into one single document.
Simple install. Modern data lineage tools are read-only and don’t interfere with any of your company data upon installation.
Customize and connect. You should be able to easily customize your environment to suit your business and its data. Plus, easily connect your lineage tool with your visualization tools, data warehouses, and cloud services.
Key Types of Data Lineage
Here are the main techniques used to perform data lineage:
- Backward data lineage means starting from the data's end use and tracing it back to its source.
- Forward data lineage begins at the source and follows the data through to its end use.
- End-to-end data lineage is the combination of the two, looking at the entire solution from the data’s source to its end-use.
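The three techniques above can be viewed as traversals of a directed graph in which each edge points from an upstream dataset to a downstream one. The sketch below is only an illustration under that assumption: the `EDGES` list, the dataset names, and the function names are hypothetical and do not come from any specific lineage tool.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream dataset, downstream dataset).
EDGES = [
    ("crm.contacts", "staging.contacts"),
    ("ads.clicks", "staging.clicks"),
    ("staging.contacts", "mart.campaign_report"),
    ("staging.clicks", "mart.campaign_report"),
    ("mart.campaign_report", "dashboard.marketing"),
]

UPSTREAM = defaultdict(set)    # dataset -> datasets it is built from
DOWNSTREAM = defaultdict(set)  # dataset -> datasets built from it
for src, dst in EDGES:
    UPSTREAM[dst].add(src)
    DOWNSTREAM[src].add(dst)


def _reach(start, neighbours):
    """Breadth-first search: every node reachable from `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # only relevant if the graph contains a cycle
    return seen


def backward_lineage(dataset):
    """From the end use back to every contributing source."""
    return _reach(dataset, UPSTREAM)


def forward_lineage(dataset):
    """From a source forward to everything derived from it."""
    return _reach(dataset, DOWNSTREAM)


def end_to_end_lineage(dataset):
    """Both directions combined, including the dataset itself."""
    return backward_lineage(dataset) | {dataset} | forward_lineage(dataset)


print(sorted(backward_lineage("mart.campaign_report")))
print(sorted(forward_lineage("crm.contacts")))
print(sorted(end_to_end_lineage("staging.clicks")))
```

Storing the edges in both directions keeps each query a plain breadth-first search, so backward and forward lineage share one traversal routine and end-to-end lineage is simply their union.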
Article source: Qlik blog.
