Understanding Data Governance and Data Quality
This morning, I read an interesting blog post on ITBusinessEdge titled “Data Governance in Silos? Bad Idea”. Intuitively, the idea of a corporate-wide standard for data governance and data quality makes sense. I also like the idea that data quality work is both more effective and more efficient when it is done in a standard fashion, by experts using common tools. Managing data in silos certainly seems inefficient, and is likely to leave gaps in coverage.
The author, Loraine Lawson, links to an earlier post in which she addresses the very important topic of ensuring the reliability of the data used to make decisions (the data used by business intelligence software, residing in enterprise applications and data warehouses).
I wholeheartedly agree with her that before you embark on a business intelligence journey (or simply one of enhancing your analytics) you need confidence that the underlying data is complete, accurate, timely, and current. Enterprise information management (EIM) software can be of great assistance (SAP and IBM are examples of vendors with a range of solutions).
But, just what is data governance anyway? I found a valuable 2007 article in CIO. It says:
“Data is valuable. As the challenge of protecting customer data mounts, more and more businesses are embracing data-governance strategies to manage the information that serves as the lifeblood of the company. Without a doubt, data has become the raw material of the information economy, and data governance is a strategic imperative.”
OK. I can accept the need to be concerned with the quality and use of corporate data.
Wikipedia has a page on the topic. It says that “Data governance encompasses the people, processes, and information technology required to create a consistent and proper handling of an organization’s data across the business enterprise.”
But auditors and others have been talking about the integrity of data for ages: just think of good old input-output controls and concerns about data security. How is data governance different? Why do we talk about it separately from risk management and internal controls/security?
The Wikipedia piece also says:
“Data governance initiatives improve data quality by assigning a team responsible for data’s accuracy, accessibility, consistency, and completeness, among other metrics. This team usually consists of executive leadership, project management, line-of-business managers, and data stewards. The team usually employs some form of methodology for tracking and improving enterprise data, such as Six Sigma, and tools for data mapping, profiling, cleansing, and monitoring data.”
This doesn’t seem to include how data is created and transformed within business applications like accounts payable and manufacturing – where the focus has to be on controls within those processes to ensure the completeness, accuracy, and validity of the transactions. Or does it? Maybe the intent is that data governance includes the controls within business processes!
Let me propose this and ask for your comments:
- Data created or transformed during business processes should be subject to controls and security within those business processes. The level of resources allocated should be commensurate with the risk to the organization if that data is not correct.
- Once the business processes have led to data being retained for analysis, reporting, etc., controls need to be in place to ensure it remains complete and reliable.
- If risks to data (e.g., theft, loss, corruption, lack of integrity, unauthorized access) are significant to the organization, they need to be addressed.
- If your organization relies on corporate data as a source for reporting (for example for financial and operational reporting, providing management with information used as a basis for decisions, or for other regulatory reporting), the risk of errors or omissions in that data might be significant.
Focusing on just one of these cannot be right.
So, however you define data governance, you need to address:
- How data is created and transformed during business processes
- How data is stored and protected, so that it retains its integrity
- How data is then used as a basis for analysis, business intelligence, decision-making, and reporting – which includes how data is transformed in that process (including aggregation of data from multiple sources)
Returning to the original “Data Governance in Silos? Bad Idea” article, I suggest that organizations should not only look at how they manage ‘data at rest’ in data warehouses and repositories – and avoid silos – but also consider the systems and processes where the data is created and transformed. Are those business processes and the related controls performed in silos? Are there opportunities to improve effectiveness and efficiency through standard approaches and tools?
I am interested in how you see this. Questions for you:
- How do you define data governance?
- Does it include controls and security within business processes like manufacturing and sales invoicing? Or does it only apply to ‘data at rest’?
- Who is responsible for it? Process owners or IT, or both?
- Do you have specialized tools to ensure the quality of data used in analysis and reporting?
- Do you agree with what I have laid out?