What is data quality?
There are many ways of measuring data quality. But all measurements answer this one question: to what degree can you trust the data you are using for the purpose at hand?
Why does data quality matter?
Because problems with data quality can:
- Lead to inaccurate decisions or conclusions
- Increase costs (staff time, confusion, repetitive questions and issues)
- Create compliance or legal risk
In San Francisco, we have mixed feelings about our data quality. Every year we ask City analysts to rate data quality on a scale of 1 to 5, and the average response is 3. At the same time, data quality is often listed as a barrier to using data more.
When we have asked why, people tell us anecdotally that they aren’t sure what data quality means or how much quality is enough. So we designed a new guidebook, “How to Ensure Quality Data”, to start pinning down this amorphous thing called “Data Quality”.
Three steps to better data quality
In our guidebook, we lay out the following steps to better data quality:
- Collect Needs and Requirements. Before you define your data, you need to know why you are collecting it and for what purposes. You also need to identify your users’ needs and the requirements the data must meet.
- Define the Dataset. Once you have your requirements, you can define the data tables and fields you need.
- Define Policies and Processes. You will need to define a set of policies and processes to manage your data throughout its lifecycle.
Check out the guidebook and our companion worksheet. Send any feedback via our help desk, support.datasf.org.