May 24, 2018 inJulien Bovet.
byIn a perfect world, companies use dedicated tools and hire people responsible for data quality.
But let’s be real: I’m fairly certain we’re all facing data quality issues with our sources. You’ll find in this series of blog posts a few inputs on how you can monitor data quality directly in Tableau. The objective is not to patch or correct your data with Tableau tricks, since it will only make things worse in the long run, but to design a monitoring dashboard that will tell you at a glance what is missing or what needs to be fixed.
After some googling, it appears that a handful of dimensions are used to define data quality, such as completeness, uniqueness, timeliness, consistency, conformity, integrity, and so on. Instead of copy-pasting definitions here, let’s deep dive into concrete stuff immediately. If you truly want to read more about the concept of data quality and its dimensions, just let me google that for you!
Here’s the deal: data sources are often updated using batch processing, which means that each morning, yesterday’s data is inserted. When this process is not perfectly orchestrated or monitored using wonderful tools such as Airflow, you might end up with missing dates for various reasons. Moreover, when you deal with huge databases over multiple years, one or two missing days can easily go unnoticed.
Let’s start this series by making a simple dashboard that shows how many records you have per day and puts into light those missing fuckers. For this tutorial, you can use my fake dataset, but I’d advise you to directly work with your own data.
First things first, let’s create a simple area chart showing how many records we have per day.
Date
dimension into the columns shelf.Number of Records
into the rows shelf.Date
pill and select “Exact Date”.Ok, so at this point, you should already see how many records you have each day. However, Tableau does not create marks for missing dates, which is fairly normal since they do not exist in the dataset. We’re going to add a little magic to get what we want.
LOOKUP(SUM([Number of Records]),0)
Date
pill, and check “Show Missing Values”Now, with the LOOKUP
table calculation, Tableau looks at what is displayed in the viz. Combine this with “Show Missing Values”, and Tableau will test the table calculation for each dates, including the missing ones. However, Tableau still doesn’t know what to do when LOOKUP
returns NULL for our missing dates: “Show at Default Value” is our final trick.
After some more testing, it appears that these additionnal steps are not necessary for area charts, but they are for other types (don’t ask why!). Also, you could avoid the seventh step by wrapping the
LOOKUP
formula withZN()
, which forces 0 when NULL.
The previous chart is a nice start: you can clearly see how the number of records changes on a day to day basis, and whether or not some days are at 0. However, when you’re dealing with two years or more of data, and a huge number of records per day, finding those thin drops to zero will be extremely difficult. Let’s make a simpler chart, showing whether dates have records or not.
Number of Records (incl. 0)
.Missing?
with the following formula:IF [Number of Records (incl. 0)] > 0 THEN "Present" ELSE "Missing" END
Missing?
in the Rows shelf and on color.You can, and you should! In fact, you could also add a few text elements, such as the overall coverage, or the exact number of missing days. There’s nothing incredibly hard to get these sheets up and running: I won’t do a step-by-step and leave you the fun to figure it out! Remember to start by duplicating a sheet though ;).
Here’s a final dashboard to see a potential end result: