May 29, 2018 inJulien Bovet.
byIn the previous blog post, we saw how to identify missing dates in a source. Let’s talk about a worst kind of dates, those who are in the dataset, but with an extremely low number of records.
Chances are these dates were inserted with a faulty batch run and you only have a percentage of what you should. How can we automatically detect them? Simply put, we need to check the number of records for each date, and decide whether it seems high enough, or too low. One could use a simple constant to determine incomplete dates:
Days with less than ten records are suspicious!
However, this approach will fail to capture suspicious dates in large datasets, and datasets subject to seasonal effects. Instead, why don’t we take a look at moving averages?
If you’re interested in the maths behind moving averages, Wikipedia is your friend. In Tableau, moving averages can be computed using table calculations quite easily. For this tutorial, I built a fake dataset, but I encourage you, again, to directly use your own data.
Date
field in the columns shelf.Number of Records
in the rows shelf.Date
pill to “Exact Date”.Number of Records
in the rows shelf once more.Number of Records
pill, and click on “Quick Table Calculation > Moving Average”.Moving Average
. Life is not always full of surprises.Moving Average
.SUM(Number of Records)
mark type to “Area”To fine tune the number of days we want in the moving average, and the threshold against which dates are considered incomplete, let’s use parameters.
Number of days
with an integer data type, and a minimum value of 1.Threshold
with a float data type, and a range between 0 and 1.Moving Average
variable in the data panel, replace the previous value we chose with your Number of days
parameter, and multiply everything with our Threshold
parameter. If you’re lazy, copy-paste the formula below:[Threshold]*((ZN(WINDOW_SUM(SUM([Number of Records]), -[Number of days], -1)) +
ZN(WINDOW_SUM(SUM([Number of Records]), 1, [Number of days]))) /
(ZN(WINDOW_COUNT(SUM([Number of Records]), -[Number of days], -1)) +
ZN(WINDOW_COUNT(SUM([Number of Records]), 1, [Number of days]))))
Now that we defined our reference with the moving average, let’s put a tick on suspicious dates.
Moving Average
in the data panel and select “Create > Calculated Field…”, name it Suspicious Dates
Moving Average
:IF SUM([Number of Records]) < [Moving Average] THEN SUM([Number of Records]) ELSE NULL END
Moving Average
pill on the rows shelf with your new calculation.Suspicious Dates
to “Shape”.Alright, this time I’ll help you a bit. Remember, it’s always better to duplicate a sheet and use it as a starting point.
SUM(Number of Records)
pill out of the rows shelf.Date
pill in the columns shelf to discrete.Suspicious Dates
pill in the rows shelf to the Marks panel and drop it on ‘Text’.Suspicious Dates
is still computed using Date
.Suspicious Dates
and drop it on the filters shelf.I often find myself struggling with these simple lists. The key is to remember that:
Suspicious Dates
is a table calculation that needs every date to work properly.To wrap up this post and the previous one, here’s what you could end up with using the techniques we learned: