[rrd-users] NaN values in percentile calculations

Jacques du Toit Thu, 25 May 2017 05:15:56 -0700

Hi guys

I'm having this issue with some of our data for a customer’s billing. Wondering 
if there is an elegant solution? What I really want is the ability to exclude 
NaN values from percentile calculations. Consider the following data series:


foo = 1,NaN,2,NaN,3,NaN,4,NaN,5,NaN,6,NaN,7,NaN,8,NaN,9,NaN,10,NaN

VDEF:90perc=foo,90,PERCENT

It seems that rrdtool includes the NaN values in the calculation, so I get 
90perc=8. While technically correct for the series, it's not really useful for 
determining real 90th percentile values in an everyday use case. Particularly 
with billing, having no data doesn’t make it reasonable to assume the value 
is 0. Most likely the traffic profile, “connecting the dots”, would actually 
have looked like this:

1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10

In this case I get 90perc=9, which means the billing based on the original 
series is skewed by the loss of data. Now, it’s debatable what is mathematically 
the right thing to do here. But the bottom line is that, since we don’t know 
whether the NaN values would have been above or below the 90th percentile 
value, it’s better to exclude them than to assume they are below, IMHO.
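
To make the arithmetic concrete, here is a minimal Python sketch of the two 
behaviours. It assumes a nearest-rank percentile that sorts NaN below every 
number and picks the element at index round(p/100 * (n - 1)). That rule is my 
assumption; it reproduces the 8 and 9 quoted above, but it is not taken from 
the rrdtool source. The function name and the drop_nan flag are hypothetical.

```python
import math


def nearest_rank(values, percent, drop_nan=False):
    """Return the element at index round(percent/100 * (n - 1)) of the sorted data.

    Assumption: NaN sorts below every finite number. With drop_nan=True the
    NaN entries are removed before sorting, so they no longer take up ranks.
    """
    if drop_nan:
        values = [v for v in values if not math.isnan(v)]
    if not values:
        return math.nan
    # Sort key puts NaN first, then the real numbers in ascending order.
    ordered = sorted(
        values,
        key=lambda v: (not math.isnan(v), 0.0 if math.isnan(v) else v),
    )
    index = round(percent / 100 * (len(ordered) - 1))
    return ordered[index]


nan = math.nan
# foo = 1,NaN,2,NaN,...,10,NaN
foo = [x for n in range(1, 11) for x in (float(n), nan)]
# 1,1,2,2,...,10,10 (gaps filled by "connecting the dots")
filled = [float(n) for n in range(1, 11) for _ in range(2)]

print(nearest_rank(foo, 90))                 # NaN values take up the low ranks
print(nearest_rank(filled, 90))              # gaps filled in
print(nearest_rank(foo, 90, drop_nan=True))  # NaN values excluded
```

Under these assumptions, dropping the NaN values gives the same result as the 
filled-in series, which is the behaviour I am after.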

The customer would also not be too happy, as including NaN values can only 
push the percentile value down, and this means they might end up with slightly 
“incorrect” billing.

So does anyone know of a way to exclude those NaN values from the PERCENT 
calculation?

Thanks,
  Jacques

_______________________________________________
rrd-users mailing list
rrd-users@lists.oetiker.ch
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users
