Hi guys
I'm having an issue with some of our data for a customer’s billing, and I'm
wondering if there is an elegant solution. What I really want is the ability to
exclude NaN values from percentile calculations. Consider the following data
series:
foo = 1,NaN,2,NaN,3,NaN,4,NaN,5,NaN,6,NaN,7,NaN,8,NaN,9,NaN,10,NaN
VDEF:90perc=foo,90,PERCENT
It seems rrdtool includes the NaN values in the calculation, so I get
90perc=8. While technically correct according to the series, it’s not really
useful in determining real 90th percentile values in an everyday use case.
Particularly with billing, just because you have no data it’s not reasonable to
assume it’s 0. Most likely the traffic profile by “connecting-the-dots” would
have actually looked like this:
1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10
In this case I get 90perc=9, which means the billing would be skewed because of
the loss of data. Now, it’s debatable what is mathematically the right thing to
do here. But the bottom line is that, since we don’t know whether the NaN values
would have been above or below the 90th percentile value, it’s better to
exclude them than to assume they are below, IMHO.
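To make the arithmetic concrete, here is a small Python sketch (not rrdtool itself) that compares the three cases: NaN values included, NaN values excluded, and the gaps filled in. The function name `percentile` and the `skip_nan` flag are hypothetical. The sketch assumes that NaN sorts below every finite number and that the picked index is `int((n - 1) * pct / 100)`; that rule was chosen because it reproduces the 8 and 9 quoted above, and it is not taken from the rrdtool source.

```python
import math


def percentile(values, pct, skip_nan=False):
    """Pick the pct-th percentile from a list of floats.

    Assumptions (illustrative, not rrdtool's confirmed algorithm):
    - NaN is ordered below every finite number,
    - the result is the sorted element at index int((n - 1) * pct / 100).
    """
    if skip_nan:
        # Drop unknown samples before ranking.
        values = [v for v in values if not math.isnan(v)]
    if not values:
        return math.nan
    # Key (False, 0.0) for NaN sorts ahead of (True, v) for finite values.
    ordered = sorted(
        values,
        key=lambda v: (not math.isnan(v), 0.0 if math.isnan(v) else v),
    )
    index = int((len(ordered) - 1) * pct / 100)
    return ordered[index]


nan = math.nan
# foo = 1,NaN,2,NaN,...,10,NaN
foo = [x for i in range(1, 11) for x in (float(i), nan)]
# 1,1,2,2,...,10,10 (each gap replaced by the previous known value)
filled = [float(i) for i in range(1, 11) for _ in range(2)]

with_nan = percentile(foo, 90)                    # NaN counted as lowest
without_nan = percentile(foo, 90, skip_nan=True)  # NaN excluded
filled_result = percentile(filled, 90)            # gaps filled in

print(with_nan, without_nan, filled_result)  # 8.0 9.0 9.0
```

Under these assumptions, excluding the NaN values gives the same answer as the filled-in series (9), while counting them as the lowest values gives 8.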
The customer would also not be too happy, as including NaN values always
pushes the percentile value down by definition, and this means they might end up
with slightly “incorrect” billing.
So does anyone know of a way to exclude those NaN values from the PERCENT
calculation?
Thanks,
Jacques
_______________________________________________
rrd-users mailing list
rrd-users@lists.oetiker.ch
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users