Re: [rrd-users] [unsure] max DS per rrd file

Ryan Kubica Wed, 24 Apr 2013 18:20:17 -0700


Hi Mikel,


I've personally never found a good reason to store more than one datasource per
RRD datafile, and I run very large rrdtool data servers (multiple millions per
server, across many servers).

There are far too many edge cases, latency issues and join overhead in trying
to consolidate datasources into a single datafile.  Yes, rrdtool itself is more
efficient with an insert like that, but: 1) what if the datapoints are collected
at different times?  2) what if they have different steps?  3) what if you want
to add a datasource?  4) what if you simply have too many datasources to try to
order/consolidate from a queue to the datafile?  There is also non-trivial
complexity and overhead in indexing into an rrd datafile for specific
datasources.
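
To make the first edge case concrete, here is a minimal sketch of the update
strings involved. It assumes the usual `timestamp:value[:value...]` argument
form of `rrdtool update`, with `U` standing for an unknown value; the
datasource names (`in`, `out`, `errors`) and the `<name>.rrd` file naming are
hypothetical and only there for illustration.

```python
def combined_update(ds_order, samples, timestamp="N"):
    """Build one update string for a multi-datasource RRD.

    Every datasource needs a slot in the fixed order the file was created
    with, so a value that has not arrived yet has to be sent as U (unknown).
    """
    values = [str(samples.get(name, "U")) for name in ds_order]
    return ":".join([str(timestamp)] + values)


def per_file_updates(samples, timestamp="N"):
    """Build one update string per single-datasource RRD (hypothetical naming)."""
    return {f"{name}.rrd": f"{timestamp}:{value}" for name, value in samples.items()}


# Only two of the three values have been collected so far.
arrived = {"in": 10, "errors": 0}
print(combined_update(["in", "out", "errors"], arrived))
print(per_file_updates(arrived))
```

With one file per datasource, a late value is simply written when it arrives;
with a combined file, the writer has to either wait for it or record an unknown.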

Linux is extremely efficient at block updates, caching, opens/closes, etc.
rrdtool on a low-end (4 CPU) server with limited memory can easily store 160
thousand datasources per minute; on a better server, a whole lot more than
that.
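
As a rough back-of-the-envelope check on that figure, the sketch below converts
160 thousand updates per minute into a per-second rate and a per-update CPU
budget. The assumption that the work spreads evenly over all four CPUs is mine
and is only there to get an order of magnitude.

```python
def updates_per_second(datasources, interval_seconds=60):
    """Average update rate needed to write every datasource once per interval."""
    return datasources / interval_seconds


def cpu_budget_ms(datasources, cpus, interval_seconds=60):
    """CPU time available per update in ms, assuming an ideal spread over all CPUs."""
    return 1000.0 * cpus * interval_seconds / datasources


rate = updates_per_second(160_000)
budget = cpu_budget_ms(160_000, cpus=4)
print(f"{rate:.0f} updates/s, {budget:.1f} ms of CPU per update")
# prints "2667 updates/s, 1.5 ms of CPU per update"
```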

A 'distributed cluster' isn't a good reason not to send all your time-series
data to one server or a small set of servers.  The latency/request time incurred
in having to fetch data from those servers is usually not worth the trade-off.

Graphs of many hundreds of datasources, computed over multi-day/week time ranges
in the result set, are generated in tens of milliseconds, not seconds.  rrdtool
is quite capable of producing hundreds of on-demand graphs per second from one
server.

I suggest you write a little test script that writes rrd data out to individual
rrd datafiles to see how quick your servers are at it.  There is some OS
tuning and rrdtool RRA sizing that will help; especially, don't keep hourly or
daily rollups, because the server has to hold onto those blocks to make the
consolidation quick and not incur a read from disk.
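
A minimal sketch of such a test script follows, assuming the `rrdtool` binary
is on the PATH. The file naming, the 60-second step, the heartbeat of twice the
step and the single 1440-row AVERAGE RRA are illustrative choices, not
recommended settings. Each datafile holds one GAUGE datasource and only primary
data points, with no rollup RRAs.

```python
import subprocess
import time
from pathlib import Path


def create_args(path, step=60, rows=1440):
    """Arguments for `rrdtool create`: one GAUGE datasource, one AVERAGE RRA."""
    heartbeat = 2 * step  # assumption: tolerate one missed sample
    return [
        "rrdtool", "create", str(path),
        "--step", str(step),
        f"DS:value:GAUGE:{heartbeat}:U:U",
        f"RRA:AVERAGE:0.5:1:{rows}",  # primary data points only, no rollups
    ]


def update_args(path, timestamp, value):
    """Arguments for `rrdtool update` with a single timestamp:value pair."""
    return ["rrdtool", "update", str(path), f"{timestamp}:{value}"]


def benchmark(directory, n_files=1000, step=60):
    """Create n_files single-datasource RRDs, update each once, return updates/s."""
    paths = [Path(directory) / f"ds_{i:06d}.rrd" for i in range(n_files)]
    for path in paths:
        subprocess.run(create_args(path, step), check=True)
    started = time.perf_counter()
    for i, path in enumerate(paths):
        # "N" asks rrdtool to use the current time as the timestamp.
        subprocess.run(update_args(path, "N", i), check=True)
    return n_files / (time.perf_counter() - started)
```

Calling `benchmark()` on a scratch directory returns updates per second for
that machine. Because this version starts one `rrdtool` process per update, it
also measures process start-up cost, so treat the number as a lower bound.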

rrdtool scales rather simply (and without rrdcached, as I don't use that
either).

HTH
-Ryan


________________________________
 From: mikel <infoeusk...@gmail.com>
To: rrd-users@lists.oetiker.ch 
Sent: Saturday, April 20, 2013 4:48 AM
Subject: Re: [rrd-users] [unsure]  max DS per rrd file
 


Thanks for your fast reply again.

>Maybe I don't understand what you say here. Some metrics, or all metrics are
>queried? Both statements cannot be true at the same time?

Yes, it is a tricky case. Apologies, I was not clear enough.

In most cases all metrics are queried at the same time, because we want to
know what value they had at a given time and classify them.

Only very occasionally would we query for just one metric.

>Anyway, if you query only once in a while, maybe you should think about 
>reducing the number of RRAs in each RRD, and just let it consolidate at 
>graph time. Yes, this will mean you will have to wait longer for your graph 
>to be made, but you save processing time at every update.

This is interesting; I had not thought about that. Thanks for the hint.
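
To illustrate the idea of consolidating at read time rather than at every
update, here is a small sketch that averages stored primary data points into
coarser buckets. It is a simplified stand-in written for this thread, not
rrdtool's own consolidation code; `None` marks an unknown value, and a bucket
containing only unknowns stays unknown.

```python
def consolidate_average(points, factor):
    """Average every `factor` consecutive points, skipping unknown (None) values."""
    consolidated = []
    for start in range(0, len(points), factor):
        known = [p for p in points[start:start + factor] if p is not None]
        consolidated.append(sum(known) / len(known) if known else None)
    return consolidated


# Six one-minute samples reduced to three two-minute averages.
print(consolidate_average([1, 2, 3, 4, None, 6], 2))
# prints [1.5, 3.5, 6.0]
```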

Thanks for your help again.
m




_______________________________________________
rrd-users mailing list
rrd-users@lists.oetiker.ch
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users
