Hi Mikel,
I've personally never found a good reason to store more than one datasource per
RRD datafile, and I run very large rrdtool data servers (multi-millions per
server, many servers).
There are far too many edge cases, latency issues and join overhead in trying
to consolidate datasources into a single datafile. Yes, rrdtool itself is more
efficient with an insert like that, but:
1) what if the datapoints are collected at different times?
2) what if they have different steps?
3) what if you want to add a datasource?
4) what if you simply have too many datasources to try and order/consolidate
   from a queue to the datafile?
There is also non-trivial complexity and overhead in indexing into an rrd
datafile for specific datasources.
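As a rough illustration of the first question above (datapoints collected at
different times), here is a small Python sketch that only builds rrdtool
command lines and does not run them. The file and datasource names are made up
for the example. In a shared file every update carries one timestamp for the
whole row, and `--template` leaves any unlisted datasource unknown for that
update; rrdtool also rejects an update whose timestamp is not newer than the
last one, so a value that arrives late for the same timestamp cannot be filled
in afterwards. With one file per datasource, each value keeps its own
timestamp.

```python
def shared_file_update(rrd_path, timestamp, values):
    """Build one 'rrdtool update' for a multi-datasource file.

    values maps datasource name -> value. Any datasource in the file that
    is not listed here is stored as unknown for this timestamp.
    """
    names = sorted(values)
    template = ":".join(names)
    row = ":".join(str(values[n]) for n in names)
    return ["rrdtool", "update", rrd_path,
            "--template", template, f"{timestamp}:{row}"]


def per_file_updates(directory, samples):
    """Build one 'rrdtool update' per datasource, each with its own timestamp.

    samples is a list of (datasource name, timestamp, value) tuples.
    The '<directory>/<name>.rrd' layout is an assumption for this sketch.
    """
    return [
        ["rrdtool", "update", f"{directory}/{name}.rrd", f"{ts}:{value}"]
        for name, ts, value in samples
    ]


if __name__ == "__main__":
    # Both values forced onto one timestamp in a shared file.
    print(shared_file_update("host1.rrd", 1366450000,
                             {"cpu": 12.5, "mem": 2048}))
    # Each value keeps the time it was actually collected.
    print(per_file_updates("rrd", [("cpu", 1366450000, 12.5),
                                   ("mem", 1366450007, 2048)]))
```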
Linux is extremely efficient at block updates, caching, opens/closes, etc.
rrdtool on a low-end (4 CPU) server with limited memory can easily store 160
thousand datasources per minute; on a better server, a whole lot more than
that.
'Distributed Cluster' isn't a good reason not to send all your time-series data
to one server or a small set of servers. The latency/request time incurred in
having to fetch data from those servers is usually not worth the trade-off.
Graphs of many hundreds of datasources, computed for multi-day/week time ranges
in the result set, are generated in tens of milliseconds, not seconds. rrdtool
is quite capable of producing hundreds of on-demand graphs per second from one
server.
I suggest you write a little test script that writes rrd data out to individual
rrd datafiles to see how quick your servers are at it. There is some OS
tuning and rrdtool RRA sizing that will help; in particular, don't keep hourly
or daily rollups, because the server has to hold onto those blocks to make the
consolidation quick and not incur a read from disk.
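A minimal sketch of such a test script, in Python, assuming the `rrdtool`
command-line tool is on the PATH. The file names, the datasource name `value`,
the 60-second step and the file count are all placeholders chosen for the
example, not recommendations. Each file gets one GAUGE datasource and a single
RRA with one step per row, so there are no hourly or daily rollups.

```python
import shutil
import subprocess
import tempfile
import time
from pathlib import Path


def create_cmd(path, step=60, rows=1440):
    """One GAUGE datasource and a single RRA with 1 step per row (no rollups)."""
    return [
        "rrdtool", "create", str(path),
        "--step", str(step),
        f"DS:value:GAUGE:{2 * step}:U:U",   # heartbeat = 2 * step, no min/max
        f"RRA:AVERAGE:0.5:1:{rows}",
    ]


def update_cmd(path, value, timestamp="N"):
    """Single-value update; 'N' tells rrdtool to use the current time."""
    return ["rrdtool", "update", str(path), f"{timestamp}:{value}"]


def run_benchmark(directory, count):
    """Create `count` files, update each once, return updates per second."""
    paths = [Path(directory) / f"ds{i}.rrd" for i in range(count)]
    for p in paths:
        subprocess.run(create_cmd(p), check=True)
    start = time.perf_counter()
    for i, p in enumerate(paths):
        subprocess.run(update_cmd(p, i), check=True)
    elapsed = time.perf_counter() - start
    return count / elapsed


if __name__ == "__main__":
    if shutil.which("rrdtool") is None:
        print("rrdtool not found on PATH; nothing measured")
    else:
        with tempfile.TemporaryDirectory() as tmp:
            rate = run_benchmark(tmp, 200)
            print(f"{rate:.0f} updates/second (includes process spawn cost)")
```

Note that this starts a new process for every update, so the number it prints
is a pessimistic lower bound; a long-running writer using one of the rrdtool
language bindings would avoid that cost.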
rrdtool scales rather simply (and without rrdcached, as I don't use that
either).
HTH
-Ryan
________________________________
From: mikel <infoeusk...@gmail.com>
To: rrd-users@lists.oetiker.ch
Sent: Saturday, April 20, 2013 4:48 AM
Subject: Re: [rrd-users] [unsure] max DS per rrd file
Thanks for your fast reply again.
>Maybe I don't understand what you say here. Some metrics, or all metrics are
>queried? Both statements cannot be true at the same time?
Yes, it is a tricky case. Apologies, I was not clear enough.
In most cases all metrics are queried at the same time, because we want to
know what value they had at a given time, and classify them.
Only occasionally would we query for just one metric.
>Anyway, if you query only once in a while, maybe you should think about
>reducing the number of RRAs in each RRD, and just let it consolidate at
>graph time. Yes, this will mean you will have to wait longer for your graph
>to be made, but you save processing time at every update.
This is interesting; I had not thought about that. Thanks for the hint.
Thanks for your help again.
m
--
View this message in context:
http://rrd-mailinglists.937164.n2.nabble.com/max-DS-per-rrd-file-tp7580966p7580971.html
Sent from the RRDtool Users Mailinglist mailing list archive at Nabble.com.
_______________________________________________
rrd-users mailing list
rrd-users@lists.oetiker.ch
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users