Re: [rrd-users] large dataset considerations

Simon Hobson Thu, 24 Oct 2013 04:13:09 -0700

S Ahmed wrote:
> Is this tool used by any large scale usages?

> What is considered a large database size?


> Scenario: Say you want to store time series information in a SaaS 
> application.
> I'm guessing there is some sort of threshold where it makes sense to 
> partition your data, e.g. by
> a group of customers, in order to scale out the usage

I'm not clear on what the question is, and I suspect you haven't really thought 
about how RRD stores data.

Typically, you'd create an RRD file for a distinct set of related data that can 
be updated together - but if you have data that is not related or varies in 
number of items, then you'd put that data in a number of separate RRD files.

For example, consider monitoring systems.
Within a system, data such as CPU load, number of processes, RAM in use/free 
and so on are a distinct set - so you might put those in one RRD file. For data 
such as network I/O, you can put sent and received data into one RRD file, but 
you'd create a separate RRD file for each interface since the number of 
interfaces is variable - ditto things like disk space & I/O where the number of 
disks varies so you'd create one RRD file per filesystem, and one RRD file per 
physical disk.
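
To make that layout concrete, here is a rough sketch in Python. It only builds 
the file names and the "rrdtool create" argument lists, it does not run 
anything. The directory layout, the DS names, the 300 second step and the 
single RRA are all assumptions made up for illustration, not a recommendation.

```python
# Hypothetical layout: one RRD per host for system stats, one per network
# interface, one per filesystem. All names and numbers are illustrative.

def rrd_path(host, kind, instance=None):
    """Build a file path such as 'rrd/web1/net-eth0.rrd'."""
    name = kind if instance is None else f"{kind}-{instance}"
    # Mount points contain '/', which cannot appear inside a file name.
    name = name.replace("/", "_")
    return f"rrd/{host}/{name}.rrd"


def create_args(path, sources, step=300):
    """Return the argument list for one 'rrdtool create' invocation."""
    args = ["rrdtool", "create", path, "--step", str(step)]
    # One DS per data source: heartbeat of 2*step, minimum 0, maximum unknown.
    args += [f"DS:{name}:{dst}:{2 * step}:0:U" for name, dst in sources]
    # Example archive: 288 rows of unconsolidated 5-minute averages (one day).
    args.append("RRA:AVERAGE:0.5:1:288")
    return args


def plan(host, interfaces, filesystems):
    """List the create commands for one host: fixed set plus variable sets."""
    system = [("load", "GAUGE"), ("procs", "GAUGE"),
              ("ram_used", "GAUGE"), ("ram_free", "GAUGE")]
    commands = [create_args(rrd_path(host, "system"), system)]
    for iface in interfaces:
        traffic = [("tx", "COUNTER"), ("rx", "COUNTER")]
        commands.append(create_args(rrd_path(host, "net", iface), traffic))
    for fs in filesystems:
        space = [("used", "GAUGE"), ("free", "GAUGE")]
        commands.append(create_args(rrd_path(host, "fs", fs), space))
    return commands
```

Calling plan("web1", ["eth0", "eth1"], ["/", "/var"]) gives five argument 
lists, one per file. Adding an interface or a filesystem later only means 
creating one more file, the existing ones are untouched.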

Extending that, if you were monitoring multiple systems, the number of systems 
is typically variable - so you'd create a set of RRD files for each system. If 
you want combined stats, then one option is to do as I've just done for mail 
queue information: collect from each server separately (in my case using the 
cache daemon to put all the files in one place), and have a separate program 
that periodically queries the set of files, combines the data, and updates a 
separate RRD file.
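
The combining step might look roughly like the Python below. This is a sketch 
and not the program I actually run: it assumes the per-server values have 
already been read out (e.g. with "rrdtool fetch") into plain dictionaries, and 
the function names and the "mailq.rrd" file name are invented for the example.

```python
def combine(per_server):
    """Sum values across servers for timestamps that every server reported.

    per_server maps a server name to a {timestamp: value} dictionary.
    Timestamps missing from any one server are skipped, so a partial
    sample is never recorded as if it were the full total.
    """
    if not per_server:
        return {}
    series = list(per_server.values())
    common = set(series[0])
    for samples in series[1:]:
        common &= set(samples)
    # Sorted so the result can be fed to rrdtool in increasing time order.
    return {ts: sum(samples[ts] for samples in series) for ts in sorted(common)}


def update_args(path, combined):
    """Return an 'rrdtool update' argument list, one 'timestamp:value' each."""
    return ["rrdtool", "update", path] + [f"{ts}:{v}" for ts, v in combined.items()]
```

With two servers reporting {100: 3, 400: 5} and {100: 1, 400: 2, 700: 9}, 
combine() keeps only timestamps 100 and 400 and sums them to 4 and 7. Whether 
to skip or estimate the incomplete samples is a design choice, skipping is 
just the simplest thing to show.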

Now when it comes to scalability, RRD doesn't really impose any great limits 
itself. There is no central daemon managing things (unless you choose to use 
the cache daemon but that's not really the same thing) - just different 
programs that independently update RRD files, and read RRD files to generate 
information for users.
So it comes down to: do you have the disk space to store the data you want to 
store, the CPU capacity to run the collection programs and output programs, 
and the disk I/O to handle it? Unless you have a very large dataset that 
doesn't break down into logical chunks, you have the option of storing and 
processing the data on multiple systems if you need to split the storage 
and/or processing.
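
As a back-of-envelope aid, something like the following Python could be used 
to estimate disk usage and to decide which machine holds which customer's 
files. It is a sketch under assumptions: it counts 8 bytes per stored value 
(a double on typical platforms) and ignores the file header, so real files 
will be somewhat larger, and the hash-based assignment and host names are 
purely hypothetical.

```python
import hashlib


def estimate_bytes(ds_count, rra_rows):
    """Approximate data size of one RRD file, header not included.

    ds_count is the number of data sources; rra_rows lists the row count
    of each archive. Every archive stores one value per data source per row.
    """
    return ds_count * sum(rra_rows) * 8


def assign_host(customer_id, hosts):
    """Pick a storage host for a customer, stable across runs and machines.

    Uses an MD5 digest rather than Python's built-in hash(), which is
    randomised per process for strings.
    """
    digest = hashlib.md5(customer_id.encode("utf-8")).digest()
    return hosts[int.from_bytes(digest[:8], "big") % len(hosts)]
```

For example, estimate_bytes(2, [288, 2016]) gives 36864 bytes for a file with 
two data sources and two archives. Multiply by the number of files per 
customer and the number of customers to see roughly where the disk space 
goes. Note that a simple modulo assignment like this moves most customers if 
you change the number of hosts, so it only suits a fixed set of machines.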

If that doesn't answer the question, then perhaps you could be a bit more 
specific about what the question is.

_______________________________________________
rrd-users mailing list
rrd-users@lists.oetiker.ch
https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users
