etc for long-term detailed storage of bandwidth statistics

David McKen Wed, 24 Mar 2010 04:51:12 -0700

Something you can look into is a feature called "partitioning": you set
artificial boundaries (for example, one partition per month of data) and
the database maintains the separate data structures for you, which saves
you from having to maintain the separate tables yourself. Both PostgreSQL
and MySQL (5.1) support this for sure; most others should support it as
well.
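
To make the idea a bit more concrete, here is a rough sketch (in Python,
purely for illustration) that prints MySQL 5.1-style RANGE partitioning
DDL with one partition per month. The table name "acct_monthly" and the
columns "stamp", "ip_src" and "bytes" are made up for this example and are
not taken from any actual pmacct schema, so adjust them to whatever your
tables look like. PostgreSQL does partitioning differently, so this only
applies to the MySQL side.

```python
from datetime import date


def month_starts(first: date, count: int):
    """Return count + 1 first-of-month dates, beginning with `first`'s month."""
    year, month = first.year, first.month
    out = []
    for _ in range(count + 1):
        out.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return out


def mysql_partition_ddl(table: str, first: date, months: int) -> str:
    """Build CREATE TABLE DDL with one RANGE partition per month.

    The column layout is hypothetical; only the PARTITION BY RANGE
    clause is the point of the example.
    """
    bounds = month_starts(first, months)
    parts = []
    # Each partition holds rows whose stamp is before the next month's start.
    for start, end in zip(bounds, bounds[1:]):
        parts.append(
            f"  PARTITION p{start:%Y%m} "
            f"VALUES LESS THAN (TO_DAYS('{end.isoformat()}'))"
        )
    # Catch-all partition for anything newer than the last boundary.
    parts.append("  PARTITION pmax VALUES LESS THAN MAXVALUE")
    return (
        f"CREATE TABLE {table} (\n"
        "  stamp DATETIME NOT NULL,\n"
        "  ip_src CHAR(15) NOT NULL,\n"
        "  bytes BIGINT UNSIGNED NOT NULL\n"
        ")\n"
        "PARTITION BY RANGE (TO_DAYS(stamp)) (\n"
        + ",\n".join(parts)
        + "\n);"
    )


ddl = mysql_partition_ddl("acct_monthly", date(2010, 1, 1), 3)
print(ddl)
```

Queries that filter on the "stamp" column can then be limited by the
server to the matching partitions, and an old month can be dropped as a
whole partition instead of with a big DELETE.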


A not-so-technical explanation is available here:
http://dev.mysql.com/tech-resources/articles/performance-partitioning.html

If you are looking to store truly huge datasets (or need high
availability of the data), you may want to consider a cluster of
machines. That way you can spread the load around and maintain high
reliability and performance.

My 2 cents: databases are built to handle the loads you are dealing
with, and on top of that they will automatically handle most of the
maintenance of that data for you (vs. a raw filesystem). Both MySQL and
PostgreSQL are great products and should easily scale to handle what you
need.

Good Luck

On 3/24/2010 3:50 AM, Ruben Laban wrote:
> On Tuesday 23 March 2010 at 17:10 (CET), Karl O. Pinc wrote:
>   
>> On 03/23/2010 10:25:04 AM, Ruben Laban wrote:
>>     
>>> On Tuesday 23 March 2010 at 16:10 (CET), Karl O. Pinc wrote:
>>>       
>>>> On 03/23/2010 06:00:13 AM, Ruben Laban wrote:
>>>>         
>>>>> Hello list,
>>>>>
>>>>> I'm trying to cook up an improved datamodel to store our bandwidth
>>>>> statistics.
>>>>>
>>>> Have you considered rrdtool, in conjunction with a supporting
>>>> tool for data display like cacti or whatever?  The rrdtool
>>>> website has links to related software.
>>>>
>>>> http://oss.oetiker.ch/rrdtool/
>>>>         
>>> I did/do indeed. I'm likely to build a two-tier setup:
>>> * A SQL based long-term storage solution with raw data.
>>>       
>> You might consider using postgresql for the backend, which I say
>> because I hear that it does well with _very_ large datasets.
>> Then you could have a real database and not have to artificially
>> divide things into separate tables.
>>
>> You may want to ask on, say, #postgresql on irc.freenode.net
>> just to get confirmation from people before doing a lot of work,
>>  or the mailing list or whatever.
>>     
> Hadn't thought about using postgresql, having only experience with MySQL and
> a bit of MSSQL. I'll definitely look into its support for large datasets.
>
>   
>>> * A (most likely) rrd based "frontend" storage solution which would
>>> allow easy graphing as well (which I currently use gnuplot for).
>>>
>>> I might need to get some SSD disks to be able to import the raw data
>>> into rrd files in a relatively fast way.
>>>
>>> Using rrdtool as single backend isn't an option. With the amount of
>>> details we want to keep over time, the rrd files would be quite large.
>>> And since each rrd update pretty much requires the complete file to be
>>> "rewritten", the performance would be pretty bad.
>>>       
>> Actually not.  The beauty of rrdtool is that the files are of fixed
>> size and allocated all at once.  It then "rotates" through the
>> file as data comes in overwriting only the little part of the
>> file where the new data is stored.  It will perform much better
>> than a database.
>>     
> I know RRD files don't grow over time. Though from experience I can tell that 
> rrdtool is far from efficient when doing bulk updates (say adding months of 
> historical data). Then again, last I tried this was quite some time ago, so 
> rrdtool might have evolved since.
>
> Thanks for the pointers, I now have some more research to do.
>

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists

Re: [pmacct-discussion] Ideas/tips/hints/etc for long-term detailed storage of bandwidth statistics
