Thanks Dan, good info.

> First off, what version of Cassandra are you using?

Sorry, my bad: 0.8.4.

> Provided you are using a recent Cassandra version (late 0.7 or 0.8.x) I doubt 
> the commit log is your problem. My experience using Cassandra as a time 
> series data store (with a full 30 days of data + various aggregations) has 
> been that the commit log is a trivial fraction of the actual data. That said, 
> its highly dependent on how you use your data and when it expires/gets 
> deleted (with considerations for gc_grace).

We keep 5-minute data on a few thousand "objects" for 13 months.  We also do 
"rollup" aggregation for generating longer-period graphs and reports, very 
RRD-like.  With a few months of data, I see 86GB in commitlog and 42GB in data… 
but then again, this is while I'm still loading data in as fast as I can for a 
test case, so that may have something to do with it :)
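
For reference, here's a rough sketch of the kind of layout I mean (the names
and keying scheme are illustrative only, not our actual schema):

    # Rough sketch of a 5-minute time-series layout (illustrative only).
    # One row per object per day; one column per 5-minute sample; samples
    # expire via TTL after the 13-month retention window.

    import time

    RETENTION_SECONDS = 13 * 30 * 24 * 3600   # ~13 months, approximate
    SAMPLE_INTERVAL = 5 * 60                   # 5-minute samples

    def sample_coordinates(object_id, ts=None):
        """Return (row_key, column_name, ttl) for one raw sample."""
        ts = int(ts if ts is not None else time.time())
        bucket = ts - (ts % SAMPLE_INTERVAL)           # align to 5-minute boundary
        day = time.strftime("%Y%m%d", time.gmtime(bucket))
        row_key = "%s:%s" % (object_id, day)           # e.g. "router42:20110815"
        return row_key, bucket, RETENTION_SECONDS

    # Rollups (hourly/daily aggregates) go to separate CFs with the same
    # keying scheme, which is what makes the write pattern so RRD-like.
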

> 
> As one final point, as of 0.8, I would not recommend playing with per-CF 
> flush settings. There are global thresholds which work far better and account 
> for things like java overhead. 
> 

Out of curiosity, why do global flush thresholds work better than per-CF 
settings?  My first thought is that I'd want finer-grained control, since my 
CFs can have very different write/read patterns.

Thanks,
-Derek
