Thanks Dan, good info.

> First off, what version of Cassandra are you using?
Sorry, my bad: 0.8.4.

> Provided you are using a recent Cassandra version (late 0.7 or 0.8.x) I doubt
> the commit log is your problem. My experience using Cassandra as a time
> series data store (with a full 30 days of data + various aggregations) has
> been that the commit log is a trivial fraction of the actual data. That said,
> it's highly dependent on how you use your data and when it expires/gets
> deleted (with considerations for gc_grace).

We keep 5-minute data on a few thousand "objects" for 13 months. We also do
"rollup" aggregation for generating longer time period graphs and reports,
very RRD-like. With a few months of data, I see 86GB in commitlog and 42GB in
data... but then again this is while I'm still loading data in as fast as I
can for a test case, so that may have something to do with it :)

> As one final point, as of 0.8, I would not recommend playing with per-CF
> flush settings. There are global thresholds which work far better and
> account for things like java overhead.

Out of curiosity, why do global flush thresholds work better than per-CF
settings? My first thought is that I would want finer-grained control, since
my CFs can have extremely different write/read patterns.

Thanks,

-Derek
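P.S. For concreteness, here is roughly what I mean by per-CF vs. global
settings. This is sketched from memory, so treat the exact option names (and
the made-up "metrics_5m" CF) as my assumptions; the 0.8 cassandra.yaml
comments and the cassandra-cli help are the real reference. The per-CF flush
knobs I was thinking of tuning are set from cassandra-cli, something like:

    update column family metrics_5m
        with memtable_throughput = 128
        and memtable_operations = 0.5;

(i.e. flush this one CF after roughly 128MB of writes or ~500k operations),
while the global thresholds Dan mentioned live in cassandra.yaml and cap the
totals across all column families, letting Cassandra pick which memtable to
flush:

    # total memory allowed for all memtables combined; when exceeded,
    # the largest memtable is flushed
    memtable_total_space_in_mb: 2048

    # total commit log size; when exceeded, the oldest dirty CFs are
    # flushed so old segments can be recycled
    commitlog_total_space_in_mb: 4096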