I'm running a full compaction now and noticed this in the logs:

    Compacting large row … incrementally

… and the values were in the 300-500MB range. I'm storing NOTHING anywhere near that large; my max value is about 200k. However, my schema is designed so that I can do efficient time/range scans of the data by placing things into buckets. So my schema looks like: bucket, timestamp … and the partition key is bucket.
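To make that concrete, here's a minimal sketch of the kind of table I mean (the table name and payload column are made up; the key structure is the point):

    CREATE TABLE events (
        bucket    text,
        timestamp timestamp,
        payload   blob,    -- the ~200k values I'm inserting
        PRIMARY KEY (bucket, timestamp)
    );
    -- 'bucket' alone is the partition key; 'timestamp' is a clustering
    -- column, so every insert for a given bucket lands in the same partition.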
Since timestamp is a clustering column, does that mean that EVERYTHING is in one "row" under 'bucket'? So even though my INSERTs are only about 200k each, they're all pooling under the same 'bucket' partition key, and Cassandra is going to have a hard time compacting them.

Part of the problem here is the serious abuse of vocabulary. The Thrift/CQL impedance mismatch means that things have slightly different names and not-so-straightforward nomenclature, which makes it confusing as to what's actually happening under the hood.

Then I saw:

http://mail-archives.apache.org/mod_mbox/cassandra-user/201106.mbox/%3cbanlktik0g+epq4ctw28ty+dpexprtis...@mail.gmail.com%3E

which says to look for in_memory_compaction_limit_in_mb in cassandra.yaml … so this seems like it will be a problem and slow me down moving forward, unless I figure out a workaround.
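For reference, that setting in cassandra.yaml looks like this (64 is the default in my install, if I'm reading it right; rows larger than this get the slower incremental two-pass compaction, which is exactly the "Compacting large row" message I'm seeing):

    # cassandra.yaml
    # Rows larger than this limit (in MB) are compacted incrementally
    # (two pass) instead of in memory, which is much slower.
    in_memory_compaction_limit_in_mb: 64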