Even with concurrent compaction as it works today, a high-speed data feed will at some point cause compactions to start lagging, and once they do, things can turn bad in terms of disk usage and read performance.
I have not read the compaction code closely, but if http://wiki.apache.org/cassandra/MemtableSSTable is up to date, I am wondering about two things:

1. Would it make sense to make full compactions occur a bit more aggressively? That is, regardless of whether there are sstables of matching sizes, if the outstanding sstables add up to more than a certain data size, would it make sense to just schedule a full compaction rather than jump through all the hoops of gradually merging them in groups of matching sizes?

2. This is the same topic from another viewpoint. When you get to the stage where compactionstats shows something crazy like "pending tasks: 600", I would think the code should be smart enough to either trigger a full compaction and scrap the current queue, or at least merge some of those pending tasks into larger ones, instead of reading and writing the same data again and again while gradually merging it into larger and larger sstables.

The goal behind both 1 and 2 is to reduce the number of times data is re-read and re-written in compactions before you reach the full dataset size.
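To put a rough number on the re-writing I am worried about, here is a toy model, not Cassandra's actual code. It assumes every flushed sstable has the same size, that merged sizes simply add up (no overwrites or tombstones), and that tiered compaction always merges exactly 4 tables of equal size. The function names and the group size of 4 are my own choices for illustration.

```python
def tiered_bytes_written(n_sstables, sstable_size=1, group=4):
    """Bytes written when equal-sized sstables are merged `group` at a time.

    Toy model: repeatedly pick the smallest size tier holding at least
    `group` tables, merge them in groups, and count the output bytes.
    Tables left over in a tier (fewer than `group`) stay unmerged.
    """
    tiers = {sstable_size: n_sstables}  # table size -> number of tables
    written = 0
    while True:
        mergeable = [size for size, count in tiers.items() if count >= group]
        if not mergeable:
            break
        size = min(mergeable)
        merges, leftover = divmod(tiers[size], group)
        tiers[size] = leftover
        new_size = size * group
        tiers[new_size] = tiers.get(new_size, 0) + merges
        written += merges * new_size  # each merge rewrites all of its input
    return written


def full_bytes_written(n_sstables, sstable_size=1):
    """Bytes written when all sstables are merged in one full compaction."""
    return n_sstables * sstable_size


if __name__ == "__main__":
    n = 64
    print("tiered:", tiered_bytes_written(n))  # three passes over the data
    print("full:  ", full_bytes_written(n))    # one pass over the data
```

With 64 equal sstables the tiered path in this model writes every byte three times (64 to 16 to 4 to 1 tables), while a single full compaction writes it once. A real system would see different numbers, since sizes are uneven and new sstables keep arriving during compaction, but the gap grows with the number of tiers.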