Even with concurrent compactions as they work today, a high-speed data feed
will eventually cause compactions to lag, and once they do, disk usage and
read performance can degrade quickly.

I have not read the compaction code closely, but if
http://wiki.apache.org/cassandra/MemtableSSTable is up to date, I am
wondering about two things:
1. Would it make sense to trigger full compactions more aggressively?
That is, regardless of whether there are sstables of matching sizes, if the
total amount of data in outstanding sstables exceeds a certain size, would it
make sense to just schedule a full compaction rather than jump through all
the hoops of gradually merging them in groups of matching sizes?

2. This is the same topic from another viewpoint. When compactionstats shows
something extreme like "pending tasks: 600", I would expect the code to be
smart enough to either trigger a full compaction and scrap the current
queue, or at least merge some of those pending tasks into larger ones,
instead of reading and writing the same data again and again while gradually
merging it into larger and larger sstables.

The goal behind both 1 and 2 is to reduce the number of times data is
re-read and re-written by compactions before it reaches the full dataset
size.
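
To put rough numbers on the rewrite cost, here is a small back-of-the-envelope
sketch in Python. It is not Cassandra's actual compaction logic. It assumes
equal-sized sstables, no overwrites or tombstones (so a merged sstable is as
large as the sum of its inputs), and a hypothetical fan_in parameter for how
many sstables are merged per step. The function names are mine, purely for
illustration.

```python
def tiered_bytes_rewritten(n_tables, table_size, fan_in=4):
    """Bytes written when equal-sized sstables are merged in groups of
    `fan_in`, round after round, until a single sstable remains.

    Simplified model: merged size == sum of input sizes (no overwrites).
    """
    sizes = [table_size] * n_tables
    written = 0
    while len(sizes) > 1:
        next_sizes = []
        for i in range(0, len(sizes), fan_in):
            group = sizes[i:i + fan_in]
            if len(group) == 1:
                # A lone leftover is carried to the next round untouched.
                next_sizes.append(group[0])
            else:
                merged = sum(group)
                written += merged  # every input byte is written again
                next_sizes.append(merged)
        sizes = next_sizes
    return written


def full_bytes_rewritten(n_tables, table_size):
    """Bytes written by one full compaction over all sstables."""
    return n_tables * table_size


if __name__ == "__main__":
    for n in (64, 600):
        tiered = tiered_bytes_rewritten(n, 1)
        full = full_bytes_rewritten(n, 1)
        print(n, "sstables: tiered =", tiered, "full =", full)
```

Under these assumptions, 64 sstables merged four at a time get rewritten
three times over, while a single full compaction rewrites them once. Real
workloads will differ, since overwrites shrink the merged output and a full
compaction ties up disk and I/O for much longer in one go.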
