To be honest, this started after feeding data to Cassandra for a while with compaction disabled (sort of a test case).
When I enabled it... boom... a spectacular process with 2000% CPU usage (please note that Cassandra has compression enabled on this system).

This system actually has SSDs, so when throttled a bit, the I/O is really not a problem, but I doubt that an HDD-based system would have managed to keep up.

I agree, this is hopefully something that does not normally happen, but then again, some protection against Murphy's law is always good.

Thanks!
Terje

On Tue, Apr 26, 2011 at 4:35 PM, Sylvain Lebresne <sylv...@datastax.com> wrote:

> On Tue, Apr 26, 2011 at 9:01 AM, Terje Marthinussen
> <tmarthinus...@gmail.com> wrote:
> > Hi,
> > I was testing the multithreaded compactions and with 2x6 cores
> > (24 with HT) it does seem a bit crazy with 24 compactions running
> > concurrently.
> > It is probably not very good in terms of random I/O.
>
> It does seem a bit overkill. However, I'm slightly curious how you
> ended up with 24 parallel compactions; more precisely, how did you
> end up with enough sstables to trigger 24 compactions? Was that done
> on purpose for testing's sake, or did you see that in a real
> situation?
>
> I'm asking because in a 'real' situation, given that compactions are
> triggered only if there is some number of files to compact, and
> provided the cluster is correctly provisioned, I wouldn't expect the
> number of parallel compactions to jump to such numbers (one of the
> goals of multi-threaded compaction was to make sure we never end up
> accumulating lots of un-compacted sstables). Anyway, I get your
> point, just wondering if that was a real situation.
>
> > As such, I think I agree with the argument in 2191 that there
> > should be a config option for this.
> > Probably a default that is dynamic with 1 thread per column family
> > + 2 or 3 threads for parallel compactions outside of that could be
> > good.
> > Any other opinions?
>
> Multi-threaded compaction is optional and compaction throttling is
> supposed to mitigate it. However, I do agree that too many
> compactions may be a bad use of resources because of random I/O even
> if correctly throttled. I think it's missing a configuration option
> "concurrent_compactions" like there is a "concurrent_writes|reads".
> For that, I have created
> https://issues.apache.org/jira/browse/CASSANDRA-2558
>
> > I guess the compaction thread pool should also show up in tpstats?
>
> Yes it should ... and it will ... eventually :)
>
> Thanks for the feedback.
>
> --
> Sylvain
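
To make the idea in this thread concrete, here is a minimal sketch of capping parallel compactions with a bounded thread pool. It is not Cassandra's implementation and uses no Cassandra API. The names `suggested_pool_size`, `run_compactions` and `fake_compaction` are hypothetical, the `concurrent_compactions` parameter only borrows the option name proposed above, and the sleep is a stand-in for real merge work. The sizing rule is the dynamic default floated in the quoted mail (one thread per column family plus 2 or 3 extra).

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def suggested_pool_size(column_families: int, extra_threads: int = 2) -> int:
    """Dynamic default floated in the thread: one thread per column
    family plus a few extra. Hypothetical helper, never below 1."""
    if column_families < 0 or extra_threads < 0:
        raise ValueError("counts must be non-negative")
    return max(1, column_families + extra_threads)


def run_compactions(pending: int, concurrent_compactions: int) -> int:
    """Run `pending` stand-in compaction tasks on a pool capped at
    `concurrent_compactions` workers; return the peak number of tasks
    that were running at the same time."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def fake_compaction() -> None:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for the real merge work
        with lock:
            running -= 1

    # The pool size is the cap: extra tasks queue up instead of all
    # starting at once. Leaving the `with` block waits for every task.
    with ThreadPoolExecutor(max_workers=concurrent_compactions) as pool:
        for _ in range(pending):
            pool.submit(fake_compaction)
    return peak
```

With 24 pending tasks and a cap of 4, `run_compactions(24, 4)` never reports more than 4 tasks in flight; the remaining tasks wait in the queue, which is the behaviour a "concurrent_compactions" style option would be expected to give.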