For the multithreaded compaction, https://issues.apache.org/jira/browse/CASSANDRA-4182 is relevant. Basically, because you do a major compaction every night, you are in the case of '1 large sstable and a bunch of others', for which the design of multithreaded compaction won't help much.
For the concurrent part, this is due to the fact that major compaction grabs a global lock before running. We could (and will) change that to be one lock per CF (https://issues.apache.org/jira/browse/CASSANDRA-3430), but it's not done yet. If you feel adventurous and care enough, you can always try to apply the patch on CASSANDRA-3430; it should be fine as long as you don't use truncate.

-- Sylvain

On Thu, May 10, 2012 at 6:27 PM, Frederick Ryckbosch
<frederick.ryckbo...@gmail.com> wrote:
> Hi,
>
> We have a single-node Cassandra instance that contains volatile data: every
> day about 2 Gb of data is written; this data is kept for 7 days and then
> removed (using TTL). To avoid the application becoming slow during a large
> compaction, we do a major compaction every night (fewer users, smaller
> performance impact).
>
> The major compaction is CPU bound: it uses about 1 core and only consumes
> 4 Mb/sec of disk IO. We would like to scale the compaction with the resources
> available in the machine (cores, disks). Enabling multithreaded_compaction
> didn't help a lot: the CPU usage goes up to 120% of one core, but does not
> scale with the number of cores.
>
> To make the compaction scale with the number of cores in our machine, we
> tried to perform a major compaction on multiple column families (in the same
> keyspace) at the same time using `nodetool -h localhost compact testSpace
> data1 data2`; however, the two compactions are executed serially instead of
> concurrently, even with concurrent_compactors set to 4 (the number of cores).
>
> Is this normal behavior (both the multithreading and the concurrent
> compactions)? Is there any way to make the major compactions scale with the
> number of cores in the machine?
>
> Thanks!
> Frederick
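
For anyone wanting to try the suggestion above, a minimal sketch of applying the patch attached to CASSANDRA-3430 to a source checkout and rebuilding could look like the following. The branch name and patch file name here are assumptions, not something stated in the thread; use the attachment actually posted on the JIRA ticket and the branch matching your running version.

    # Sketch only: branch and patch file name below are assumptions.
    git clone https://github.com/apache/cassandra.git
    cd cassandra
    git checkout cassandra-1.1            # pick the branch matching your version
    # download the patch attached to the JIRA ticket into this directory, then:
    patch -p1 < CASSANDRA-3430.patch      # or: git apply CASSANDRA-3430.patch
    ant                                   # rebuild the jars with the patch applied

As noted in the reply, running a build patched this way is only advisable if you can live without truncate until the change lands in a release.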