Hi,

We have a single-node Cassandra instance that holds volatile data: about 2 
GB of data is written every day, kept for 7 days, and then removed (via 
TTL). To avoid slowing down the application during a large compaction, we 
run a major compaction every night (fewer users, so less performance impact).
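For context, the nightly job is essentially a crontab entry like the one below (the schedule shown is illustrative; `testSpace` is the keyspace mentioned later, and `nodetool` is assumed to be on the PATH):

```
# Kick off a major compaction at 03:00, when few users are online
# (illustrative schedule)
0 3 * * * nodetool -h localhost compact testSpace
```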

The major compaction is CPU bound: it uses about one core but consumes only 4 
MB/s of disk I/O. We would like the compaction to scale with the resources 
available in the machine (cores, disks). Enabling multithreaded_compaction 
didn't help much: CPU usage goes up to about 120% of one core, but it does not 
scale with the number of cores.
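These are the relevant cassandra.yaml settings as we have them now (values are the ones described above; defaults may differ between versions):

```
# cassandra.yaml excerpts (our current settings)
multithreaded_compaction: true   # split a single compaction across threads
concurrent_compactors: 4         # matches the number of cores in the machine
```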

To make the compaction scale with the number of cores in our machine, we tried 
running a major compaction on multiple column families (in the same 
keyspace) at the same time using `nodetool -h localhost compact testSpace data1 
data2`. However, the two compactions are executed serially instead of 
concurrently, even with concurrent_compactors set to 4 (the number of cores).
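To be clear about what we expected, this is the kind of parallelism we were hoping for (a sketch only; it assumes `nodetool` is on the PATH and the node is reachable on localhost):

```
# Hoped-for behavior: one major compaction per column family,
# launched concurrently, one background nodetool process each
for cf in data1 data2; do
  nodetool -h localhost compact testSpace "$cf" &
done
wait   # block until both compactions have finished
```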

Is this the expected behavior (for both multithreaded compaction and concurrent 
compactions)? Is there any way to make major compactions scale with the number 
of cores in the machine?

Thanks !
Frederick