That is interesting...
We recently resolved a performance issue solely by increasing the 
concurrent_compactors parameter from the default to 64. We have two tables, but 
90% of the data is in one table. We saw a read performance boost of more than 
100% just by increasing that parameter in cassandra.yaml. Based on what you 
said, my observations look contradictory. Could you elaborate on how you came 
to that conclusion?
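For concreteness, the change was a one-line edit of cassandra.yaml followed by a restart (a sketch of our setting, not a general recommendation; the appropriate value depends on core count and disk type):

```yaml
# cassandra.yaml fragment -- value from our cluster, shown for illustration
# (the default derives from the number of cores/disks; 64 is unusually high)
concurrent_compactors: 64
```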


On Jan 21, 2016, at 7:42 AM, PenguinWhispererThe . 
<th3penguinwhispe...@gmail.com> wrote:

After having some issues with compaction myself, I think it's worth stating 
explicitly that compaction of a single table can only run on one CPU core. 
Compaction of one table will NOT spread over different cores.
To really get use out of concurrent_compactors you need multiple table 
compactions initiated at the same time. If some of those are small, they'll 
finish much earlier, leaving only one core at 100%, since compaction is 
generally CPU bound (unless your disks can't keep up).
I believe it's better for compaction to be CPU bound on one core (or at least 
not all of them) than disk I/O bound, because being I/O bound would hurt the 
performance of regular reads and writes.
Compaction is a maintenance task, so it shouldn't be eating all your resources.
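The scheduling effect described above can be sketched with a toy model (my illustration, not Cassandra code): treat each table's compaction as a single-threaded task and concurrent_compactors as the worker-pool size. One huge table gains nothing from extra compactors, while several similar tables parallelize cleanly:

```python
def compaction_wallclock(task_costs, concurrent_compactors):
    """Toy model: wall-clock time to finish single-threaded compaction
    tasks on a pool of `concurrent_compactors` workers (greedy assignment)."""
    workers = [0.0] * concurrent_compactors
    for cost in sorted(task_costs, reverse=True):
        # hand the next-largest task to the least-loaded worker
        i = min(range(len(workers)), key=workers.__getitem__)
        workers[i] += cost
    return max(workers)

# One big table: extra compactors don't help.
print(compaction_wallclock([100.0], 4))     # 100.0
# Four similar tables: four compactors cut the time to a quarter.
print(compaction_wallclock([25.0] * 4, 4))  # 25.0
```

This is why raising concurrent_compactors helps only when several table compactions are actually pending at once.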


2016-01-16 0:18 GMT+01:00 Kai Wang <dep...@gmail.com>:
Jeff & Sebastian,

Thanks for the reply. There are 12 cores, but in my case C* only uses one core 
most of the time. nodetool compactionstats shows there's only one compactor 
running, and I can see the C* process using only one core. So I guess I 
should've asked the question more clearly:

1. Is ~25 MB/s a reasonable compaction throughput for one core?
2. Is there any configuration that affects single-core compaction throughput?
3. Is concurrent_compactors the only option to parallelize compaction? If so, I 
guess it's the compaction strategy itself that decides when to parallelize and 
when to block on one core. Then there's not much we can do here.
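Combining Jeff's 0.8-1 compactors-per-core guideline with the 12 cores here gives a quick sanity check (my arithmetic, not an official formula):

```python
cores = 12
# 0.8-1 compactors per core, per the SSD rule of thumb upthread
low = int(0.8 * cores)
high = cores
print(f"suggested concurrent_compactors: {low}-{high}")  # 9-12
```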

Thanks.

On Fri, Jan 15, 2016 at 5:23 PM, Jeff Jirsa 
<jeff.ji...@crowdstrike.com> wrote:
With SSDs, the typical recommendation is up to 0.8-1 compactor per core 
(depending on other load).  How many CPU cores do you have?


From: Kai Wang
Reply-To: "user@cassandra.apache.org"
Date: Friday, January 15, 2016 at 12:53 PM
To: "user@cassandra.apache.org"
Subject: compaction throughput

Hi,

I am trying to figure out the bottleneck of compaction on my node. The node 
runs CentOS 7 and has SSDs installed. The table is configured to use LCS. Here 
are my compaction-related configs in cassandra.yaml:

compaction_throughput_mb_per_sec: 160
concurrent_compactors: 4

I insert about 10G of data and start observing compaction.

nodetool compactionstats shows that most of the time there is one compaction. 
Sometimes there are 3-4 (I suppose this is controlled by 
concurrent_compactors). During compaction, I see one CPU core at 100%. At that 
point, disk I/O is about 20-25 MB/s of writes, which is much lower than what 
the disk is capable of. Even when there are 4 compactions running, CPU goes to 
400%+ but disk I/O stays at 20-25 MB/s of writes. I used nodetool 
setcompactionthroughput 0 to disable compaction throttling but don't see any 
difference.
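As a rough scale check (my arithmetic from the figures above, ignoring that LCS rewrites the same data several times across levels): at the observed 20-25 MB/s, a single pass over the ~10G of inserted data takes roughly 7-8.5 minutes:

```python
data_mb = 10 * 1024  # ~10G of inserted data
for rate_mb_s in (20, 25):  # observed compaction write throughput
    minutes = data_mb / rate_mb_s / 60
    print(f"{rate_mb_s} MB/s -> {minutes:.1f} min per pass")
```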

Does this mean compaction is CPU bound? If so, 20 MB/s seems kinda low. Is 
there any way to improve the throughput?

Thanks.

