Re: compaction throughput

Kai Wang Thu, 21 Jan 2016 06:13:32 -0800

I am using 2.2.4 and have seen multiple compactors running on the same
table. The number of compactors seems to be controlled by
concurrent_compactors. As of type of compactions, I've seen normal
compaction, tombstone compaction. Validation and Anticompaction seem to
always be single threaded.


On Thu, Jan 21, 2016 at 8:28 AM, PenguinWhispererThe . <
th3penguinwhispe...@gmail.com> wrote:

> Thanks for that clarification Sebastian! That's really good to know! I
> never took increasing this value in consideration because of my previous
> experience.
>
> In my case I had a table that was compacting over and over... and only one
> CPU was used. So that made me believe it was not multithreaded (I actually
> believe I asked this on IRC however it's been a few months ago so I might
> be wrong).
>
> Have there been behavioral changes on this lately? (I was using 2.0.9 or
> 2.0.11 I believe).
>
> 2016-01-21 14:15 GMT+01:00 Sebastian Estevez <
> sebastian.este...@datastax.com>:
>
>> >So compaction of one table will NOT spread over different cores.
>>
>> This is not exactly true. You actually can have multiple compactions
>> running at the same time on the same table, it just doesn't happen all that
>> often. You essentially would have to have two sets of sstables that are
>> both eligible for compactions at the same time.
>>
>> all the best,
>>
>> Sebastián
>> On Jan 21, 2016 7:41 AM, "PenguinWhispererThe ." <
>> th3penguinwhispe...@gmail.com> wrote:
>>
>>> After having some issues myself with compaction I think it's noteworthy
>>> to explicitly state that compaction of a table can only run on one CPU. So
>>> compaction of one table will NOT spread over different cores.
>>> To really have use of concurrent_compactors you need to have multiple
>>> table compactions initiated at the same time. If those are small they'll
>>> finish way earlier resulting in only one core using 100% as compaction is
>>> generally CPU bound (unless your disks can't keep up).
>>> I believe it's better to be CPU(core) bound on one core(or at least not
>>> all) for compaction than disk IO bound as this would result in writes and
>>> reads, ... having performance impact.
>>> Compaction is a maintenance task so it shouldn't be eating all your
>>> resources.
>>>
>>>
>>> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
>>>  This
>>> email has been sent from a virus-free computer protected by Avast.
>>> www.avast.com
>>> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
>>> <#-2069969251_1162782367_-1582318301_DDB4FAA8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
>>>
>>> 2016-01-16 0:18 GMT+01:00 Kai Wang <dep...@gmail.com>:
>>>
>>>> Jeff & Sebastian,
>>>>
>>>> Thanks for the reply. There are 12 cores but in my case C* only uses
>>>> one core most of the time. *nodetool compactionstats* shows there's
>>>> only one compactor running. I can see C* process only uses one core. So I
>>>> guess I should've asked the question more clearly:
>>>>
>>>> 1. Is ~25 M/s a reasonable compaction throughput for one core?
>>>> 2. Is there any configuration that affects single core compaction
>>>> throughput?
>>>> 3. Is concurrent_compactors the only option to parallelize compaction?
>>>> If so, I guess it's the compaction strategy itself that decides when to
>>>> parallelize and when to block on one core. Then there's not much we can do
>>>> here.
>>>>
>>>> Thanks.
>>>>
>>>> On Fri, Jan 15, 2016 at 5:23 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com
>>>> > wrote:
>>>>
>>>>> With SSDs, the typical recommendation is up to 0.8-1 compactor per
>>>>> core (depending on other load).  How many CPU cores do you have?
>>>>>
>>>>>
>>>>> From: Kai Wang
>>>>> Reply-To: "user@cassandra.apache.org"
>>>>> Date: Friday, January 15, 2016 at 12:53 PM
>>>>> To: "user@cassandra.apache.org"
>>>>> Subject: compaction throughput
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to figure out the bottleneck of compaction on my node. The
>>>>> node is CentOS 7 and has SSDs installed. The table is configured to use
>>>>> LCS. Here is my compaction related configs in cassandra.yaml:
>>>>>
>>>>> compaction_throughput_mb_per_sec: 160
>>>>> concurrent_compactors: 4
>>>>>
>>>>> I insert about 10G of data and start observing compaction.
>>>>>
>>>>> *nodetool compaction* shows most of time there is one compaction.
>>>>> Sometimes there are 3-4 (I suppose this is controlled by
>>>>> concurrent_compactors). During the compaction, I see one CPU core is 100%.
>>>>> At that point, disk IO is about 20-25 M/s write which is much lower than
>>>>> the disk is capable of. Even when there are 4 compactions running, I see
>>>>> CPU go to +400% but disk IO is still at 20-25M/s write. I use *nodetool
>>>>> setcompactionthroughput 0* to disable the compaction throttle but
>>>>> don't see any difference.
>>>>>
>>>>> Does this mean compaction is CPU bound? If so 20M/s is kinda low. Is
>>>>> there anyway to improve the throughput?
>>>>>
>>>>> Thanks.
>>>>>
>>>>
>>>>
>>>
>

Re: compaction throughput

Reply via email to