You probably have a very large partition in that table. `nodetool cfstats` will show you the largest compacted partition now - I suspect it's much higher than before.
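For example, assuming the keyspace and table named in the quoted thread below (notification_system_v1.user_notification, taken from the compactionstats output), a quick check might look like this:

    # Print per-table stats and pull out the largest compacted partition
    # (reported in bytes)
    nodetool cfstats notification_system_v1.user_notification | grep -i 'Compacted partition maximum'

The cfstats output quoted below already shows "Compacted partition maximum bytes: 4966933177" - roughly 4.6 GB in a single partition against a mean of 1183 bytes, which fits one enormous partition dominating the compaction. Sketches of the stop-compaction and thread-dump commands discussed in the thread follow after the quoted messages.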
On Thu, Jul 5, 2018 at 9:50 PM, atul atri <atulatri2...@gmail.com> wrote:
> Hi Chris,
>
> The compaction process finally finished. It took a long time, though.
>
> Thank you very much for all your help.
>
> Please let me know if you have any guidelines for making future
> compaction processes faster.
>
> Thanks & Regards,
> Atul Atri.
>
> On 5 July 2018 at 22:05, atul atri <atulatri2...@gmail.com> wrote:
>> Hi Chris,
>>
>> Thank you for the reply.
>>
>> I have already tried "nodetool stop compaction" and it does not help.
>> I have restarted each node in the cluster one by one and compaction
>> starts again. It gets stuck on the same table.
>>
>> The following is the 'nodetool compactionstats' output. It has been
>> stuck at 1336035468 for more than 35 hours at least:
>>
>> pending tasks: 1
>>    compaction type   keyspace                 table               completed    total        unit    progress
>>    Compaction        notification_system_v1   user_notification   1336035468   1660997721   bytes   80.44%
>> Active compaction remaining time: 0h00m38s
>>
>> The following is the output of "nodetool cfstats":
>>
>> Table: user_notification
>> SSTable count: 18
>> Space used (live), bytes: 17247516201
>> Space used (total), bytes: 17316488652
>> SSTable Compression Ratio: 0.41922805938461566
>> Number of keys (estimate): 32556160
>> Memtable cell count: 44717
>> Memtable data size, bytes: 27705294
>> Memtable switch count: 5
>> Local read count: 0
>> Local read latency: 0.000 ms
>> Local write count: 236961
>> Local write latency: 0.047 ms
>> Pending tasks: 0
>> Bloom filter false positives: 0
>> Bloom filter false ratio: 0.00000
>> Bloom filter space used, bytes: 72414688
>> Compacted partition minimum bytes: 104
>> Compacted partition maximum bytes: 4966933177
>> Compacted partition mean bytes: 1183
>> Average live cells per slice (last five minutes): 0.0
>> Average tombstones per slice (last five minutes): 0.0
>>
>> Please let me know if you need any more information. I am really
>> thankful to you for spending time on this investigation.
>>
>> Thanks & Regards,
>> Atul Atri.
>>
>> On 5 July 2018 at 20:54, Chris Lohfink <clohf...@apple.com> wrote:
>>> That looks to me like it isn't stuck but is just a long-running
>>> compaction. Can you include the output of `nodetool compactionstats`
>>> and `nodetool cfstats`, with the schema for the table that's being
>>> compacted (redact names if necessary)?
>>>
>>> You can stop compaction with `nodetool stop COMPACTION` or by
>>> restarting the node.
>>>
>>> Chris
>>>
>>> On Jul 5, 2018, at 12:08 AM, atul atri <atulatri2...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> We noticed that the compaction process is also hanging on a node in
>>> the backup ring. Please find attached thread dumps for both servers.
>>> Recently, we have made a few changes in the cluster topology:
>>>
>>> a. Added a new server in the backup data-center and decommissioned an
>>> old server. The backup ring only has 2 servers.
>>> b. Added a new node in the primary data-center. Now it has 4 nodes.
>>>
>>> Is there a way we can stop this compaction? We have added a new node
>>> to this cluster and are waiting to run cleanup on the node on which
>>> compaction is hanging. I am afraid that cleanup will not start until
>>> the compaction job finishes.
>>>
>>> Attachments:
>>> 1. cass-logg02.prod2.thread_dump.out: Thread dump from the old node in
>>> the primary datacenter
>>> 2. cass-logg03.prod1.thread_dump.out: Thread dump from the new node in
>>> the backup datacenter. This node was added recently.
>>>
>>> Your help is much appreciated.
>>>
>>> Thanks & Regards,
>>> Atul Atri.
>>> On 4 July 2018 at 21:15, atul atri <atulatri2...@gmail.com> wrote:
>>>> Hi Chris,
>>>> Thanks for the reply.
>>>>
>>>> Unfortunately, our servers do not have jstack installed.
>>>> I tried the "kill -3 <PID>" option, but that is also not generating
>>>> a thread dump.
>>>>
>>>> Is there any other way I can generate a thread dump?
>>>>
>>>> Thanks & Regards,
>>>> Atul Atri.
>>>>
>>>> On 4 July 2018 at 20:32, Chris Lohfink <clohf...@apple.com> wrote:
>>>>> Can you take a thread dump (jstack) and share the state of the
>>>>> compaction threads? Also check for "Exception" in the logs.
>>>>>
>>>>> Chris
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Jul 4, 2018, at 8:37 AM, atul atri <atulatri2...@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On one of our servers, the compaction process is hanging. It's stuck
>>>>> at 80%, and has been for the last 3 days. Today we did a cluster
>>>>> restart (one host at a time), and again it is stuck at the same 80%.
>>>>> CPU usage is 100% and there seems to be no IO issue. We are seeing
>>>>> the following kind of WARNING in system.log:
>>>>>
>>>>> BatchStatement.java (line 226) Batch of prepared statements for
>>>>> [****, *****] is of size 7557, exceeding specified threshold of 5120
>>>>> by 2437.
>>>>>
>>>>> Other than this there seems to be no error. I have tried to stop the
>>>>> compaction process, but it does not stop. The Cassandra version is 2.1.
>>>>>
>>>>> Can someone please guide us in solving this issue?
>>>>>
>>>>> Thanks & Regards,
>>>>> Atul Atri.
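For reference, a sketch of the commands discussed in the thread for stopping a running compaction and for speeding up future ones on 2.1; treat the throughput setting as an illustrative assumption rather than a tested recommendation for this cluster:

    # Ask Cassandra to abort the in-flight compaction (2.1 syntax)
    nodetool stop COMPACTION

    # Compaction speed is capped by compaction_throughput_mb_per_sec;
    # check the current cap, then raise it (0 disables throttling)
    nodetool getcompactionthroughput
    nodetool setcompactionthroughput 0

As the thread shows, the stop request may not take effect while a single huge partition is mid-rewrite, and unthrottling trades compaction speed against read/write latency on the node.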
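On the thread-dump question: kill -3 sends SIGQUIT, and the JVM writes the dump to its own standard output rather than to a new file, which is the usual reason it appears to produce nothing - the dump lands in Cassandra's stdout/console log. A sketch, assuming the process can be found by its main class name:

    # SIGQUIT makes the JVM print a thread dump to its stdout; check
    # Cassandra's stdout/console log afterwards, not the filesystem
    kill -3 "$(pgrep -f CassandraDaemon)"

    # jstack ships with the JDK, not the JRE; if a JDK is present, it can
    # be run from the JDK's bin directory (the path here is an assumption):
    # /usr/lib/jvm/<your-jdk>/bin/jstack <PID> > thread_dump.out

Separately, the BatchStatement warning quoted above is governed by batch_size_warn_threshold_in_kb in cassandra.yaml (default 5 KB, hence the 5120-byte threshold); it is only a warning and is most likely unrelated to the stuck compaction.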