From the output you shared, it looks like you have at least one *very* large partition; my guess is that it contains lots of very small rows. That would explain both the slow compaction and why "nodetool stop COMPACTION" didn't work (prior to CASSANDRA-14397, stop only takes effect between partitions).

That large partition may cause problems elsewhere if it is read.

Jordan
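A quick way to confirm the large-partition theory is to look at the partition-size percentiles for the table. A minimal sketch, assuming nodetool is run against a replica that owns the data and using the keyspace/table names that appear later in this thread (the command was renamed tablehistograms in later releases):

    # Prints percentile histograms for the table, including
    # "Partition Size (bytes)"; a max/99th percentile far above
    # the median points at a handful of huge partitions.
    nodetool cfhistograms notification_system_v1 user_notification

The cfstats output quoted below tells the same story: a mean compacted partition of 1183 bytes against a maximum of 4966933177 bytes (roughly 4.6 GB) in a single partition.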
On Thu, Jul 5, 2018 at 9:35 AM, atul atri <atulatri2...@gmail.com> wrote:

> Hi Chris,
>
> Thank you for your reply.
>
> I have already tried "nodetool stop COMPACTION" and it does not help. I restarted each node in the cluster one by one and compaction starts again; it gets stuck on the same table.
>
> Following is the 'nodetool compactionstats' output. It has been stuck at completed = 1336035468 for at least 35 hours.
>
>     pending tasks: 1
>     compaction type        keyspace                 table              completed    total       unit   progress
>     Compaction             notification_system_v1   user_notification  1336035468   1660997721  bytes  80.44%
>     Active compaction remaining time : 0h00m38s
>
> Following is the output of "nodetool cfstats":
>
>     Table: user_notification
>     SSTable count: 18
>     Space used (live), bytes: 17247516201
>     Space used (total), bytes: 17316488652
>     SSTable Compression Ratio: 0.41922805938461566
>     Number of keys (estimate): 32556160
>     Memtable cell count: 44717
>     Memtable data size, bytes: 27705294
>     Memtable switch count: 5
>     Local read count: 0
>     Local read latency: 0.000 ms
>     Local write count: 236961
>     Local write latency: 0.047 ms
>     Pending tasks: 0
>     Bloom filter false positives: 0
>     Bloom filter false ratio: 0.00000
>     Bloom filter space used, bytes: 72414688
>     Compacted partition minimum bytes: 104
>     Compacted partition maximum bytes: 4966933177
>     Compacted partition mean bytes: 1183
>     Average live cells per slice (last five minutes): 0.0
>     Average tombstones per slice (last five minutes): 0.0
>
> Please let me know if you need any more information. I am really thankful to you for spending time on this investigation.
>
> Thanks & Regards,
> Atul Atri.
>
>
> On 5 July 2018 at 20:54, Chris Lohfink <clohf...@apple.com> wrote:
>
>> That looks to me like it isn't stuck but is just a long-running compaction. Can you include the output of `nodetool compactionstats` and `nodetool cfstats`, along with the schema for the table that is being compacted (redact names if necessary)?
>>
>> You can stop compaction with `nodetool stop COMPACTION` or by restarting the node.
>>
>> Chris
>>
>> On Jul 5, 2018, at 12:08 AM, atul atri <atulatri2...@gmail.com> wrote:
>>
>> Hi,
>>
>> We noticed that the compaction process is also hanging on a node in the backup ring. Please find attached thread dumps for both servers. Recently, we have made a few changes in the cluster topology:
>>
>> a. Added a new server in the backup data-center and decommissioned an old server. The backup ring only has 2 servers.
>> b. Added a new node in the primary data-center. Now it has 4 nodes.
>>
>> Is there a way we can stop this compaction? We have added a new node to this cluster and we are waiting to run cleanup on the node on which compaction is hanging. I am afraid that cleanup will not start until the compaction job finishes.
>>
>> Attachments:
>> 1. cass-logg02.prod2.thread_dump.out: Thread dump from the old node in the primary datacenter
>> 2. cass-logg03.prod1.thread_dump.out: Thread dump from the new node in the backup datacenter. This node was added recently.
>>
>> Your help is much appreciated.
>>
>> Thanks & Regards,
>> Atul Atri.
>>
>>
>> On 4 July 2018 at 21:15, atul atri <atulatri2...@gmail.com> wrote:
>>
>>> Hi Chris,
>>>
>>> Thanks for the reply.
>>>
>>> Unfortunately, our servers do not have jstack installed. I tried the "kill -3 <PID>" option, but that is also not generating a thread dump.
>>>
>>> Is there any other way I can generate a thread dump?
>>>
>>> Thanks & Regards,
>>> Atul Atri.
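An aside on the thread-dump question above: two alternatives usually work when the jstack binary is missing. A sketch, assuming the JVM running Cassandra ships jcmd, and noting that the pgrep pattern and log path are illustrative and may differ on your install; run the commands as the same OS user as the Cassandra process:

    # jcmd ships with the JDK and produces the same output as jstack:
    jcmd $(pgrep -f CassandraDaemon) Thread.print > /tmp/cassandra_threads.txt

    # kill -3 (SIGQUIT) does generate a dump, but it is written to the
    # JVM's stdout rather than to your terminal, so look in whatever file
    # the init script redirects stdout to, e.g.:
    kill -3 $(pgrep -f CassandraDaemon)
    less /var/log/cassandra/output.log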
>>> On 4 July 2018 at 20:32, Chris Lohfink <clohf...@apple.com> wrote:
>>>
>>>> Can you take a thread dump (jstack) and share the state of the compaction threads? Also check for "Exception" in the logs.
>>>>
>>>> Chris
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Jul 4, 2018, at 8:37 AM, atul atri <atulatri2...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On one of our servers, the compaction process is hanging. It is stuck at 80%, and it had been stuck for the last 3 days. Today we did a cluster restart (one host at a time), and again it is stuck at the same 80%. CPU usage is 100% and there seems to be no IO issue. We are seeing the following kind of WARNING in system.log:
>>>>
>>>>     BatchStatement.java (line 226) Batch of prepared statements for [****, *****] is of size 7557, exceeding specified threshold of 5120 by 2437.
>>>>
>>>> Other than this there seems to be no error. I have tried to stop the compaction process, but it does not stop. The Cassandra version is 2.1.
>>>>
>>>> Can someone please guide us in solving this issue?
>>>>
>>>> Thanks & Regards,
>>>> Atul Atri.
>>
>> <cass-logg02.prod2.thread_dump.out><cass-logg03.prod1.thread_dump.out>
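One footnote on the BatchStatement WARNING quoted above: the 5120-byte figure matches the Cassandra 2.1 default of batch_size_warn_threshold_in_kb (5 KB), so it is only a size warning on client-side batches and is unrelated to the stuck compaction. If the large batches are intentional, the threshold can be raised; a sketch of the relevant cassandra.yaml setting, with the value shown as an example rather than a recommendation:

    # cassandra.yaml
    # WARN on any single batch whose serialized size exceeds this many KB.
    batch_size_warn_threshold_in_kb: 10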