Hi Rob,

Thanks for your reply.
2015-11-09 23:17 GMT+01:00 Robert Coli <rc...@eventbrite.com>:

> On Mon, Nov 9, 2015 at 1:29 PM, PenguinWhispererThe . <
> th3penguinwhispe...@gmail.com> wrote:
>>
>> In OpsCenter I see that one of the nodes is orange. It seems to be
>> working on a compaction. I used nodetool compactionstats, and the
>> completed count and percentage stay the same every time I run it (even
>> hours apart).
>
> Are you the same person from IRC, or a second report today of compaction
> hanging in this way?

Same person ;) I just didn't have much to work with from the chat there. I
want to understand the issue better and see what I can tune or fix. I want
to run nodetool repair before upgrading to 2.1.11, but the compaction is
blocking it.

> What version of Cassandra?

2.0.9

>> I currently don't see CPU load from Cassandra on that node, so it seems
>> stuck (somewhere mid 60%). Some other nodes are also compacting the same
>> column family, and I don't see any progress there either.
>>
>> WARN [RMI TCP Connection(554)-192.168.0.68] 2015-11-09 17:18:13,677
>> ColumnFamilyStore.java (line 2101) Unable to cancel in-progress compactions
>> for usage_record_ptd. Probably there is an unusually large row in progress
>> somewhere. It is also possible that buggy code left some sstables
>> compacting after it was done with them
>>
>> - How can I assure that nothing is happening?
>
> Find the thread that is doing compaction and strace it. Generally it is
> one of the threads with a lower thread priority.

I have 141 threads. Not sure if that's normal.
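For anyone following along, this is roughly how I located the busy thread
(a sketch, not an exact transcript: I'm using the shell's own PID so the
commands run anywhere, and 61404 is the TID that top showed on my node):

```shell
# Sketch: count a process's native threads, then map the hot TID from
# "top -H" to a Java thread name via jstack. $$ (this shell's PID) is a
# stand-in so the example runs; on the node you'd use Cassandra's PID,
# e.g. from "pgrep -f CassandraDaemon".
pid=$$
ls /proc/"$pid"/task | wc -l      # thread count (141 for my Cassandra JVM)

# jstack reports native thread ids in hex as nid=0x..., so convert the
# decimal TID that top shows (61404 on my node) before grepping:
printf 'nid=0x%x\n' 61404         # -> nid=0xefdc
# jstack <cassandra-pid> | grep -B2 nid=0xefdc
```

Grepping jstack output for that nid is what tells you whether the hot
thread is actually a CompactionExecutor thread.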
This seems to be the one:

61404 cassandr  24   4 8948m 4.3g 820m R 90.2 36.8 292:54.47 java

In the strace I basically see this part repeating (with a "resource
temporarily unavailable" once in a while):

futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
getpriority(PRIO_PROCESS, 61404) = 16
futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x1233854, FUTEX_WAIT_PRIVATE, 494045, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1233854, FUTEX_WAIT_PRIVATE, 494047, NULL) = 0
futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
getpriority(PRIO_PROCESS, 61404) = 16
futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1233854, FUTEX_WAIT_PRIVATE, 494049, NULL) = 0
futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
getpriority(PRIO_PROCESS, 61404) = 16

But wait! I also see this:

futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x1233854, FUTEX_WAIT_PRIVATE, 494055, NULL) = 0
futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---

This doesn't seem to happen that often, though.

> Compaction often appears hung when decompressing a very large row, but
> usually not for "hours".
>
>> - Is it recommended to disable compaction from a certain data size?
>>   (I believe 25GB on each node.)
> It is almost never recommended to disable compaction.
>
>> - Can I stop this compaction? nodetool stop compaction doesn't seem
>>   to work.
>
> Killing the JVM ("the dungeon collapses!") would certainly stop it, but
> it'd likely just start again when you restart the node.
>
>> - Is stopping the compaction dangerous?
>
> Not if you're in a version that properly cleans up partial compactions,
> which is most of them.
>
>> - Is killing the Cassandra process dangerous while compacting (I did
>>   nodetool drain on one node)?
>
> No. But probably nodetool drain couldn't actually stop the in-progress
> compaction either, FWIW.
>
>> This is output of nodetool compactionstats grepped for the keyspace that
>> seems stuck.
>
> Do you have gigantic rows in that keyspace? What does cfstats say about
> the largest row compaction has seen/do you have log messages about
> compacting large rows?

I don't know about the gigantic rows. How can I check?

I've checked the logs and found this:

INFO [CompactionExecutor:67] 2015-11-10 02:34:19,077 CompactionController.java (line 192) Compacting large row billing/usage_record_ptd:177727:2015-10-14 00\:00Z (243992466 bytes) incrementally

So this is from 6 hours ago. I also see a lot of messages like this:

INFO [OptionalTasks:1] 2015-11-10 06:36:06,395 MeteredFlusher.java (line 58) flushing high-traffic column family CFS(Keyspace='mykeyspace', ColumnFamily='mycolumnfamily') (estimated 100317609 bytes)

And (although it's unrelated, might this impact compaction performance?):

WARN [Native-Transport-Requests:10514] 2015-11-10 06:33:34,172 BatchStatement.java (line 223) Batch of prepared statements for [billing.usage_record_ptd] is of size 13834, exceeding specified threshold of 5120 by 8714.

It's as if the compaction only handles one sstable at a time and does
nothing for a long time in between.
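To get a feel for how big that "large row" actually is, I pulled the byte
count out of the log line and converted it to MB (a quick sketch run over
the exact line quoted above):

```shell
# Extract the row size from the "Compacting large row" log line and
# convert it to MB. The line is the one from my system.log above.
line='INFO [CompactionExecutor:67] 2015-11-10 02:34:19,077 CompactionController.java (line 192) Compacting large row billing/usage_record_ptd:177727:2015-10-14 00\:00Z (243992466 bytes) incrementally'
bytes=$(printf '%s\n' "$line" | sed -n 's/.*(\([0-9][0-9]*\) bytes).*/\1/p')
echo "$bytes bytes = $((bytes / 1024 / 1024)) MB"
# -> 243992466 bytes = 232 MB
```

So that single row is roughly 232 MB, which at least explains why the
"Compacting large row ... incrementally" path is being taken.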
cfstats for this keyspace and column family gives the following:

Table: mycolumnfamily
SSTable count: 26
Space used (live), bytes: 319858991
Space used (total), bytes: 319860267
SSTable Compression Ratio: 0.24265700071674673
Number of keys (estimate): 6656
Memtable cell count: 22710
Memtable data size, bytes: 3310654
Memtable switch count: 31
Local read count: 0
Local read latency: 0.000 ms
Local write count: 997667
Local write latency: 0.000 ms
Pending tasks: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used, bytes: 12760
Compacted partition minimum bytes: 1332
Compacted partition maximum bytes: 43388628
Compacted partition mean bytes: 234682
Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0

>> I also see frequently lines like this in system.log:
>>
>> WARN [Native-Transport-Requests:11935] 2015-11-09 20:10:41,886
>> BatchStatement.java (line 223) Batch of prepared statements for
>> [billing.usage_record_by_billing_period, billing.metric] is of size 53086,
>> exceeding specified threshold of 5120 by 47966.
>
> Unrelated.
>
> =Rob

Can I upgrade to 2.1.11 without running nodetool repair first, given that
the compaction is stuck? Another thing worth mentioning: nodetool repair
has never run on this cluster. Cassandra got installed, but nobody ever
scheduled repairs.

Thanks for looking into this!