Hi Rob,

Thanks for your reply.

2015-11-09 23:17 GMT+01:00 Robert Coli <rc...@eventbrite.com>:

> On Mon, Nov 9, 2015 at 1:29 PM, PenguinWhispererThe . <
> th3penguinwhispe...@gmail.com> wrote:
>>
>> In Opscenter I see one of the nodes is orange. It seems like it's working
>> on compaction. I used nodetool compactionstats and whenever I did this the
>> Completed and percentage values stay the same (even with hours in between).
>>
> Are you the same person from IRC, or a second report today of compaction
> hanging in this way?
>
Same person ;) I just didn't have much to work with from the chat there. I
want to understand the issue better and see what I can tune or fix. I want to
run nodetool repair before upgrading to 2.1.11, but the compaction is
blocking it.

> What version of Cassandra?
>
2.0.9

>> I currently don't see CPU load from Cassandra on that node. So it seems
>> stuck (somewhere mid 60%). Also some other nodes have compaction on the
>> same columnfamily. I don't see any progress.
>>
>>  WARN [RMI TCP Connection(554)-192.168.0.68] 2015-11-09 17:18:13,677 
>> ColumnFamilyStore.java (line 2101) Unable to cancel in-progress compactions 
>> for usage_record_ptd.  Probably there is an unusually large row in progress 
>> somewhere.  It is also possible that buggy code left some sstables 
>> compacting after it was done with them
>>
>>
>>    - How can I assure that nothing is happening?
>>
> Find the thread that is doing compaction and strace it. Generally it is
> one of the threads with a lower thread priority.
>

I have 141 threads. Not sure if that's normal.

This seems to be the one:
 61404 cassandr  24   4 8948m 4.3g 820m R 90.2 36.8 292:54.47 java
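
For reference, this is roughly how I tracked it down (a sketch; the TID
61404 comes from the top output above, and the pgrep pattern is an
assumption about how the process shows up on your box):

```shell
# Find the Cassandra PID and list its busiest threads
# (pattern "CassandraDaemon" is an assumption; adjust if needed)
pid=$(pgrep -f CassandraDaemon)
top -H -b -n 1 -p "$pid" | head -n 20

# jstack reports thread IDs as hex "nid" values, so convert the busy
# TID from top (61404 here) to hex to find the matching Java thread
printf 'nid=0x%x\n' 61404        # -> nid=0xefdc
jstack "$pid" | grep -B 1 'nid=0xefdc'

# Then attach strace to just that thread
strace -p 61404 -e trace=futex,getpriority
```

The jstack thread name (e.g. CompactionExecutor:NN) should confirm
whether the hot thread is really the compaction one.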

In the strace output I basically see this part repeating (with an occasional
"resource temporarily unavailable"):
futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50,
{FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
getpriority(PRIO_PROCESS, 61404)        = 16
futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50,
{FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x1233854, FUTEX_WAIT_PRIVATE, 494045, NULL) = -1 EAGAIN (Resource
temporarily unavailable)
futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50,
{FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1233854, FUTEX_WAIT_PRIVATE, 494047, NULL) = 0
futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50,
{FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
getpriority(PRIO_PROCESS, 61404)        = 16
futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50,
{FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1233854, FUTEX_WAIT_PRIVATE, 494049, NULL) = 0
futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
getpriority(PRIO_PROCESS, 61404)        = 16

But wait!
I also see this:
futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50,
{FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x1233854, FUTEX_WAIT_PRIVATE, 494055, NULL) = 0
futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---

This doesn't seem to happen that often, though. (From what I've read, the
HotSpot JVM uses SIGSEGV internally, e.g. for implicit null checks, so seeing
it in an strace of a running JVM isn't necessarily a crash.)

>
> Compaction often appears hung when decompressing a very large row, but
> usually not for "hours".
>
>>
>>    - Is it recommended to disable compaction from a certain data size?
>>    (I believe 25GB on each node).
>>
> It is almost never recommended to disable compaction.
>
>>
>>    - Can I stop this compaction? nodetool stop compaction doesn't seem
>>    to work.
>>
> Killing the JVM ("the dungeon collapses!") would certainly stop it, but
> it'd likely just start again when you restart the node.
>
>>
>>    - Is stopping the compaction dangerous?
>>
> Not if you're in a version that properly cleans up partial compactions,
> which is most of them.
>
>>
>>    - Is killing the Cassandra process dangerous while compacting (I did
>>    nodetool drain on one node)?
>>
> No. But probably nodetool drain couldn't actually stop the in-progress
> compaction either, FWIW.
>
>> This is output of nodetool compactionstats grepped for the keyspace that
>> seems stuck.
>>
> Do you have gigantic rows in that keyspace? What does cfstats say about
> the largest row compaction has seen/do you have log messages about
> compacting large rows?
>

I don't know whether we have gigantic rows. How can I check?

I've checked the logs and found this:
 INFO [CompactionExecutor:67] 2015-11-10 02:34:19,077
CompactionController.java (line 192) Compacting large row
billing/usage_record_ptd:177727:2015-10-14 00\:00Z (243992466 bytes)
incrementally
So this is from 6 hours ago.
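
A quick back-of-the-envelope on that row (a sketch; the log path is an
assumption for our install, and 64 MB is what I understand the 2.0 default
for in_memory_compaction_limit_in_mb to be — rows bigger than that get
compacted incrementally on disk, which is much slower):

```shell
# The logged row size, in MB: well above a 64 MB in-memory limit
awk 'BEGIN{printf "%.1f\n", 243992466/1048576}'    # -> 232.7

# Pull every "Compacting large row" size out of system.log
# (log path assumed; adjust for your install)
grep 'Compacting large row' /var/log/cassandra/system.log \
  | grep -oE '\([0-9]+ bytes\)' \
  | tr -dc '0-9\n' \
  | awk '{printf "%.1f MB\n", $1/1048576}'
```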

I also see a lot of messages like this:
INFO [OptionalTasks:1] 2015-11-10 06:36:06,395 MeteredFlusher.java (line
58) flushing high-traffic column family CFS(Keyspace='mykeyspace',
ColumnFamily='mycolumnfamily') (estimated 100317609 bytes)
And (although it may be unrelated, could this impact compaction performance?):
 WARN [Native-Transport-Requests:10514] 2015-11-10 06:33:34,172
BatchStatement.java (line 223) Batch of prepared statements for
[billing.usage_record_ptd] is of size 13834, exceeding specified threshold
of 5120 by 8714.

It looks like the compaction only processes one sstable at a time and then
does nothing for a long time in between.

cfstats for this keyspace and columnfamily gives the following:
                Table: mycolumnfamily
                SSTable count: 26
                Space used (live), bytes: 319858991
                Space used (total), bytes: 319860267
                SSTable Compression Ratio: 0.24265700071674673
                Number of keys (estimate): 6656
                Memtable cell count: 22710
                Memtable data size, bytes: 3310654
                Memtable switch count: 31
                Local read count: 0
                Local read latency: 0.000 ms
                Local write count: 997667
                Local write latency: 0.000 ms
                Pending tasks: 0
                Bloom filter false positives: 0
                Bloom filter false ratio: 0.00000
                Bloom filter space used, bytes: 12760
                Compacted partition minimum bytes: 1332
                Compacted partition maximum bytes: 43388628
                Compacted partition mean bytes: 234682
                Average live cells per slice (last five minutes): 0.0
                Average tombstones per slice (last five minutes): 0.0
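
Reading those numbers back (my rough interpretation, using only the values
above): the largest compacted partition is about 41 MB, and the compression
ratio suggests the ~305 MB on disk is well over 1 GB uncompressed:

```shell
# Largest compacted partition, in MB
awk 'BEGIN{printf "%.1f\n", 43388628/1048576}'     # -> 41.4

# Approximate uncompressed size in MB: on-disk bytes / compression ratio
awk 'BEGIN{printf "%.0f\n", 319858991/0.24265700071674673/1048576}'
```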


>> I also frequently see lines like this in system.log:
>>
>> WARN [Native-Transport-Requests:11935] 2015-11-09 20:10:41,886 
>> BatchStatement.java (line 223) Batch of prepared statements for 
>> [billing.usage_record_by_billing_period, billing.metric] is of size 53086, 
>> exceeding specified threshold of 5120 by 47966.
>>
>>
> Unrelated.
>
> =Rob
>
>

Can I safely upgrade to 2.1.11 without running nodetool repair first, given
that compaction is stuck?
Another thing worth mentioning: nodetool repair has never been run on this
cluster. Cassandra got installed, but nobody ever scheduled repairs.

Thanks for looking into this!
