I also have the following memory usage:

[root@US-BILLINGDSX4 cassandra]# free -m
             total       used       free     shared    buffers     cached
Mem:         12024       9455       2569          0        110       2163
-/+ buffers/cache:        7180       4844
Swap:         2047          0       2047

Still a lot free and a lot of free buffers/cache.
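(Side note: free -m above is the OS view; the GC log further down is about the
JVM heap, which is what actually fills up. Assuming nodetool and the JDK tools
are on the PATH, something like the following shows the heap directly. <pid>
here is the Cassandra java process, e.g. from "pgrep -f CassandraDaemon", and
jstat generally has to run as the same user Cassandra runs as:

  nodetool info | grep -i heap    # heap used / max as the JVM sees it
  jstat -gcutil <pid> 5000 3      # eden/old gen occupancy and GC totals, sampled 3 times, 5s apart
)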
Still a lot free and a lot of free buffers/cache. 2015-11-10 19:45 GMT+01:00 PenguinWhispererThe . < th3penguinwhispe...@gmail.com>: > Still stuck with this. However I enabled GC logging. This shows the > following: > > [root@myhost cassandra]# tail -f gc-1447180680.log > 2015-11-10T18:41:45.516+0000: 225.428: [GC 2721842K->2066508K(6209536K), > 0.0199040 secs] > 2015-11-10T18:41:45.977+0000: 225.889: [GC 2721868K->2066511K(6209536K), > 0.0221910 secs] > 2015-11-10T18:41:46.437+0000: 226.349: [GC 2721871K->2066524K(6209536K), > 0.0222140 secs] > 2015-11-10T18:41:46.897+0000: 226.809: [GC 2721884K->2066539K(6209536K), > 0.0224140 secs] > 2015-11-10T18:41:47.359+0000: 227.271: [GC 2721899K->2066538K(6209536K), > 0.0302520 secs] > 2015-11-10T18:41:47.821+0000: 227.733: [GC 2721898K->2066557K(6209536K), > 0.0280530 secs] > 2015-11-10T18:41:48.293+0000: 228.205: [GC 2721917K->2066571K(6209536K), > 0.0218000 secs] > 2015-11-10T18:41:48.790+0000: 228.702: [GC 2721931K->2066780K(6209536K), > 0.0292470 secs] > 2015-11-10T18:41:49.290+0000: 229.202: [GC 2722140K->2066843K(6209536K), > 0.0288740 secs] > 2015-11-10T18:41:49.756+0000: 229.668: [GC 2722203K->2066818K(6209536K), > 0.0283380 secs] > 2015-11-10T18:41:50.249+0000: 230.161: [GC 2722178K->2067158K(6209536K), > 0.0218690 secs] > 2015-11-10T18:41:50.713+0000: 230.625: [GC 2722518K->2067236K(6209536K), > 0.0278810 secs] > > This is a VM with 12GB of RAM. Highered the HEAP_SIZE to 6GB and > HEAP_NEWSIZE to 800MB. > > Still the same result. > > This looks very similar to following issue: > > http://mail-archives.apache.org/mod_mbox/cassandra-user/201411.mbox/%3CCAJ=3xgRLsvpnZe0uXEYjG94rKhfXeU+jBR=q3a-_c3rsdd5...@mail.gmail.com%3E > > Is the only possibility to upgrade memory? I mean, I can't believe it's > just loading all it's data in memory. That would require to keep scaling up > the node to keep it work? > > > 2015-11-10 9:36 GMT+01:00 PenguinWhispererThe . < > th3penguinwhispe...@gmail.com>: > >> Correction... >> I was grepping on Segmentation on the strace and it happens a lot. >> >> Do I need to run a scrub? >> >> 2015-11-10 9:30 GMT+01:00 PenguinWhispererThe . < >> th3penguinwhispe...@gmail.com>: >> >>> Hi Rob, >>> >>> Thanks for your reply. >>> >>> 2015-11-09 23:17 GMT+01:00 Robert Coli <rc...@eventbrite.com>: >>> >>>> On Mon, Nov 9, 2015 at 1:29 PM, PenguinWhispererThe . < >>>> th3penguinwhispe...@gmail.com> wrote: >>>>> >>>>> In Opscenter I see one of the nodes is orange. It seems like it's >>>>> working on compaction. I used nodetool compactionstats and whenever I did >>>>> this the Completed nad percentage stays the same (even with hours in >>>>> between). >>>>> >>>> Are you the same person from IRC, or a second report today of >>>> compaction hanging in this way? >>>> >>> Same person ;) Just didn't had things to work with from the chat there. >>> I want to understand the issue more, see what I can tune or fix. I want to >>> do nodetool repair before upgrading to 2.1.11 but the compaction is >>> blocking it. >>> >>>> >>>> >>>> >>> What version of Cassandra? >>>> >>> 2.0.9 >>> >>>> I currently don't see cpu load from cassandra on that node. So it seems >>>>> stuck (somewhere mid 60%). Also some other nodes have compaction on the >>>>> same columnfamily. I don't see any progress. >>>>> >>>>> WARN [RMI TCP Connection(554)-192.168.0.68] 2015-11-09 17:18:13,677 >>>>> ColumnFamilyStore.java (line 2101) Unable to cancel in-progress >>>>> compactions for usage_record_ptd. Probably there is an unusually large >>>>> row in progress somewhere. 
>>>>> I currently don't see CPU load from Cassandra on that node, so it seems
>>>>> stuck (somewhere mid 60%). Also some other nodes have compaction running
>>>>> on the same columnfamily. I don't see any progress.
>>>>>
>>>>> WARN [RMI TCP Connection(554)-192.168.0.68] 2015-11-09 17:18:13,677
>>>>> ColumnFamilyStore.java (line 2101) Unable to cancel in-progress
>>>>> compactions for usage_record_ptd. Probably there is an unusually large
>>>>> row in progress somewhere. It is also possible that buggy code left some
>>>>> sstables compacting after it was done with them
>>>>>
>>>>> - How can I confirm that nothing is actually happening?
>>>>>
>>>> Find the thread that is doing compaction and strace it. Generally it
>>>> is one of the threads with a lower thread priority.
>>>>
>>> I have 141 threads. Not sure if that's normal.
>>>
>>> This seems to be the one:
>>> 61404 cassandr 24 4 8948m 4.3g 820m R 90.2 36.8 292:54.47 java
>>>
>>> In the strace I see basically this part repeating (with, once in a while,
>>> the "resource temporarily unavailable"):
>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
>>> getpriority(PRIO_PROCESS, 61404) = 16
>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 0
>>> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494045, NULL) = -1 EAGAIN (Resource temporarily unavailable)
>>> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
>>> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494047, NULL) = 0
>>> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
>>> getpriority(PRIO_PROCESS, 61404) = 16
>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
>>> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494049, NULL) = 0
>>> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
>>> getpriority(PRIO_PROCESS, 61404) = 16
>>>
>>> But wait!
>>> I also see this:
>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494055, NULL) = 0
>>> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
>>> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
>>>
>>> This doesn't seem to happen that often though.
>>>
>>>> Compaction often appears hung when decompressing a very large row, but
>>>> usually not for "hours".
>>>>
>>>>> - Is it recommended to disable compaction above a certain data size?
>>>>> (I believe 25GB on each node.)
>>>>>
>>>> It is almost never recommended to disable compaction.
>>>>
>>>>> - Can I stop this compaction? nodetool stop compaction doesn't seem
>>>>> to work.
>>>>>
>>>> Killing the JVM ("the dungeon collapses!") would certainly stop it,
>>>> but it'd likely just start again when you restart the node.
>>>>
>>>>> - Is stopping the compaction dangerous?
>>>>>
>>>> Not if you're in a version that properly cleans up partial
>>>> compactions, which is most of them.
>>>>
>>>>> - Is killing the Cassandra process dangerous while compacting (I did
>>>>> nodetool drain on one node)?
>>>>>
>>>> No. But probably nodetool drain couldn't actually stop the in-progress
>>>> compaction either, FWIW.
>>>>
>>>>> This is output of nodetool compactionstats grepped for the keyspace
>>>>> that seems stuck.
>>>>>
>>>> Do you have gigantic rows in that keyspace? What does cfstats say
>>>> about the largest row compaction has seen? Do you have log messages
>>>> about compacting large rows?
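(Side note on checking for oversized rows, since Rob asks about it just above:
assuming a 2.0.x install with logs in the default packaged location, the usual
checks are roughly the following; the keyspace and table names are just the
ones from this thread:

  nodetool cfstats billing                         # per-table stats; look at "Compacted partition maximum bytes"
  nodetool cfhistograms billing usage_record_ptd   # row/partition size distribution for one table
  grep "Compacting large row" /var/log/cassandra/system.log

Rows bigger than in_memory_compaction_limit_in_mb in cassandra.yaml are
compacted incrementally and show up in that grep.)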
>>>
>>> I don't know about the gigantic rows. How can I check?
>>>
>>> I've checked the logs and found this:
>>> INFO [CompactionExecutor:67] 2015-11-10 02:34:19,077
>>> CompactionController.java (line 192) Compacting large row
>>> billing/usage_record_ptd:177727:2015-10-14 00\:00Z (243992466 bytes)
>>> incrementally
>>> So this is from 6 hours ago.
>>>
>>> I also see a lot of messages like this:
>>> INFO [OptionalTasks:1] 2015-11-10 06:36:06,395 MeteredFlusher.java (line
>>> 58) flushing high-traffic column family CFS(Keyspace='mykeyspace',
>>> ColumnFamily='mycolumnfamily') (estimated 100317609 bytes)
>>>
>>> And this (although it may be unrelated, could it impact compaction
>>> performance?):
>>> WARN [Native-Transport-Requests:10514] 2015-11-10 06:33:34,172
>>> BatchStatement.java (line 223) Batch of prepared statements for
>>> [billing.usage_record_ptd] is of size 13834, exceeding specified threshold
>>> of 5120 by 8714.
>>>
>>> It's like the compaction is only doing one sstable at a time and is doing
>>> nothing for a long time in between.
>>>
>>> cfstats for this keyspace and columnfamily gives the following:
>>> Table: mycolumnfamily
>>> SSTable count: 26
>>> Space used (live), bytes: 319858991
>>> Space used (total), bytes: 319860267
>>> SSTable Compression Ratio: 0.24265700071674673
>>> Number of keys (estimate): 6656
>>> Memtable cell count: 22710
>>> Memtable data size, bytes: 3310654
>>> Memtable switch count: 31
>>> Local read count: 0
>>> Local read latency: 0.000 ms
>>> Local write count: 997667
>>> Local write latency: 0.000 ms
>>> Pending tasks: 0
>>> Bloom filter false positives: 0
>>> Bloom filter false ratio: 0.00000
>>> Bloom filter space used, bytes: 12760
>>> Compacted partition minimum bytes: 1332
>>> Compacted partition maximum bytes: 43388628
>>> Compacted partition mean bytes: 234682
>>> Average live cells per slice (last five minutes): 0.0
>>> Average tombstones per slice (last five minutes): 0.0
>>>
>>>>> I also frequently see lines like this in system.log:
>>>>>
>>>>> WARN [Native-Transport-Requests:11935] 2015-11-09 20:10:41,886
>>>>> BatchStatement.java (line 223) Batch of prepared statements for
>>>>> [billing.usage_record_by_billing_period, billing.metric] is of size
>>>>> 53086, exceeding specified threshold of 5120 by 47966.
>>>>>
>>>> Unrelated.
>>>>
>>>> =Rob
>>>>
>>> Can I upgrade to 2.1.11 without running nodetool repair first, given that
>>> the compaction is stuck?
>>> Another thing to mention is that nodetool repair has never run yet. It got
>>> installed, but nobody ever scheduled the repair.
>>>
>>> Thanks for looking into this!
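(Side note on the repair question at the end: a repair does not have to cover
the whole node in one go. Assuming 2.0.x syntax and the keyspace from this
thread, something along these lines keeps each run small, one table and only
the node's primary ranges at a time:

  nodetool repair -pr billing usage_record_ptd
  nodetool compactionstats   # validation compactions triggered by the repair show up here
  nodetool netstats          # streaming between nodes shows up here

With -pr, only the node's primary token ranges get repaired, so the same
command has to be run on every node in turn to cover the whole ring.)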