Correction...
I was grepping for "Segmentation" in the strace output, and it actually happens a lot.
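
For reference, this is roughly how I counted them (strace-61404.out is just
the file name I chose when saving the strace output):

grep -c Segmentation strace-61404.out

Worth noting, as far as I understand: the HotSpot JVM deliberately raises
and then handles SIGSEGV internally (e.g. for implicit null checks), so
SIGSEGV entries in an strace of a Java process are not necessarily crashes.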

Do I need to run a scrub?
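
If a scrub does turn out to be needed, my understanding is that it runs per
node and per table, along the lines of (exact syntax may differ by version):

nodetool scrub billing usage_record_ptd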

2015-11-10 9:30 GMT+01:00 PenguinWhispererThe . <
th3penguinwhispe...@gmail.com>:

> Hi Rob,
>
> Thanks for your reply.
>
> 2015-11-09 23:17 GMT+01:00 Robert Coli <rc...@eventbrite.com>:
>
>> On Mon, Nov 9, 2015 at 1:29 PM, PenguinWhispererThe . <
>> th3penguinwhispe...@gmail.com> wrote:
>>>
>>> In OpsCenter I see that one of the nodes is orange. It seems like it's
>>> working on a compaction. I used nodetool compactionstats, and whenever I
>>> ran it the completed count and percentage stayed the same (even with
>>> hours in between).
>>>
>> Are you the same person from IRC, or a second report today of compaction
>> hanging in this way?
>>
> Same person ;) I just didn't have much to work with from the chat there. I
> want to understand the issue better and see what I can tune or fix. I want
> to run nodetool repair before upgrading to 2.1.11, but the compaction is
> blocking it.
>
>>
>> What version of Cassandra?
>>
> 2.0.9
>
>>> I currently don't see CPU load from Cassandra on that node, so it seems
>>> stuck (somewhere around 60%). Some other nodes also have compactions on
>>> the same column family, and I don't see any progress there either.
>>>
>>>  WARN [RMI TCP Connection(554)-192.168.0.68] 2015-11-09 17:18:13,677 
>>> ColumnFamilyStore.java (line 2101) Unable to cancel in-progress compactions 
>>> for usage_record_ptd.  Probably there is an unusually large row in progress 
>>> somewhere.  It is also possible that buggy code left some sstables 
>>> compacting after it was done with them
>>>
>>>
>>>    - How can I confirm that nothing is actually happening?
>>>
>> Find the thread that is doing compaction and strace it. Generally it is
>> one of the threads with a lower thread priority.
>>
>
> I have 141 threads. Not sure if that's normal.
>
> This seems to be the one:
>  61404 cassandr  24   4 8948m 4.3g 820m R 90.2 36.8 292:54.47 java
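>
> For completeness, this is roughly how I found it (strace-61404.out is the
> file I saved the output to):
>
> top -H -p $(pgrep -f CassandraDaemon)   # per-thread CPU usage
> strace -p 61404 -o strace-61404.out     # attach to the busy thread ID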
>
> In the strace output I basically see this pattern repeating (with the
> occasional "resource temporarily unavailable"):
> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50,
> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
> getpriority(PRIO_PROCESS, 61404)        = 16
> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50,
> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 0
> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494045, NULL) = -1 EAGAIN (Resource
> temporarily unavailable)
> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50,
> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494047, NULL) = 0
> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50,
> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
> getpriority(PRIO_PROCESS, 61404)        = 16
> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50,
> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494049, NULL) = 0
> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
> getpriority(PRIO_PROCESS, 61404)        = 16
>
> But wait!
> I also see this:
> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50,
> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494055, NULL) = 0
> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
>
> This doesn't seem to happen that often though.
>
>>
>> Compaction often appears hung when decompressing a very large row, but
>> usually not for "hours".
>>
>>>
>>>    - Is it recommended to disable compaction beyond a certain data size?
>>>    (I believe we have about 25GB on each node.)
>>>
>> It is almost never recommended to disable compaction.
>>
>>>
>>>    - Can I stop this compaction? nodetool stop compaction doesn't seem
>>>    to work.
>>>
>> Killing the JVM ("the dungeon collapses!") would certainly stop it, but
>> it'd likely just start again when you restart the node.
>>
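>> As an aside: nodetool stop takes the operation type as an argument, and I
>> believe the expected form is upper case, e.g.:
>>
>> nodetool stop COMPACTION
>>
>> Whether it can interrupt this particular stuck task is another question.
>>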
>>>
>>>    - Is stopping the compaction dangerous?
>>>
>> Not if you're in a version that properly cleans up partial compactions,
>> which is most of them.
>>
>>>
>>>    - Is killing the Cassandra process dangerous while it is compacting
>>>    (I did nodetool drain on one node)?
>>>
>> No. But probably nodetool drain couldn't actually stop the in-progress
>> compaction either, FWIW.
>>
>>> This is the output of nodetool compactionstats, grepped for the keyspace
>>> that seems stuck.
>>>
>> Do you have gigantic rows in that keyspace? What does cfstats say about
>> the largest row compaction has seen/do you have log messages about
>> compacting large rows?
>>
>
> I don't know about the gigantic rows. How can I check?
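>
> In the meantime, these are the checks I tried (the log path is the default
> on our install):
>
> nodetool cfstats billing | grep 'Compacted partition maximum'
> grep 'Compacting large row' /var/log/cassandra/system.log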
>
> I've checked the logs and found this:
>  INFO [CompactionExecutor:67] 2015-11-10 02:34:19,077
> CompactionController.java (line 192) Compacting large row
> billing/usage_record_ptd:177727:2015-10-14 00\:00Z (243992466 bytes)
> incrementally
> So this is from 6 hours ago.
>
> I also see a lot of messages like this:
> INFO [OptionalTasks:1] 2015-11-10 06:36:06,395 MeteredFlusher.java (line
> 58) flushing high-traffic column family CFS(Keyspace='mykeyspace',
> ColumnFamily='mycolumnfamily') (estimated 100317609 bytes)
> And (although it may be unrelated, could this impact compaction performance?):
>  WARN [Native-Transport-Requests:10514] 2015-11-10 06:33:34,172
> BatchStatement.java (line 223) Batch of prepared statements for
> [billing.usage_record_ptd] is of size 13834, exceeding specified threshold
> of 5120 by 8714.
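>
> If we wanted to raise that warning threshold rather than shrink the
> batches, I believe it's controlled by batch_size_warn_threshold_in_kb in
> cassandra.yaml (the 5120 bytes above matches the default of 5):
>
> batch_size_warn_threshold_in_kb: 5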
>
> It's as if the compaction only processes one sstable at a time and does
> nothing for a long time in between.
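>
> One thing I still want to rule out is throttling: as far as I understand,
> the limit comes from compaction_throughput_mb_per_sec in cassandra.yaml
> (default 16 MB/s) and can be changed at runtime, with 0 meaning
> unthrottled:
>
> nodetool setcompactionthroughput 0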
>
> cfstats for this keyspace and columnfamily gives the following:
>                 Table: mycolumnfamily
>                 SSTable count: 26
>                 Space used (live), bytes: 319858991
>                 Space used (total), bytes: 319860267
>                 SSTable Compression Ratio: 0.24265700071674673
>                 Number of keys (estimate): 6656
>                 Memtable cell count: 22710
>                 Memtable data size, bytes: 3310654
>                 Memtable switch count: 31
>                 Local read count: 0
>                 Local read latency: 0.000 ms
>                 Local write count: 997667
>                 Local write latency: 0.000 ms
>                 Pending tasks: 0
>                 Bloom filter false positives: 0
>                 Bloom filter false ratio: 0.00000
>                 Bloom filter space used, bytes: 12760
>                 Compacted partition minimum bytes: 1332
>                 Compacted partition maximum bytes: 43388628
>                 Compacted partition mean bytes: 234682
>                 Average live cells per slice (last five minutes): 0.0
>                 Average tombstones per slice (last five minutes): 0.0
>
>
>>> I also frequently see lines like this in system.log:
>>>
>>> WARN [Native-Transport-Requests:11935] 2015-11-09 20:10:41,886 
>>> BatchStatement.java (line 223) Batch of prepared statements for 
>>> [billing.usage_record_by_billing_period, billing.metric] is of size 53086, 
>>> exceeding specified threshold of 5120 by 47966.
>>>
>>>
>> Unrelated.
>>
>> =Rob
>>
>>
>
> Can I upgrade to 2.1.11 without running nodetool repair first, given that
> the compaction is stuck?
> Another thing to mention: nodetool repair has never been run on this
> cluster. Cassandra got installed, but nobody ever scheduled repairs.
>
> Thanks for looking into this!
>
