Does compactionstats show only stats for completed compactions (100%)? It
might be that the compaction is running constantly, over and over again. In
that case I need to know what I can do to stop this constant compaction so
I can start a nodetool repair.
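For what it's worth, nodetool compactionstats reports in-progress compactions with a completed/total byte counter, so a genuinely running compaction should show that counter moving between invocations. A minimal sketch of diffing two snapshots to spot a stall (the snapshot strings and their column layout here are made-up illustrations, not verbatim 2.0.9 output):

```python
# Compare the "completed" bytes of the same compaction task across two
# `nodetool compactionstats` snapshots; an unchanged counter suggests a stall.
# The sample lines and their column order are illustrative assumptions.

def completed_bytes(snapshot: str) -> dict:
    """Map (keyspace, table) -> completed bytes from a compactionstats dump."""
    progress = {}
    for line in snapshot.splitlines():
        parts = line.split()
        # Assumed row shape: Compaction <keyspace> <table> <completed> <total> bytes <pct>
        if len(parts) >= 6 and parts[0] == "Compaction":
            progress[(parts[1], parts[2])] = int(parts[3])
    return progress

snap_before = "Compaction billing usage_record_ptd 16946405 26094617 bytes 64.94%"
snap_after  = "Compaction billing usage_record_ptd 16946405 26094617 bytes 64.94%"

before, after = completed_bytes(snap_before), completed_bytes(snap_after)
stalled = [task for task, done in after.items() if before.get(task) == done]
print(stalled)  # tasks whose byte counter has not moved between snapshots
```

If the counter is frozen across snapshots taken hours apart, the task is effectively stuck rather than restarting over and over.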
Note that there is a lot of traffic on this columnfamily, so I'm not sure
if temporarily disabling compaction is an option. The repair will probably
take long as well. Sebastian and Rob: might you have any more ideas about
the things I put in this thread? Any help is appreciated!

2015-11-10 20:03 GMT+01:00 PenguinWhispererThe . <th3penguinwhispe...@gmail.com>:

> Hi Sebastian,
>
> Thanks for your response.
>
> No swap is used. No offense, I just don't see a reason why having swap
> would be the issue here. I put swappiness on 1. I also have jna installed.
> That should prevent Java being swapped out as well, AFAIK.
>
> 2015-11-10 19:50 GMT+01:00 Sebastian Estevez <sebastian.este...@datastax.com>:
>
>> Turn off Swap.
>>
>> http://docs.datastax.com/en/cassandra/2.1/cassandra/install/installRecommendSettings.html?scroll=reference_ds_sxl_gf3_2k__disable-swap
>>
>> All the best,
>>
>> Sebastián Estévez
>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>
>> DataStax is the fastest, most scalable distributed database technology,
>> delivering Apache Cassandra to the world's most innovative enterprises.
>> DataStax is built to be agile, always-on, and predictably scalable to
>> any size. With more than 500 customers in 45 countries, DataStax is the
>> database technology and transactional backbone of choice for the world's
>> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>>
>> On Tue, Nov 10, 2015 at 1:48 PM, PenguinWhispererThe .
>> <th3penguinwhispe...@gmail.com> wrote:
>>
>>> I also have the following memory usage:
>>>
>>> [root@US-BILLINGDSX4 cassandra]# free -m
>>>              total       used       free     shared    buffers     cached
>>> Mem:         12024       9455       2569          0        110       2163
>>> -/+ buffers/cache:       7180       4844
>>> Swap:         2047          0       2047
>>>
>>> Still a lot free and a lot of free buffers/cache.
>>>
>>> 2015-11-10 19:45 GMT+01:00 PenguinWhispererThe . <th3penguinwhispe...@gmail.com>:
>>>
>>>> Still stuck with this. However, I enabled GC logging, which shows the
>>>> following:
>>>>
>>>> [root@myhost cassandra]# tail -f gc-1447180680.log
>>>> 2015-11-10T18:41:45.516+0000: 225.428: [GC 2721842K->2066508K(6209536K), 0.0199040 secs]
>>>> 2015-11-10T18:41:45.977+0000: 225.889: [GC 2721868K->2066511K(6209536K), 0.0221910 secs]
>>>> 2015-11-10T18:41:46.437+0000: 226.349: [GC 2721871K->2066524K(6209536K), 0.0222140 secs]
>>>> 2015-11-10T18:41:46.897+0000: 226.809: [GC 2721884K->2066539K(6209536K), 0.0224140 secs]
>>>> 2015-11-10T18:41:47.359+0000: 227.271: [GC 2721899K->2066538K(6209536K), 0.0302520 secs]
>>>> 2015-11-10T18:41:47.821+0000: 227.733: [GC 2721898K->2066557K(6209536K), 0.0280530 secs]
>>>> 2015-11-10T18:41:48.293+0000: 228.205: [GC 2721917K->2066571K(6209536K), 0.0218000 secs]
>>>> 2015-11-10T18:41:48.790+0000: 228.702: [GC 2721931K->2066780K(6209536K), 0.0292470 secs]
>>>> 2015-11-10T18:41:49.290+0000: 229.202: [GC 2722140K->2066843K(6209536K), 0.0288740 secs]
>>>> 2015-11-10T18:41:49.756+0000: 229.668: [GC 2722203K->2066818K(6209536K), 0.0283380 secs]
>>>> 2015-11-10T18:41:50.249+0000: 230.161: [GC 2722178K->2067158K(6209536K), 0.0218690 secs]
>>>> 2015-11-10T18:41:50.713+0000: 230.625: [GC 2722518K->2067236K(6209536K), 0.0278810 secs]
>>>>
>>>> This is a VM with 12GB of RAM. I raised HEAP_SIZE to 6GB and
>>>> HEAP_NEWSIZE to 800MB.
>>>>
>>>> Still the same result.
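As an aside, those GC lines can be parsed to see what each young-generation collection actually achieves; a small sketch assuming exactly the `[GC before->after(total), secs]` format shown above (the helper name is mine):

```python
import re

# Parse JVM GC log lines of the shape quoted above, e.g.:
#   2015-11-10T18:41:45.516+0000: 225.428: [GC 2721842K->2066508K(6209536K), 0.0199040 secs]
GC_RE = re.compile(r"\[GC (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]")

def reclaimed_mb(line: str):
    """Return (reclaimed MB, live MB after GC) for one log line, or None."""
    m = GC_RE.search(line)
    if not m:
        return None
    before, after, total, _secs = map(float, m.groups())
    return (before - after) / 1024, after / 1024

line = "2015-11-10T18:41:45.516+0000: 225.428: [GC 2721842K->2066508K(6209536K), 0.0199040 secs]"
freed, live = reclaimed_mb(line)
print(f"freed {freed:.0f} MB, {live:.0f} MB still live")
```

Across the dozen lines quoted above, roughly 2 GB stays live after every collection, which is consistent with a heap that looks permanently busy.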
>>>> This looks very similar to the following issue:
>>>>
>>>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201411.mbox/%3CCAJ=3xgRLsvpnZe0uXEYjG94rKhfXeU+jBR=q3a-_c3rsdd5...@mail.gmail.com%3E
>>>>
>>>> Is the only possibility to upgrade memory? I mean, I can't believe it's
>>>> just loading all its data into memory. That would mean I'd have to keep
>>>> scaling up the node to keep it working?
>>>>
>>>> 2015-11-10 9:36 GMT+01:00 PenguinWhispererThe . <th3penguinwhispe...@gmail.com>:
>>>>
>>>>> Correction...
>>>>> I was grepping for "Segmentation" in the strace output and it happens
>>>>> a lot.
>>>>>
>>>>> Do I need to run a scrub?
>>>>>
>>>>> 2015-11-10 9:30 GMT+01:00 PenguinWhispererThe . <th3penguinwhispe...@gmail.com>:
>>>>>
>>>>>> Hi Rob,
>>>>>>
>>>>>> Thanks for your reply.
>>>>>>
>>>>>> 2015-11-09 23:17 GMT+01:00 Robert Coli <rc...@eventbrite.com>:
>>>>>>
>>>>>>> On Mon, Nov 9, 2015 at 1:29 PM, PenguinWhispererThe . <th3penguinwhispe...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> In Opscenter I see one of the nodes is orange. It seems like it's
>>>>>>>> working on compaction. I used nodetool compactionstats, and whenever
>>>>>>>> I did this the Completed value and percentage stayed the same (even
>>>>>>>> with hours in between).
>>>>>>>>
>>>>>>> Are you the same person from IRC, or a second report today of
>>>>>>> compaction hanging in this way?
>>>>>>>
>>>>>> Same person ;) I just didn't have enough to work with from the chat
>>>>>> there. I want to understand the issue more and see what I can tune or
>>>>>> fix. I want to do a nodetool repair before upgrading to 2.1.11, but
>>>>>> the compaction is blocking it.
>>>>>>
>>>>>>> What version of Cassandra?
>>>>>>>
>>>>>> 2.0.9
>>>>>>
>>>>>>>> I currently don't see CPU load from Cassandra on that node, so it
>>>>>>>> seems stuck (somewhere mid 60%). Some other nodes also have
>>>>>>>> compaction on the same columnfamily. I don't see any progress.
>>>>>>>>
>>>>>>>> WARN [RMI TCP Connection(554)-192.168.0.68] 2015-11-09 17:18:13,677
>>>>>>>> ColumnFamilyStore.java (line 2101) Unable to cancel in-progress
>>>>>>>> compactions for usage_record_ptd. Probably there is an unusually
>>>>>>>> large row in progress somewhere. It is also possible that buggy code
>>>>>>>> left some sstables compacting after it was done with them
>>>>>>>>
>>>>>>>> - How can I make sure that nothing is happening?
>>>>>>>>
>>>>>>> Find the thread that is doing compaction and strace it. Generally
>>>>>>> it is one of the threads with a lower thread priority.
>>>>>>>
>>>>>> I have 141 threads. Not sure if that's normal.
>>>>>>
>>>>>> This seems to be the one:
>>>>>> 61404 cassandr  24   4 8948m 4.3g 820m R 90.2 36.8 292:54.47 java
>>>>>>
>>>>>> In the strace I basically see this part repeating (with, once in a
>>>>>> while, the "resource temporarily unavailable"):
>>>>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>>>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
>>>>>> getpriority(PRIO_PROCESS, 61404) = 16
>>>>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>>>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 0
>>>>>> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494045, NULL) = -1 EAGAIN (Resource temporarily unavailable)
>>>>>> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
>>>>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>>>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
>>>>>> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494047, NULL) = 0
>>>>>> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
>>>>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>>>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
>>>>>> getpriority(PRIO_PROCESS, 61404) = 16
>>>>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>>>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
>>>>>> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494049, NULL) = 0
>>>>>> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
>>>>>> getpriority(PRIO_PROCESS, 61404) = 16
>>>>>>
>>>>>> But wait!
>>>>>> I also see this:
>>>>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>>>>> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494055, NULL) = 0
>>>>>> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
>>>>>> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
>>>>>>
>>>>>> This doesn't seem to happen that often, though.
>>>>>>
>>>>>>> Compaction often appears hung when decompressing a very large row,
>>>>>>> but usually not for "hours".
>>>>>>>
>>>>>>>> - Is it recommended to disable compaction beyond a certain data
>>>>>>>>   size? (I believe 25GB on each node.)
>>>>>>>>
>>>>>>> It is almost never recommended to disable compaction.
>>>>>>>
>>>>>>>> - Can I stop this compaction? nodetool stop compaction doesn't
>>>>>>>>   seem to work.
>>>>>>>>
>>>>>>> Killing the JVM ("the dungeon collapses!") would certainly stop it,
>>>>>>> but it'd likely just start again when you restart the node.
>>>>>>>
>>>>>>>> - Is stopping the compaction dangerous?
>>>>>>>>
>>>>>>> Not if you're on a version that properly cleans up partial
>>>>>>> compactions, which is most of them.
>>>>>>>
>>>>>>>> - Is killing the cassandra process dangerous while compacting? (I
>>>>>>>>   did nodetool drain on one node.)
>>>>>>>>
>>>>>>> No. But nodetool drain probably couldn't actually stop the
>>>>>>> in-progress compaction either, FWIW.
>>>>>>>
>>>>>>>> This is the output of nodetool compactionstats grepped for the
>>>>>>>> keyspace that seems stuck.
>>>>>>>>
>>>>>>> Do you have gigantic rows in that keyspace?
>>>>>>> What does cfstats say about the largest row compaction has seen?
>>>>>>> Do you have log messages about compacting large rows?
>>>>>>>
>>>>>> I don't know about the gigantic rows. How can I check?
>>>>>>
>>>>>> I've checked the logs and found this:
>>>>>> INFO [CompactionExecutor:67] 2015-11-10 02:34:19,077
>>>>>> CompactionController.java (line 192) Compacting large row
>>>>>> billing/usage_record_ptd:177727:2015-10-14 00\:00Z (243992466 bytes)
>>>>>> incrementally
>>>>>> So this is from 6 hours ago.
>>>>>>
>>>>>> I also see a lot of messages like this:
>>>>>> INFO [OptionalTasks:1] 2015-11-10 06:36:06,395 MeteredFlusher.java
>>>>>> (line 58) flushing high-traffic column family CFS(Keyspace='mykeyspace',
>>>>>> ColumnFamily='mycolumnfamily') (estimated 100317609 bytes)
>>>>>>
>>>>>> And (although it's unrelated, this might impact compaction
>>>>>> performance?):
>>>>>> WARN [Native-Transport-Requests:10514] 2015-11-10 06:33:34,172
>>>>>> BatchStatement.java (line 223) Batch of prepared statements for
>>>>>> [billing.usage_record_ptd] is of size 13834, exceeding specified
>>>>>> threshold of 5120 by 8714.
>>>>>>
>>>>>> It looks like the compaction is only doing one sstable at a time and
>>>>>> is doing nothing for a long time in between.
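Messages like that "Compacting large row" line are the most direct way to answer the gigantic-row question: grep them out of system.log and rank by size. A sketch assuming the message format above (the helper name is mine, and the sample key is simplified; the regex assumes a key without spaces, so real keys like the one above may need a looser pattern):

```python
import re

# Match CompactionController's "Compacting large row" messages, e.g.:
#   Compacting large row billing/usage_record_ptd:177727:... (243992466 bytes) incrementally
LARGE_ROW_RE = re.compile(r"Compacting large row (\S+?)/(\S+?):(\S+) \((\d+) bytes\)")

def large_rows(log_text: str):
    """Return (keyspace, table, key, MB) for each large-row message, biggest first."""
    hits = [(m.group(1), m.group(2), m.group(3), int(m.group(4)) / 1024**2)
            for m in LARGE_ROW_RE.finditer(log_text)]
    return sorted(hits, key=lambda h: h[3], reverse=True)

log = ("INFO [CompactionExecutor:67] 2015-11-10 02:34:19,077 "
       "CompactionController.java (line 192) Compacting large row "
       "billing/usage_record_ptd:177727:2015-10-14 (243992466 bytes) incrementally")
for ks, cf, key, mb in large_rows(log):
    print(f"{ks}.{cf} key={key} ~{mb:.0f} MB")
```

The quoted message alone already shows a single row of roughly 233 MB being compacted incrementally.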
>>>>>>
>>>>>> cfstats for this keyspace and columnfamily gives the following:
>>>>>>
>>>>>> Table: mycolumnfamily
>>>>>> SSTable count: 26
>>>>>> Space used (live), bytes: 319858991
>>>>>> Space used (total), bytes: 319860267
>>>>>> SSTable Compression Ratio: 0.24265700071674673
>>>>>> Number of keys (estimate): 6656
>>>>>> Memtable cell count: 22710
>>>>>> Memtable data size, bytes: 3310654
>>>>>> Memtable switch count: 31
>>>>>> Local read count: 0
>>>>>> Local read latency: 0.000 ms
>>>>>> Local write count: 997667
>>>>>> Local write latency: 0.000 ms
>>>>>> Pending tasks: 0
>>>>>> Bloom filter false positives: 0
>>>>>> Bloom filter false ratio: 0.00000
>>>>>> Bloom filter space used, bytes: 12760
>>>>>> Compacted partition minimum bytes: 1332
>>>>>> Compacted partition maximum bytes: 43388628
>>>>>> Compacted partition mean bytes: 234682
>>>>>> Average live cells per slice (last five minutes): 0.0
>>>>>> Average tombstones per slice (last five minutes): 0.0
>>>>>>
>>>>>>>> I also frequently see lines like this in system.log:
>>>>>>>>
>>>>>>>> WARN [Native-Transport-Requests:11935] 2015-11-09 20:10:41,886
>>>>>>>> BatchStatement.java (line 223) Batch of prepared statements for
>>>>>>>> [billing.usage_record_by_billing_period, billing.metric] is of size
>>>>>>>> 53086, exceeding specified threshold of 5120 by 47966.
>>>>>>>>
>>>>>>> Unrelated.
>>>>>>>
>>>>>>> =Rob
>>>>>>>
>>>>>> Can I upgrade to 2.1.11 without doing a nodetool repair, given that
>>>>>> the compaction is stuck?
>>>>>> Another thing to mention: nodetool repair has never run here. Cassandra
>>>>>> got installed, but nobody bothered to schedule the repair.
>>>>>>
>>>>>> Thanks for looking into this!
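A quick back-of-the-envelope from the cfstats output above (numbers copied verbatim from the thread) puts the large-row theory in perspective:

```python
# Back-of-the-envelope from the cfstats output quoted above.
space_used_live = 319_858_991        # Space used (live), bytes
compression_ratio = 0.24265700071674673
max_partition = 43_388_628           # Compacted partition maximum bytes
mean_partition = 234_682             # Compacted partition mean bytes

# Compression ratio is compressed/uncompressed, so dividing recovers raw size.
uncompressed = space_used_live / compression_ratio
print(f"~{uncompressed / 1024**3:.2f} GiB uncompressed on this node")
print(f"largest partition is {max_partition / mean_partition:.0f}x the mean")
```

So the table holds only around 1.2 GiB uncompressed on this node, yet its largest compacted partition is roughly 185 times the mean, which fits the "unusually large row" warning in the log.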