Use 'nodetool compactionhistory'

All the best,
Sebastián
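For anyone following along: compactionhistory lists only finished
compactions (it reads the system compaction history table), so running it
twice a few minutes apart shows whether the same table really is being
compacted over and over, which is the question asked below. A sketch of the
invocation; the exact column layout varies by Cassandra version:

$ nodetool compactionhistory
Compaction History:
id    keyspace_name    columnfamily_name    compacted_at    bytes_in    bytes_out    rows_merged
...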
On Nov 11, 2015 3:23 AM, "PenguinWhispererThe ." <th3penguinwhispe...@gmail.com> wrote:

Does compactionstats show only stats for completed compactions (100%)? It
might be that the compaction is running constantly, over and over again. In
that case I need to know what I can do to stop this constant compaction so
I can start a nodetool repair.

Note that there is a lot of traffic on this columnfamily, so I'm not sure
temporarily disabling compaction is an option. The repair will probably
take long as well.

Sebastian and Rob: do you have any more ideas about the things I put in
this thread? Any help is appreciated!

2015-11-10 20:03 GMT+01:00 PenguinWhispererThe . <th3penguinwhispe...@gmail.com>:

Hi Sebastian,

Thanks for your response.

No swap is used. No offense, but I just don't see why swap would be the
issue here. I set swappiness to 1, and I also have JNA installed; that
should prevent Java from being swapped out as well, AFAIK.

2015-11-10 19:50 GMT+01:00 Sebastian Estevez <sebastian.este...@datastax.com>:

Turn off swap:

http://docs.datastax.com/en/cassandra/2.1/cassandra/install/installRecommendSettings.html?scroll=reference_ds_sxl_gf3_2k__disable-swap

All the best,

Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Tue, Nov 10, 2015 at 1:48 PM, PenguinWhispererThe . <th3penguinwhispe...@gmail.com> wrote:

I also have the following memory usage:

[root@US-BILLINGDSX4 cassandra]# free -m
             total       used       free     shared    buffers     cached
Mem:         12024       9455       2569          0        110       2163
-/+ buffers/cache:        7180       4844
Swap:         2047          0       2047

Still a lot free and a lot of free buffers/cache.
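Side note for archive readers: the disable-swap steps on the DataStax
settings page linked above boil down to roughly the following; the fstab
edit is what makes it stick across reboots, and vm.swappiness=1 is the
compromise mentioned in this thread if swap must stay:

# Turn off all swap devices right now
sudo swapoff --all

# Make it permanent: delete or comment out any swap lines in /etc/fstab

# If removing swap entirely isn't an option, at least discourage its use:
sudo sysctl -w vm.swappiness=1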
2015-11-10 19:45 GMT+01:00 PenguinWhispererThe . <th3penguinwhispe...@gmail.com>:

Still stuck with this. However, I enabled GC logging, which shows the
following:

[root@myhost cassandra]# tail -f gc-1447180680.log
2015-11-10T18:41:45.516+0000: 225.428: [GC 2721842K->2066508K(6209536K), 0.0199040 secs]
2015-11-10T18:41:45.977+0000: 225.889: [GC 2721868K->2066511K(6209536K), 0.0221910 secs]
2015-11-10T18:41:46.437+0000: 226.349: [GC 2721871K->2066524K(6209536K), 0.0222140 secs]
2015-11-10T18:41:46.897+0000: 226.809: [GC 2721884K->2066539K(6209536K), 0.0224140 secs]
2015-11-10T18:41:47.359+0000: 227.271: [GC 2721899K->2066538K(6209536K), 0.0302520 secs]
2015-11-10T18:41:47.821+0000: 227.733: [GC 2721898K->2066557K(6209536K), 0.0280530 secs]
2015-11-10T18:41:48.293+0000: 228.205: [GC 2721917K->2066571K(6209536K), 0.0218000 secs]
2015-11-10T18:41:48.790+0000: 228.702: [GC 2721931K->2066780K(6209536K), 0.0292470 secs]
2015-11-10T18:41:49.290+0000: 229.202: [GC 2722140K->2066843K(6209536K), 0.0288740 secs]
2015-11-10T18:41:49.756+0000: 229.668: [GC 2722203K->2066818K(6209536K), 0.0283380 secs]
2015-11-10T18:41:50.249+0000: 230.161: [GC 2722178K->2067158K(6209536K), 0.0218690 secs]
2015-11-10T18:41:50.713+0000: 230.625: [GC 2722518K->2067236K(6209536K), 0.0278810 secs]

This is a VM with 12GB of RAM. I raised MAX_HEAP_SIZE to 6GB and
HEAP_NEWSIZE to 800MB. Still the same result.

This looks very similar to the following issue:

http://mail-archives.apache.org/mod_mbox/cassandra-user/201411.mbox/%3CCAJ=3xgRLsvpnZe0uXEYjG94rKhfXeU+jBR=q3a-_c3rsdd5...@mail.gmail.com%3E

Is the only option to add memory? I can't believe it's just loading all its
data into memory. Wouldn't that mean I have to keep scaling up the node to
keep it working?

2015-11-10 9:36 GMT+01:00 PenguinWhispererThe . <th3penguinwhispe...@gmail.com>:

Correction... I was grepping for "Segmentation" in the strace output, and
it happens a lot.

Do I need to run a scrub?
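For context: in the 2.0 packages those heap knobs live in
conf/cassandra-env.sh (the variable is MAX_HEAP_SIZE, not HEAP_SIZE, and
the file's own comments say both must be set together). A minimal sketch
with the values mentioned above, not a recommendation:

# conf/cassandra-env.sh -- set both together or neither takes effect
MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="800M"

Reading the GC log above: each collection drops the heap from ~2.7GB back
to ~2.0GB of the 6GB available in 20-30ms, roughly twice a second. That
looks like steady allocation churn rather than a full heap, so adding
memory would not obviously change anything.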
2015-11-10 9:30 GMT+01:00 PenguinWhispererThe . <th3penguinwhispe...@gmail.com>:

Hi Rob,

Thanks for your reply.

2015-11-09 23:17 GMT+01:00 Robert Coli <rc...@eventbrite.com>:

> On Mon, Nov 9, 2015 at 1:29 PM, PenguinWhispererThe . <th3penguinwhispe...@gmail.com> wrote:
>
>> In Opscenter I see that one of the nodes is orange. It seems to be
>> working on compaction. I used nodetool compactionstats, and whenever I
>> did this the Completed count and percentage stayed the same (even with
>> hours in between).
>
> Are you the same person from IRC, or a second report today of compaction
> hanging in this way?

Same person ;) I just didn't have much to work with from the chat there. I
want to understand the issue better and see what I can tune or fix. I want
to run nodetool repair before upgrading to 2.1.11, but the compaction is
blocking it.

> What version of Cassandra?

2.0.9

>> I currently don't see CPU load from Cassandra on that node, so it seems
>> stuck (somewhere mid 60%). Some other nodes also have compactions on
>> the same columnfamily, and I don't see any progress there either.
>>
>> WARN [RMI TCP Connection(554)-192.168.0.68] 2015-11-09 17:18:13,677
>> ColumnFamilyStore.java (line 2101) Unable to cancel in-progress
>> compactions for usage_record_ptd. Probably there is an unusually large
>> row in progress somewhere. It is also possible that buggy code left
>> some sstables compacting after it was done with them
>>
>> - How can I verify that nothing is actually happening?
>
> Find the thread that is doing compaction and strace it. Generally it is
> one of the threads with a lower thread priority.

I have 141 threads. Not sure if that's normal.

This seems to be the one:
61404 cassandr  24   4 8948m 4.3g 820m R 90.2 36.8 292:54.47 java

In the strace I basically see this part repeating (with the occasional
"resource temporarily unavailable"):

futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
getpriority(PRIO_PROCESS, 61404) = 16
futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x1233854, FUTEX_WAIT_PRIVATE, 494045, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1233854, FUTEX_WAIT_PRIVATE, 494047, NULL) = 0
futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
getpriority(PRIO_PROCESS, 61404) = 16
futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1233854, FUTEX_WAIT_PRIVATE, 494049, NULL) = 0
futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
getpriority(PRIO_PROCESS, 61404) = 16

But wait! I also see this:

futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x1233854, FUTEX_WAIT_PRIVATE, 494055, NULL) = 0
futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---

This doesn't seem to happen that often, though.

> Compaction often appears hung when decompressing a very large row, but
> usually not for "hours".
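A concrete recipe for Rob's suggestion, using the thread id 61404 from the
top output above. pgrep -f CassandraDaemon assumes the default process
command line; jstack ships with the JDK:

# Show per-thread CPU for the Cassandra JVM; note the busy TID (H = threads)
top -H -p "$(pgrep -f CassandraDaemon)"

# Map that TID to a Java thread name via its hex nid (61404 = 0xefdc)
jstack "$(pgrep -f CassandraDaemon)" | grep -A 3 'nid=0xefdc'

# Attach strace to just that one thread, with timestamps
strace -tt -p 61404

If the jstack line names a CompactionExecutor thread, the stack under it
tells you what the compaction is actually doing between the futex calls.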
>> - Is it recommended to disable compaction above a certain data size?
>> (I believe we have about 25GB on each node.)
>
> It is almost never recommended to disable compaction.
>
>> - Can I stop this compaction? nodetool stop compaction doesn't seem to
>> work.
>
> Killing the JVM ("the dungeon collapses!") would certainly stop it, but
> it'd likely just start again when you restart the node.
>
>> - Is stopping the compaction dangerous?
>
> Not if you're on a version that properly cleans up partial compactions,
> which is most of them.
>
>> - Is killing the Cassandra process dangerous while compacting (I did
>> nodetool drain on one node)?
>
> No. But nodetool drain probably couldn't actually stop the in-progress
> compaction either, FWIW.
>
>> This is the output of nodetool compactionstats grepped for the keyspace
>> that seems stuck.
>
> Do you have gigantic rows in that keyspace? What does cfstats say about
> the largest row compaction has seen? Do you have log messages about
> compacting large rows?

I don't know about the gigantic rows. How can I check?

I've checked the logs and found this:

INFO [CompactionExecutor:67] 2015-11-10 02:34:19,077 CompactionController.java (line 192) Compacting large row billing/usage_record_ptd:177727:2015-10-14 00\:00Z (243992466 bytes) incrementally

So that is from 6 hours ago.

I also see a lot of messages like this:

INFO [OptionalTasks:1] 2015-11-10 06:36:06,395 MeteredFlusher.java (line 58) flushing high-traffic column family CFS(Keyspace='mykeyspace', ColumnFamily='mycolumnfamily') (estimated 100317609 bytes)

And (although it's unrelated, might this impact compaction performance?):

WARN [Native-Transport-Requests:10514] 2015-11-10 06:33:34,172 BatchStatement.java (line 223) Batch of prepared statements for [billing.usage_record_ptd] is of size 13834, exceeding specified threshold of 5120 by 8714.

It's as if compaction does only one sstable at a time and then does nothing
for a long time in between.

cfstats for this keyspace and columnfamily gives the following:

Table: mycolumnfamily
SSTable count: 26
Space used (live), bytes: 319858991
Space used (total), bytes: 319860267
SSTable Compression Ratio: 0.24265700071674673
Number of keys (estimate): 6656
Memtable cell count: 22710
Memtable data size, bytes: 3310654
Memtable switch count: 31
Local read count: 0
Local read latency: 0.000 ms
Local write count: 997667
Local write latency: 0.000 ms
Pending tasks: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used, bytes: 12760
Compacted partition minimum bytes: 1332
Compacted partition maximum bytes: 43388628
Compacted partition mean bytes: 234682
Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0

>> I also frequently see lines like this in system.log:
>>
>> WARN [Native-Transport-Requests:11935] 2015-11-09 20:10:41,886
>> BatchStatement.java (line 223) Batch of prepared statements for
>> [billing.usage_record_by_billing_period, billing.metric] is of size
>> 53086, exceeding specified threshold of 5120 by 47966.
>
> Unrelated.
>
> =Rob
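To answer the "how can I check" above with commands rather than prose, a
sketch for 2.0; the table names are the ones from this thread, and the
default log path assumes a package install:

# Per-table stats, including "Compacted partition maximum bytes"
nodetool cfstats billing.usage_record_ptd

# Partition size distribution for one table
nodetool cfhistograms billing usage_record_ptd

# Compaction also logs every large row it meets
grep "Compacting large row" /var/log/cassandra/system.log

Note the cfstats above is for mycolumnfamily, whose largest compacted
partition is ~41MB (43388628 bytes); the 243992466-byte (~233MB)
"Compacting large row" message is for billing/usage_record_ptd, so that is
the table worth checking.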
Can I upgrade to 2.1.11 without running nodetool repair first, given that
the compaction is stuck?

Another thing worth mentioning: nodetool repair has never run on this
cluster. Cassandra got installed, but nobody bothered to schedule the
repairs.

Thanks for looking into this!
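Two closing sketches on commands from this thread. nodetool stop takes an
upper-case operation type, which may be why the lower-case "nodetool stop
compaction" appeared to do nothing; and since repair has never run, a
per-node primary-range repair (-pr) is the usual starting point. The host
is the node IP from the WARN line earlier in the thread, the keyspace is
the one discussed:

# Abort in-flight compactions on one node (operation type is upper-case)
nodetool -h 192.168.0.68 stop COMPACTION

# First scheduled repair: primary ranges only, one keyspace at a time
nodetool -h 192.168.0.68 repair -pr billing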