Does compactionstats show only stats for completed compactions (100%)? It
might be that the compaction is running constantly, over and over again. In
that case I need to know what I can do to stop this constant compaction so
I can start a nodetool repair.
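For what it's worth, nodetool compactionstats reports in-progress compactions with a completed/total byte counter, so a genuinely running compaction should show that counter moving between invocations. A minimal sketch of diffing two snapshots to spot a stall (the snapshot strings and their column layout here are made-up illustrations, not verbatim 2.0.9 output):

```python
# Compare the "completed" bytes of the same compaction task across two
# `nodetool compactionstats` snapshots; an unchanged counter suggests a stall.
# The sample lines and their column order are illustrative assumptions.

def completed_bytes(snapshot: str) -> dict:
    """Map (keyspace, table) -> completed bytes from a compactionstats dump."""
    progress = {}
    for line in snapshot.splitlines():
        parts = line.split()
        # Assumed row shape: Compaction <keyspace> <table> <completed> <total> bytes <pct>
        if len(parts) >= 6 and parts[0] == "Compaction":
            progress[(parts[1], parts[2])] = int(parts[3])
    return progress

snap_before = "Compaction billing usage_record_ptd 16946405 26094617 bytes 64.94%"
snap_after  = "Compaction billing usage_record_ptd 16946405 26094617 bytes 64.94%"

before, after = completed_bytes(snap_before), completed_bytes(snap_after)
stalled = [task for task, done in after.items() if before.get(task) == done]
print(stalled)  # tasks whose byte counter has not moved between snapshots
```

If the counter is frozen across snapshots taken hours apart, the task is effectively stuck rather than restarting over and over.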
Note that there is a lot of traffic on this columnfamily, so I'm not sure
if temporarily disabling compaction is an option. The repair will probably
take long as well. Sebastian and Rob: might you have any more ideas about
the things I put in this thread? Any help is appreciated!

2015-11-10 20:03 GMT+01:00 PenguinWhispererThe . <th3penguinwhispe...@gmail.com>:

> Hi Sebastian,
>
> Thanks for your response.
>
> No swap is used. No offense, I just don't see a reason why having swap
> would be the issue here. I put swappiness on 1. I also have jna installed.
> That should prevent Java being swapped out as well, AFAIK.
>
> 2015-11-10 19:50 GMT+01:00 Sebastian Estevez <sebastian.este...@datastax.com>:
>
>> Turn off Swap.
>>
>> http://docs.datastax.com/en/cassandra/2.1/cassandra/install/installRecommendSettings.html?scroll=reference_ds_sxl_gf3_2k__disable-swap
>>
>> All the best,
>>
>> Sebastián Estévez
>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>
>> DataStax is the fastest, most scalable distributed database technology,
>> delivering Apache Cassandra to the world's most innovative enterprises.
>> DataStax is built to be agile, always-on, and predictably scalable to
>> any size. With more than 500 customers in 45 countries, DataStax is the
>> database technology and transactional backbone of choice for the world's
>> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>>
>> On Tue, Nov 10, 2015 at 1:48 PM, PenguinWhispererThe .
>> <th3penguinwhispe...@gmail.com> wrote:
>>
>>> I also have the following memory usage:
>>>
>>> [root@US-BILLINGDSX4 cassandra]# free -m
>>>              total       used       free     shared    buffers     cached
>>> Mem:         12024       9455       2569          0        110       2163
>>> -/+ buffers/cache:       7180       4844
>>> Swap:         2047          0       2047
>>>
>>> Still a lot free and a lot of free buffers/cache.
>>>
>>> 2015-11-10 19:45 GMT+01:00 PenguinWhispererThe . <th3penguinwhispe...@gmail.com>:
>>>
>>>> Still stuck with this. However, I enabled GC logging, which shows the
>>>> following:
>>>>
>>>> [root@myhost cassandra]# tail -f gc-1447180680.log
>>>> 2015-11-10T18:41:45.516+0000: 225.428: [GC 2721842K->2066508K(6209536K), 0.0199040 secs]
>>>> 2015-11-10T18:41:45.977+0000: 225.889: [GC 2721868K->2066511K(6209536K), 0.0221910 secs]
>>>> 2015-11-10T18:41:46.437+0000: 226.349: [GC 2721871K->2066524K(6209536K), 0.0222140 secs]
>>>> 2015-11-10T18:41:46.897+0000: 226.809: [GC 2721884K->2066539K(6209536K), 0.0224140 secs]
>>>> 2015-11-10T18:41:47.359+0000: 227.271: [GC 2721899K->2066538K(6209536K), 0.0302520 secs]
>>>> 2015-11-10T18:41:47.821+0000: 227.733: [GC 2721898K->2066557K(6209536K), 0.0280530 secs]
>>>> 2015-11-10T18:41:48.293+0000: 228.205: [GC 2721917K->2066571K(6209536K), 0.0218000 secs]
>>>> 2015-11-10T18:41:48.790+0000: 228.702: [GC 2721931K->2066780K(6209536K), 0.0292470 secs]
>>>> 2015-11-10T18:41:49.290+0000: 229.202: [GC 2722140K->2066843K(6209536K), 0.0288740 secs]
>>>> 2015-11-10T18:41:49.756+0000: 229.668: [GC 2722203K->2066818K(6209536K), 0.0283380 secs]
>>>> 2015-11-10T18:41:50.249+0000: 230.161: [GC 2722178K->2067158K(6209536K), 0.0218690 secs]
>>>> 2015-11-10T18:41:50.713+0000: 230.625: [GC 2722518K->2067236K(6209536K), 0.0278810 secs]
>>>>
>>>> This is a VM with 12GB of RAM. I raised HEAP_SIZE to 6GB and
>>>> HEAP_NEWSIZE to 800MB.
>>>>
>>>> Still the same result.
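As an aside, those GC lines can be parsed to see what each young-generation collection actually achieves; a small sketch assuming exactly the `[GC before->after(total), secs]` format shown above (the helper name is mine):

```python
import re

# Parse JVM GC log lines of the shape quoted above, e.g.:
#   2015-11-10T18:41:45.516+0000: 225.428: [GC 2721842K->2066508K(6209536K), 0.0199040 secs]
GC_RE = re.compile(r"\[GC (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]")

def reclaimed_mb(line: str):
    """Return (reclaimed MB, live MB after GC) for one log line, or None."""
    m = GC_RE.search(line)
    if not m:
        return None
    before, after, total, _secs = map(float, m.groups())
    return (before - after) / 1024, after / 1024

line = "2015-11-10T18:41:45.516+0000: 225.428: [GC 2721842K->2066508K(6209536K), 0.0199040 secs]"
freed, live = reclaimed_mb(line)
print(f"freed {freed:.0f} MB, {live:.0f} MB still live")
```

Across the dozen lines quoted above, roughly 2 GB stays live after every collection, which is consistent with a heap that looks permanently busy.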
>>>> This looks very similar to the following issue:
>>>>
>>>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201411.mbox/%3CCAJ=3xgRLsvpnZe0uXEYjG94rKhfXeU+jBR=q3a-_c3rsdd5...@mail.gmail.com%3E
>>>>
>>>> Is the only possibility to upgrade memory? I mean, I can't believe it's
>>>> just loading all its data into memory. That would mean I'd have to keep
>>>> scaling up the node to keep it working?
>>>>
>>>> 2015-11-10 9:36 GMT+01:00 PenguinWhispererThe . <th3penguinwhispe...@gmail.com>:
>>>>
>>>>> Correction...
>>>>> I was grepping for "Segmentation" in the strace output and it happens
>>>>> a lot.
>>>>>
>>>>> Do I need to run a scrub?
>>>>>
>>>>> 2015-11-10 9:30 GMT+01:00 PenguinWhispererThe . <th3penguinwhispe...@gmail.com>:
>>>>>
>>>>>> Hi Rob,
>>>>>>
>>>>>> Thanks for your reply.
>>>>>>
>>>>>> 2015-11-09 23:17 GMT+01:00 Robert Coli <rc...@eventbrite.com>:
>>>>>>
>>>>>>> On Mon, Nov 9, 2015 at 1:29 PM, PenguinWhispererThe . <th3penguinwhispe...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> In Opscenter I see one of the nodes is orange. It seems like it's
>>>>>>>> working on compaction. I used nodetool compactionstats, and whenever
>>>>>>>> I did this the Completed value and percentage stayed the same (even
>>>>>>>> with hours in between).
>>>>>>>>
>>>>>>> Are you the same person from IRC, or a second report today of
>>>>>>> compaction hanging in this way?
>>>>>>>
>>>>>> Same person ;) I just didn't have enough to work with from the chat
>>>>>> there. I want to understand the issue more and see what I can tune or
>>>>>> fix. I want to do a nodetool repair before upgrading to 2.1.11, but
>>>>>> the compaction is blocking it.
>>>>>>
>>>>>>> What version of Cassandra?
>>>>>>>
>>>>>> 2.0.9
>>>>>>
>>>>>>>> I currently don't see CPU load from Cassandra on that node, so it
>>>>>>>> seems stuck (somewhere mid 60%). Some other nodes also have
>>>>>>>> compaction on the same columnfamily. I don't see any progress.
>>>>>>>>
>>>>>>>> WARN [RMI TCP Connection(554)-192.168.0.68] 2015-11-09 17:18:13,677
>>>>>>>> ColumnFamilyStore.java (line 2101) Unable to cancel in-progress
>>>>>>>> compactions for usage_record_ptd. Probably there is an unusually
>>>>>>>> large row in progress somewhere. It is also possible that buggy code
>>>>>>>> left some sstables compacting after it was done with them
>>>>>>>>
>>>>>>>> - How can I make sure that nothing is happening?
>>>>>>>>
>>>>>>> Find the thread that is doing compaction and strace it. Generally
>>>>>>> it is one of the threads with a lower thread priority.
>>>>>>>
>>>>>> I have 141 threads. Not sure if that's normal.
>>>>>>
>>>>>> This seems to be the one:
>>>>>> 61404 cassandr  24   4 8948m 4.3g 820m R 90.2 36.8 292:54.47 java
>>>>>>
>>>>>> In the strace I basically see this part repeating (with, once in a
>>>>>> while, the "resource temporarily unavailable"):
>>>>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>>>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
>>>>>> getpriority(PRIO_PROCESS, 61404) = 16
>>>>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>>>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 0
>>>>>> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494045, NULL) = -1 EAGAIN (Resource temporarily unavailable)
>>>>>> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
>>>>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>>>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
>>>>>> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494047, NULL) = 0
>>>>>> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
>>>>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>>>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
>>>>>> getpriority(PRIO_PROCESS, 61404) = 16
>>>>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>>>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
>>>>>> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494049, NULL) = 0
>>>>>> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
>>>>>> getpriority(PRIO_PROCESS, 61404) = 16
>>>>>>
>>>>>> But wait!
>>>>>> I also see this:
>>>>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>>>>> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494055, NULL) = 0
>>>>>> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
>>>>>> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
>>>>>>
>>>>>> This doesn't seem to happen that often, though.
>>>>>>
>>>>>>> Compaction often appears hung when decompressing a very large row,
>>>>>>> but usually not for "hours".
>>>>>>>
>>>>>>>> - Is it recommended to disable compaction beyond a certain data
>>>>>>>>   size? (I believe 25GB on each node.)
>>>>>>>>
>>>>>>> It is almost never recommended to disable compaction.
>>>>>>>
>>>>>>>> - Can I stop this compaction? nodetool stop compaction doesn't
>>>>>>>>   seem to work.
>>>>>>>>
>>>>>>> Killing the JVM ("the dungeon collapses!") would certainly stop it,
>>>>>>> but it'd likely just start again when you restart the node.
>>>>>>>
>>>>>>>> - Is stopping the compaction dangerous?
>>>>>>>>
>>>>>>> Not if you're on a version that properly cleans up partial
>>>>>>> compactions, which is most of them.
>>>>>>>
>>>>>>>> - Is killing the cassandra process dangerous while compacting? (I
>>>>>>>>   did nodetool drain on one node.)
>>>>>>>>
>>>>>>> No. But nodetool drain probably couldn't actually stop the
>>>>>>> in-progress compaction either, FWIW.
>>>>>>>
>>>>>>>> This is the output of nodetool compactionstats grepped for the
>>>>>>>> keyspace that seems stuck.
>>>>>>>>
>>>>>>> Do you have gigantic rows in that keyspace?
>>>>>>> What does cfstats say about the largest row compaction has seen?
>>>>>>> Do you have log messages about compacting large rows?
>>>>>>>
>>>>>> I don't know about the gigantic rows. How can I check?
>>>>>>
>>>>>> I've checked the logs and found this:
>>>>>> INFO [CompactionExecutor:67] 2015-11-10 02:34:19,077
>>>>>> CompactionController.java (line 192) Compacting large row
>>>>>> billing/usage_record_ptd:177727:2015-10-14 00\:00Z (243992466 bytes)
>>>>>> incrementally
>>>>>> So this is from 6 hours ago.
>>>>>>
>>>>>> I also see a lot of messages like this:
>>>>>> INFO [OptionalTasks:1] 2015-11-10 06:36:06,395 MeteredFlusher.java
>>>>>> (line 58) flushing high-traffic column family CFS(Keyspace='mykeyspace',
>>>>>> ColumnFamily='mycolumnfamily') (estimated 100317609 bytes)
>>>>>>
>>>>>> And (although it's unrelated, this might impact compaction
>>>>>> performance?):
>>>>>> WARN [Native-Transport-Requests:10514] 2015-11-10 06:33:34,172
>>>>>> BatchStatement.java (line 223) Batch of prepared statements for
>>>>>> [billing.usage_record_ptd] is of size 13834, exceeding specified
>>>>>> threshold of 5120 by 8714.
>>>>>>
>>>>>> It looks like the compaction is only doing one sstable at a time and
>>>>>> is doing nothing for a long time in between.
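Messages like that "Compacting large row" line are the most direct way to answer the gigantic-row question: grep them out of system.log and rank by size. A sketch assuming the message format above (the helper name is mine, and the sample key is simplified; the regex assumes a key without spaces, so real keys like the one above may need a looser pattern):

```python
import re

# Match CompactionController's "Compacting large row" messages, e.g.:
#   Compacting large row billing/usage_record_ptd:177727:... (243992466 bytes) incrementally
LARGE_ROW_RE = re.compile(r"Compacting large row (\S+?)/(\S+?):(\S+) \((\d+) bytes\)")

def large_rows(log_text: str):
    """Return (keyspace, table, key, MB) for each large-row message, biggest first."""
    hits = [(m.group(1), m.group(2), m.group(3), int(m.group(4)) / 1024**2)
            for m in LARGE_ROW_RE.finditer(log_text)]
    return sorted(hits, key=lambda h: h[3], reverse=True)

log = ("INFO [CompactionExecutor:67] 2015-11-10 02:34:19,077 "
       "CompactionController.java (line 192) Compacting large row "
       "billing/usage_record_ptd:177727:2015-10-14 (243992466 bytes) incrementally")
for ks, cf, key, mb in large_rows(log):
    print(f"{ks}.{cf} key={key} ~{mb:.0f} MB")
```

The quoted message alone already shows a single row of roughly 233 MB being compacted incrementally.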
>>>>>>
>>>>>> cfstats for this keyspace and columnfamily gives the following:
>>>>>>
>>>>>> Table: mycolumnfamily
>>>>>> SSTable count: 26
>>>>>> Space used (live), bytes: 319858991
>>>>>> Space used (total), bytes: 319860267
>>>>>> SSTable Compression Ratio: 0.24265700071674673
>>>>>> Number of keys (estimate): 6656
>>>>>> Memtable cell count: 22710
>>>>>> Memtable data size, bytes: 3310654
>>>>>> Memtable switch count: 31
>>>>>> Local read count: 0
>>>>>> Local read latency: 0.000 ms
>>>>>> Local write count: 997667
>>>>>> Local write latency: 0.000 ms
>>>>>> Pending tasks: 0
>>>>>> Bloom filter false positives: 0
>>>>>> Bloom filter false ratio: 0.00000
>>>>>> Bloom filter space used, bytes: 12760
>>>>>> Compacted partition minimum bytes: 1332
>>>>>> Compacted partition maximum bytes: 43388628
>>>>>> Compacted partition mean bytes: 234682
>>>>>> Average live cells per slice (last five minutes): 0.0
>>>>>> Average tombstones per slice (last five minutes): 0.0
>>>>>>
>>>>>>>> I also frequently see lines like this in system.log:
>>>>>>>>
>>>>>>>> WARN [Native-Transport-Requests:11935] 2015-11-09 20:10:41,886
>>>>>>>> BatchStatement.java (line 223) Batch of prepared statements for
>>>>>>>> [billing.usage_record_by_billing_period, billing.metric] is of size
>>>>>>>> 53086, exceeding specified threshold of 5120 by 47966.
>>>>>>>>
>>>>>>> Unrelated.
>>>>>>>
>>>>>>> =Rob
>>>>>>>
>>>>>> Can I upgrade to 2.1.11 without doing a nodetool repair, given that
>>>>>> the compaction is stuck?
>>>>>> Another thing to mention: nodetool repair has never run here. Cassandra
>>>>>> got installed, but nobody bothered to schedule the repair.
>>>>>>
>>>>>> Thanks for looking into this!
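A quick back-of-the-envelope from the cfstats output above (numbers copied verbatim from the thread) puts the large-row theory in perspective:

```python
# Back-of-the-envelope from the cfstats output quoted above.
space_used_live = 319_858_991        # Space used (live), bytes
compression_ratio = 0.24265700071674673
max_partition = 43_388_628           # Compacted partition maximum bytes
mean_partition = 234_682             # Compacted partition mean bytes

# Compression ratio is compressed/uncompressed, so dividing recovers raw size.
uncompressed = space_used_live / compression_ratio
print(f"~{uncompressed / 1024**3:.2f} GiB uncompressed on this node")
print(f"largest partition is {max_partition / mean_partition:.0f}x the mean")
```

So the table holds only around 1.2 GiB uncompressed on this node, yet its largest compacted partition is roughly 185 times the mean, which fits the "unusually large row" warning in the log.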