On Mon, Nov 9, 2015 at 1:29 PM, PenguinWhispererThe . <th3penguinwhispe...@gmail.com> wrote:

> In Opscenter I see one of the nodes is orange. It seems like it's working
> on compaction. I used nodetool compactionstats and whenever I did this the
> Completed and percentage stays the same (even with hours in between).

Are you the same person from IRC, or a second report today of compaction hanging in this way?
What version of Cassandra?

> I currently don't see cpu load from cassandra on that node. So it seems
> stuck (somewhere mid 60%). Also some other nodes have compaction on the
> same columnfamily. I don't see any progress.
>
> WARN [RMI TCP Connection(554)-192.168.0.68] 2015-11-09 17:18:13,677
> ColumnFamilyStore.java (line 2101) Unable to cancel in-progress compactions
> for usage_record_ptd. Probably there is an unusually large row in progress
> somewhere. It is also possible that buggy code left some sstables compacting
> after it was done with them
>
> - How can I assure that nothing is happening?

Find the thread that is doing compaction and strace it. Generally it is one of the threads with a lower thread priority.

Compaction often appears hung when decompressing a very large row, but usually not for "hours".

> - Is it recommended to disable compaction from a certain data size? (I
> believe 25GB on each node).

It is almost never recommended to disable compaction.

> - Can I stop this compaction? nodetool stop compaction doesn't seem to
> work.

Killing the JVM ("the dungeon collapses!") would certainly stop it, but it'd likely just start again when you restart the node.

> - Is stopping the compaction dangerous?

Not if you're in a version that properly cleans up partial compactions, which is most of them.

> - Is killing the cassandra process dangerous while compacting(I did
> nodetool drain on one node)?

No. But probably nodetool drain couldn't actually stop the in-progress compaction either, FWIW.

> This is output of nodetool compactionstats grepped for the keyspace that
> seems stuck.

Do you have gigantic rows in that keyspace? What does cfstats say about the largest row compaction has seen/do you have log messages about compacting large rows?
> I also see frequently lines like this in system.log:
>
> WARN [Native-Transport-Requests:11935] 2015-11-09 20:10:41,886
> BatchStatement.java (line 223) Batch of prepared statements for
> [billing.usage_record_by_billing_period, billing.metric] is of size 53086,
> exceeding specified threshold of 5120 by 47966.

Unrelated.

=Rob
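
For the advice above about finding the compaction thread and running strace on it, one possible approach is to take a thread dump with the JDK's jstack tool and read the native thread id from it. The rough sketch below assumes the compaction threads carry Cassandra's usual "CompactionExecutor" name prefix and that the dump uses the HotSpot header format, where nid= is the native thread id in hex. The helper name and the sample dump are made up for illustration and are not taken from the node discussed here.

```python
import re

# Matches a thread header line from a HotSpot thread dump, e.g.
#   "CompactionExecutor:7" daemon prio=1 tid=0x... nid=0x1a2b runnable [...]
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)".*?\bnid=0x(?P<nid>[0-9a-fA-F]+)')


def compaction_thread_ids(thread_dump):
    """Return {thread name: native thread id in decimal} for compaction threads."""
    ids = {}
    for line in thread_dump.splitlines():
        match = THREAD_HEADER.match(line)
        # Assumption: compaction threads are named "CompactionExecutor:<n>".
        if match and match.group("name").startswith("CompactionExecutor"):
            ids[match.group("name")] = int(match.group("nid"), 16)
    return ids


# Illustrative dump fragment only; real input would come from `jstack <pid>`.
SAMPLE_DUMP = '''\
"CompactionExecutor:7" daemon prio=1 tid=0x00007f1c2c0a1000 nid=0x1a2b runnable [0x00007f1c1a5f4000]
   java.lang.Thread.State: RUNNABLE
"Native-Transport-Requests:11935" daemon prio=10 tid=0x00007f1c2c0b2000 nid=0x1a2c waiting on condition [0x00007f1c1a4f3000]
'''

if __name__ == "__main__":
    for name, tid in compaction_thread_ids(SAMPLE_DUMP).items():
        # The decimal id is what strace expects after -p on Linux.
        print(f"{name}: strace -p {tid}")
```

If strace on that thread id shows no system calls over a long interval while the thread is reported as runnable, that is consistent with the compaction being stuck in CPU-bound or spinning work rather than waiting on I/O.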