All,

We've been having intermittent long application pauses (Cassandra version 1.2.8) and are not sure whether it's a Cassandra bug. During these pauses, there are dropped messages in the Cassandra log file, and the node sees other nodes as down. We've turned on GC logging, and the following is an example of a long "stopped" (pause) event in the gc.log file:

2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which application threads were stopped: 0.091450 seconds
2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which application threads were stopped: 51.8190260 seconds
2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which application threads were stopped: 0.005470 seconds

As seen above, there was a 0.091450 second pause, then a 51.8190260 second pause. There were no GC log events between those two log statements. Since there are no GC events in between, something other than GC must be responsible for the long stop, possibly the time it takes for all threads to reach a safepoint. Could there be a Cassandra thread that is taking a long time to reach a safepoint, and if so, what is it trying to do?

Along with the node seeing other nodes as down in the Cassandra log file, the StatusLogger shows 1599 Pending in ReadStage and 9 Pending in MutationStage.

There is mention of batch revocation of biased locks in Cassandra as a possible cause (not GC) at:
http://www.mail-archive.com/user@cassandra.apache.org/msg34401.html

We have JNA and no swap, and the cluster runs fine apart from these intermittent long pauses, which can cause a node to appear down to other nodes.

Any ideas as to the cause of the long pause above? It does not seem to be related to GC.

Thanks.
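
For anyone who wants to scan a gc.log for similar stalls, here is a minimal Python sketch that pulls out the "Total time for which application threads were stopped" entries and reports those above a threshold. The names find_long_pauses and STOPPED_RE and the 1-second default threshold are my own assumptions for illustration; the only thing taken from our logs is the line format shown above.

```python
import re

# Matches entries in the format shown in the gc.log excerpt above, e.g.
# 2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which
# application threads were stopped: 51.8190260 seconds
STOPPED_RE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{4}): "
    r"(?P<uptime>\d+\.\d+): "
    r"Total time for which application threads were stopped: "
    r"(?P<stopped>\d+\.\d+) seconds"
)


def find_long_pauses(text, threshold_seconds=1.0):
    """Return (timestamp, jvm_uptime_seconds, stopped_seconds) for each
    stopped-time entry at or above threshold_seconds (hypothetical default)."""
    pauses = []
    for match in STOPPED_RE.finditer(text):
        stopped = float(match.group("stopped"))
        if stopped >= threshold_seconds:
            pauses.append(
                (match.group("timestamp"), float(match.group("uptime")), stopped)
            )
    return pauses


# The three entries quoted in this message.
SAMPLE = """\
2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which application threads were stopped: 0.091450 seconds
2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which application threads were stopped: 51.8190260 seconds
2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which application threads were stopped: 0.005470 seconds
"""

if __name__ == "__main__":
    for timestamp, uptime, stopped in find_long_pauses(SAMPLE):
        print(f"{timestamp} (uptime {uptime:.3f}s): stopped {stopped:.3f}s")
```

To run it against a real file, read the file contents into a string and pass that to find_long_pauses. This only locates the long stops; it does not say what caused them.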