All,

We've been having intermittent long application pauses (version 1.2.8) and
are not sure whether it's a Cassandra bug.  During these pauses, the
Cassandra log file shows dropped messages, and the node sees other nodes
as down.  We've turned on GC logging, and the following is an example of a
long "stopped" (pause) event in the gc.log file.

2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which application
threads were stopped: 0.091450 seconds
2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which application
threads were stopped: 51.8190260 seconds
2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which application
threads were stopped: 0.005470 seconds

As seen above, there was a 0.091450 second pause, then a 51.8190260 second
pause.  There were no GC log events between those two log statements.  Since
there are no GC events in between, something other than GC must be causing
the long stop, possibly the time taken to reach a safepoint.
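
For anyone who wants to scan their own gc.log for the same pattern, here is
a minimal Python sketch.  It assumes each "Total time for which application
threads were stopped" entry sits on a single line in the file (the lines
above are only wrapped by email), and the names STOPPED_RE and long_pauses
and the 1 second threshold are my own choices, not anything from Cassandra
or the JVM.

```python
import re

# Matches "stopped time" lines in the format shown above (assumed to be one
# line each in the real gc.log), for example:
# 2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which application threads were stopped: 51.8190260 seconds
STOPPED_RE = re.compile(
    r"^(?P<stamp>\S+): (?P<uptime>\d+\.\d+): "
    r"Total time for which application threads were stopped: "
    r"(?P<pause>\d+\.\d+) seconds"
)


def long_pauses(lines, threshold_s=1.0):
    """Yield (timestamp, uptime_s, pause_s) for every stop above threshold_s.

    threshold_s is an arbitrary illustrative cutoff; lines that do not match
    the stopped-time format (GC events, blank lines) are skipped.
    """
    for line in lines:
        match = STOPPED_RE.match(line.strip())
        if match is None:
            continue
        pause = float(match.group("pause"))
        if pause > threshold_s:
            yield match.group("stamp"), float(match.group("uptime")), pause


# The three entries quoted in this message, unwrapped to one line each.
sample = [
    "2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which application threads were stopped: 0.091450 seconds",
    "2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which application threads were stopped: 51.8190260 seconds",
    "2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which application threads were stopped: 0.005470 seconds",
]

for stamp, uptime, pause in long_pauses(sample):
    print(f"{stamp} uptime={uptime:.3f}s stopped={pause:.3f}s")
```

To run it against a real file, pass an open file handle instead of the
sample list, e.g. long_pauses(open("gc.log")).  Only the 51.8 second entry
exceeds the threshold in the sample above.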

Could there be a Cassandra thread that is taking a long time to reach a
safepoint, and if so, what is it trying to do? Along with the node seeing
other nodes as down in the Cassandra log file, the StatusLogger shows 1599
Pending in ReadStage and 9 Pending in MutationStage.

There is mention of batch revocation of biased locks in Cassandra as a
possible cause (not GC) via:
http://www.mail-archive.com/user@cassandra.apache.org/msg34401.html

We have JNA, no swap, and the cluster runs fine apart from these intermittent
long pauses, which can cause a node to appear down to other nodes.  Any ideas
as to the cause of the long pause above? It does not seem to be related to GC.

thanks.
