Thanks for the update. Our logs indicated that there were 0 pending for CompactionManager at that time. Also, there were no nodetool repairs running at that time. The log statements above state that the application had to stop to reach a safepoint. Yet, it doesn't say what is requesting the safepoint.
On Wed, Jan 29, 2014 at 1:20 PM, Shao-Chuan Wang < shaochuan.w...@bloomreach.com> wrote: > We had similar latency spikes when pending compactions can't keep it up or > repair/streaming taking too much cycles. > > > On Wed, Jan 29, 2014 at 10:07 AM, Frank Ng <fnt...@gmail.com> wrote: > >> All, >> >> We've been having intermittent long application pauses (version 1.2.8) >> and not sure if it's a cassandra bug. During these pauses, there are >> dropped messages in the cassandra log file along with the node seeing other >> nodes as down. We've turned on gc logging and the following is an example >> of a long "stopped" or pause event in the gc.log file. >> >> 2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which >> application threads were stopped: 0.091450 seconds >> 2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which >> application threads were stopped: 51.8190260 seconds >> 2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which >> application threads were stopped: 0.005470 seconds >> >> As seen above, there was a 0.091450 secs pause, then a 51.8190260 secs >> pause. There were no GC log events between those 2 log statements. Since >> there's no GC logs in between, something else must be causing the long stop >> time to reach a safepoint. >> >> Could there be a Cassandra thread that is taking a long time to reach a >> safepoint and what is it trying to do? Along with the node seeing other >> nodes as down in the cassandra log file, the StatusLogger shows 1599 >> Pending in ReadStage and 9 Pending in MutationStage. >> >> There is mention of cassandra batch revoke bias locks as a possible cause >> (not GC) via: >> http://www.mail-archive.com/user@cassandra.apache.org/msg34401.html >> >> We have JNA, no swap, and the cluster runs fine besides there >> intermittent long pause that can cause a node to appear down to other >> nodes. Any ideas as the cause of the long pause above? It seems not >> related to GC. >> >> thanks. >> >> >