> > > I never figured out what kills stdout for C*. It's a library we depend on, > didn't try too hard to figure out which one. >
Nah, it's Cassandra itself (in org.apache.cassandra.service.CassandraDaemon.activate()), but you can pass -f (for 'foreground') to not do it. > > > On 29 January 2014 21:07, Frank Ng <fnt...@gmail.com> wrote: > >> Benedict, >> Thanks for the advice. I've tried turning on PrintSafepointStatistics. >> However, that info is only sent to the STDOUT console. The cassandra >> startup script closes the STDOUT when it finishes, so nothing is shown for >> safepoint statistics once it's done starting up. Do you know how to >> startup cassandra and send all stdout to a log file and tell cassandra not >> to close stdout? >> >> Also, we have swap turned off as recommended. >> >> thanks >> >> >> On Wed, Jan 29, 2014 at 3:39 PM, Benedict Elliott Smith < >> belliottsm...@datastax.com> wrote: >> >>> Frank, >>> >>> >>> The same advice for investigating holds: add the VM flags >>> -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 (you >>> could put something above 1 there, to reduce the amount of logging, since a >>> pause of 52s will be pretty obvious even if aggregated with lots of other >>> safe points; the count is the number of safepoints to aggregate into one >>> log message) >>> >>> >>> 52s is a very extreme pause, and I would be surprised if revoke bias could >>> cause this. I wonder if the VM is swapping out. >>> >>> >>> >>> On 29 January 2014 19:02, Frank Ng <fnt...@gmail.com> wrote: >>> >>>> Thanks for the update. Our logs indicated that there were 0 pending >>>> for CompactionManager at that time. Also, there were no nodetool repairs >>>> running at that time. The log statements above state that the application >>>> had to stop to reach a safepoint. Yet, it doesn't say what is requesting >>>> the safepoint. >>>> >>>> >>>> On Wed, Jan 29, 2014 at 1:20 PM, Shao-Chuan Wang < >>>> shaochuan.w...@bloomreach.com> wrote: >>>> >>>>> We had similar latency spikes when pending compactions can't keep it >>>>> up or repair/streaming taking too much cycles. >>>>> >>>>> >>>>> On Wed, Jan 29, 2014 at 10:07 AM, Frank Ng <fnt...@gmail.com> wrote: >>>>> >>>>>> All, >>>>>> >>>>>> We've been having intermittent long application pauses (version >>>>>> 1.2.8) and not sure if it's a cassandra bug. During these pauses, there >>>>>> are dropped messages in the cassandra log file along with the node seeing >>>>>> other nodes as down. We've turned on gc logging and the following is an >>>>>> example of a long "stopped" or pause event in the gc.log file. >>>>>> >>>>>> 2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which >>>>>> application threads were stopped: 0.091450 seconds >>>>>> 2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which >>>>>> application threads were stopped: 51.8190260 seconds >>>>>> 2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which >>>>>> application threads were stopped: 0.005470 seconds >>>>>> >>>>>> As seen above, there was a 0.091450 secs pause, then a 51.8190260 >>>>>> secs pause. There were no GC log events between those 2 log statements. >>>>>> Since there's no GC logs in between, something else must be causing the >>>>>> long stop time to reach a safepoint. >>>>>> >>>>>> Could there be a Cassandra thread that is taking a long time to reach >>>>>> a safepoint and what is it trying to do? Along with the node seeing other >>>>>> nodes as down in the cassandra log file, the StatusLogger shows 1599 >>>>>> Pending in ReadStage and 9 Pending in MutationStage. >>>>>> >>>>>> There is mention of cassandra batch revoke bias locks as a possible >>>>>> cause (not GC) via: >>>>>> http://www.mail-archive.com/user@cassandra.apache.org/msg34401.html >>>>>> >>>>>> We have JNA, no swap, and the cluster runs fine besides there >>>>>> intermittent long pause that can cause a node to appear down to other >>>>>> nodes. Any ideas as the cause of the long pause above? It seems not >>>>>> related to GC. >>>>>> >>>>>> thanks. >>>>>> >>>>>> >>>>> >>>> >>> >> >