Which version of Java are you using? Can you try reducing NewSize and increasing the old generation? If you are on an old version of Java, I also recommend upgrading.
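To put a number on how full each generation was when the pause hit, the figures in the GC log line quoted below can be turned into percentages. This is only a rough sketch in Python: it assumes the wrapped log line has been joined into one string, and the helper name `usage` is made up for illustration.

```python
import re

# GC log line from the quoted message, with the wrapped lines joined.
LOG = (
    "[GC 1335393.834: [ParNew (promotion failed): "
    "319468K->324959K(345024K), 0.1304456 secs]1335393.964: [CMS: "
    "6000844K->3298251K(8005248K), 10.8526193 secs] "
    "6310427K->3298251K(8350272K), "
    "[CMS Perm : 26355K->26346K(44268K)], 10.9832679 secs]"
)


def usage(label, line):
    """Return (before_kb, after_kb, capacity_kb) for the first entry starting with label.

    Hypothetical helper: it only understands the "<before>K-><after>K(<capacity>K)"
    shape seen in the log line above.
    """
    pattern = re.escape(label) + r"[^:]*: (\d+)K->(\d+)K\((\d+)K\)"
    match = re.search(pattern, line)
    if match is None:
        raise ValueError("no entry found for " + label)
    return tuple(int(group) for group in match.groups())


# Old generation (CMS) and young generation (ParNew) just before the collection.
old_before_kb, old_after_kb, old_capacity_kb = usage("[CMS", LOG)
young_before_kb, young_after_kb, young_capacity_kb = usage("[ParNew", LOG)

old_occupancy_pct = 100.0 * old_before_kb / old_capacity_kb
young_occupancy_pct = 100.0 * young_before_kb / young_capacity_kb
old_free_kb = old_capacity_kb - old_before_kb

print(f"old gen occupancy before full GC: {old_occupancy_pct:.1f}%")
print(f"old gen free before full GC: {old_free_kb / 1024:.0f} MB")
print(f"young gen occupancy before ParNew: {young_occupancy_pct:.1f}%")
```

For the line in this thread, that puts the old generation at roughly 75% of its capacity, with nearly 2 GB still free, at the moment the promotion failed. That is the number to compare against your CMSInitiatingOccupancyFraction setting.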
On Thu, Jan 19, 2012 at 3:27 AM, Rene Kochen <rene.koc...@emea.schange.com> wrote:
> Thanks for your comments. The application is indeed suffering from a freezing
> Cassandra node. Queries are taking longer than 10 seconds at the moment of a
> full garbage collect.
>
> Here is an example from the logs. I have a three node cluster. At some point
> I see on a node the following log:
>
> 21:53:35,986 InetAddress /172.16.107.46 is now dead.
>
> On node "172.16.107.46", I see the following:
>
> 21:53:27.192+0100: 1335393.834: [GC 1335393.834: [ParNew (promotion failed):
> 319468K->324959K(345024K), 0.1304456 secs]1335393.964: [CMS:
> 6000844K->3298251K(8005248K), 10.8526193 secs] 6310427K->3298251K(8350272K),
> [CMS Perm : 26355K->26346K(44268K)], 10.9832679 secs] [Times: user=11.15
> sys=0.03, real=10.98 secs]
> 21:53:38,174 GC for ConcurrentMarkSweep: 10856 ms for 1 collections,
> 3389079904 used; max is 8550678528
>
> I have not yet tested the "XX:+DisableExplicitGC" switch.
>
> Is the right thing to do to decrease the CMSInitiatingOccupancyFraction
> setting?
>
> Thanks!
>
> Rene
>
> -----Original Message-----
> From: sc...@scode.org [mailto:sc...@scode.org] On Behalf Of Peter Schuller
> Sent: Tuesday, 20 December 2011 6:38
> To: user@cassandra.apache.org
> Subject: Re: Garbage collection freezes cassandra node
>
> I should add: If you are indeed actually pausing due to "promotion
> failed" or "concurrent mode failure" (which you will see in the GC log
> if you enable it with the options I suggested), the first thing I
> would try to mitigate is:
>
> * Decrease the occupancy trigger (search for "occupancy") of CMS to a
>   lower percentage, making the concurrent mark phase start earlier.
> * Increase heap size significantly (probably not necessary based on
>   your graph, but for good measure).
>
> If it then goes away, report back and we can perhaps figure out
> details. There are other things that can be done.
>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)