I don't recommend PrintFLSStatistics=1, it makes the gc logs hard to mechanically parse. Because of that, I can't easily tell whether you're in the same situation we found. But just in case, try setting +CMSClassUnloadingEnabled. There's an issue related to JMX in DSE that prevents effective old gen collection in some cases. The flag's low overhead, and very effective if that's your problem too.
Cheers, Lee On Tue, Jan 21, 2014 at 12:02 AM, John Watson <j...@disqus.com> wrote: > Pretty reliable, at some point, nodes will have super long GCs. > Followed by https://issues.apache.org/jira/browse/CASSANDRA-6592 > > Lovely log messages: > > 9030.798: [ParNew (0: promotion failure size = 4194306) (2: > promotion failure size = 4194306) (4: promotion failure size = > 4194306) (promotion failed) > Total time for which application threads were stopped: 23.2659990 seconds > > Full gc.log until just before restarting the node (see another 32s GC > near the end): https://gist.github.com/dctrwatson/f04896c215fa2418b1d9 > > Here's graph of GC time, where we can see a an increase 30 minutes > prior (indicator that the issue will happen soon): > http://dl.dropboxusercontent.com/s/q4dr7dle023w9ih/render.png > > Graph of various Heap usage: > http://dl.dropboxusercontent.com/s/e8kd8go25ihbmkl/download.png > > Running compactions in the same time frame: > http://dl.dropboxusercontent.com/s/li9tggk4r2l3u4b/render%20(1).png > > CPU, IO, ops and latencies: > > https://dl.dropboxusercontent.com/s/yh9osm9urplikb7/2014-01-20%20at%2011.46%20PM%202x.png > > cfhistograms/cfstats: > https://gist.github.com/dctrwatson/9a08b38d0258ae434b15 > > Cassandra 1.2.13 > Oracle JDK 1.6u45 > > JVM opts: > > MAX_HEAP_SIZE="8G" > HEAP_NEW_SIZE="1536M" > > Tried HEAP_NEW_SIZE of 768M, 800M, 1000M and 1600M > Tried default "-XX:SurvivorRatio=8" and "-XX:SurvivorRatio=4" > Tried default "-XX:MaxTenuringThreshold=1" and "-XX:MaxTenuringThreshold=2" > > All still eventually ran into long GC. > > Hardware for all 3 nodes: > > (2) E5520 @ 2.27Ghz (8 cores w/ HT) ["16" cores] > (6) 4GB RAM [24G RAM] > (1) 500GB 7.2k for commitlog > (2) 400G SSD for data (configured as separate data directories) >