Thanks for all the info; I think I have been able to sort out my issue. The new settings I am using are:

    -Xmn512M                                (very important, I think)
    -XX:SurvivorRatio=5                     (not very important, I think)
    -XX:MaxTenuringThreshold=5
    -XX:ParallelGCThreads=8
    -XX:CMSInitiatingOccupancyFraction=75
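In case it helps anyone reproduce this, here is roughly where I set them. This is a sketch assuming the stock conf/cassandra-env.sh and its JVM_OPTS pattern from a 0.7-style install; adjust for your layout:

    # conf/cassandra-env.sh (sketch, assuming the stock JVM_OPTS pattern)
    JVM_OPTS="$JVM_OPTS -Xmn512M"                              # young generation size
    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=5"                   # eden:survivor space ratio
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=5"            # young-GC survivals before promotion
    JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=8"               # matches my 8 cores
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75" # old-gen occupancy that triggers CMS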
Since applying these settings, the one time I saw the same type of behavior as before, the following appeared in the GC log:

    Total time for which application threads were stopped: 0.6830080 seconds
    1368.201: [GC 1368.201: [ParNew (promotion failed)
    Desired survivor size 38338560 bytes, new threshold 1 (max 5)
    - age   1:   55799736 bytes,   55799736 total
    : 449408K->449408K(449408K), 0.2618690 secs]1368.463: [CMS1372.459: [CMS-concurrent-mark: 7.930/9.109 secs] [Times: user=28.31 sys=0.66, real=9.11 secs]
     (concurrent mode failure): 9418431K->6267709K(11841536K), 26.4973750 secs] 9777393K->6267709K(12290944K), [CMS Perm : 20477K->20443K(34188K)], 26.7595510 secs] [Times: user=31.75 sys=0.00, real=26.76 secs]
    Total time for which application threads were stopped: 26.7617560 seconds

A full stop of the application like that is exactly what I was seeing extensively before: 100-200 such pauses over the course of a major compaction, as reported by the gossipers on other nodes.

I have also just noticed that the previous instability (i.e. the application stops) correlated with the compaction of a few column families characterized by fairly fat rows (mean size 10 MB, max sizes of 150-200 MB, up to a million-plus columns per row). My theory is that, under the old settings, each row being compacted was promoted to the old generation, eventually running the heap out of space and forcing a stop-the-world GC. Under the new settings, rows being compacted typically stay in the young generation, where the garbage collector can reclaim them more quickly and with less effort. Does this theory sound reasonable?
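As a sanity check, the numbers in that log line line up with the new settings. This is my own arithmetic, assuming HotSpot's default TargetSurvivorRatio of 50 and ignoring alignment rounding:

    young generation      = -Xmn512M                                 =  512 MB
    each survivor space   = 512 MB / (SurvivorRatio + 2) = 512/7    ~=   73 MB
    desired survivor size = 73 MB * TargetSurvivorRatio (50%)       ~= 36.5 MB  (log: 38338560 bytes)
    ParNew capacity       = eden + 1 survivor = 512 MB * 6/7        ~=  439 MB  (log: 449408K)

The age-1 data (55799736 bytes) would not fit within the desired survivor size, which is presumably why the tenuring threshold was cut to 1; the attempted promotion then failed, and CMS fell back to the 26-second stop-the-world collection.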
Answering some of the other questions:

> disk bound or CPU bound during compaction?

Neither, as far as I can tell: iowait is 10-20%, disk utilization rarely jumps above 60%, and CPU %idle is about 60%. I would have said I was memory bound, but now I think compaction is bounded by being single-threaded.

> are you sure you're not swapping a bit?

Only if JNA is not doing its job.

> Number of cores on your system. How busy is the system?

8 cores; load averages are typically below 4, so not terribly busy, I would say.

On Mon, Jan 17, 2011 at 12:58 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:

> > very quickly from the young generation to the old generation". Furthermore,
> > the CMSInitiatingOccupancyFraction of 75 (from a JVM default of 68) means
> > "start gc in the old generation later", presumably to allow Cassandra to use
> > more of the old generation heap without needlessly trying to free up used
> > space (?). Please correct me if I am misinterpreting these settings.
>
> Note the use of -XX:+UseCMSInitiatingOccupancyOnly, which causes the
> JVM to always trigger on that occupancy fraction rather than only do
> it for the first trigger (or something along those lines) and then
> switch to heuristics. Presumably (though I don't specifically know the
> history of this particular option being added) it is more important to
> avoid doing Full GCs at all than to super-optimally tweak the trigger
> for maximum throughput.
>
> The heuristics tend to cut it pretty close, and setting a conservative
> fixed occupancy trigger probably greatly lessens the chance of falling
> back to a full GC in production.
>
> > One of the issues I have been having is extreme node instability when
> > running a major compaction. After 20-30 seconds of operation, the node
> > spends 30+ seconds in (what I believe to be) GC. I have tried halving
> > all memtable thresholds to reduce overall heap memory usage, but that
> > has not seemed to help with the instability. After one of these blips,
> > I often see log entries like the following:
> >
> > INFO [ScheduledTasks:1] 2011-01-17 10:41:21,961 GCInspector.java (line 133) GC for ParNew: 215 ms, 45084168 reclaimed leaving 11068700368 used; max is 12783583232
> > INFO [ScheduledTasks:1] 2011-01-17 10:41:28,033 GCInspector.java (line 133) GC for ParNew: 234 ms, 40401120 reclaimed leaving 12144504848 used; max is 12783583232
> > INFO [ScheduledTasks:1] 2011-01-17 10:42:15,911 GCInspector.java (line 133) GC for ConcurrentMarkSweep: 45828 ms, 3350764696 reclaimed leaving 9224048472 used; max is 12783583232
>
> 45 seconds is pretty significant even for a 12 GB heap unless you are
> really CPU loaded so that there is heavy contention over the CPU.
> While I don't see anything obviously extreme: are you sure you're not
> swapping a bit?
>
> Also, what do you mean by node instability? Does it *completely* stop
> responding during these periods, or does it flap in and out of the
> cluster while still responding?
>
> Are your nodes disk bound or CPU bound during compaction?
>
> --
> / Peter Schuller
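PS: for anyone following along, my reading of the trigger pairing Peter describes, in flag form (a sketch only; the comment is my interpretation of his explanation, not something I have re-verified):

    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"  # always start CMS at 75% old-gen occupancy,
                                                             # never hand the trigger back to JVM heuristics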