Thanks, Dan: Yes, -Xmn512M/1G sizes the Young Generation explicitly and takes adaptive resizing out of the picture. (If at all possible, send your GC log over and we can analyze the promotion failure a little more finely.) The low load implies that you are able to use the parallel threads effectively.

cheers,
Sri
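For reference, a minimal sketch collecting the JVM options discussed in this thread in one place, with a note on what each does. The values are just the ones Dan reports trying (plus the +UseCMSInitiatingOccupancyOnly flag Peter mentions below), not general recommendations, and whether they go on the java command line or into conf/cassandra-env.sh depends on the particular install:

    -Xmn512M                               # size the young generation explicitly, taking adaptive resizing out of the picture
    -XX:SurvivorRatio=5                    # larger survivor spaces relative to eden than the default of 8
    -XX:MaxTenuringThreshold=5             # let objects survive up to 5 young collections before promotion
    -XX:ParallelGCThreads=8                # one parallel GC thread per core on an 8-core box
    -XX:CMSInitiatingOccupancyFraction=75  # start CMS once the old generation is 75% full
    -XX:+UseCMSInitiatingOccupancyOnly     # always use that fraction rather than the JVM's own heuristics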
On Mon, Jan 17, 2011 at 9:05 PM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:

> Thanks for all the info, I think I have been able to sort out my issue.
> The new settings I am using are:
>
> -Xmn512M (Very important I think)
> -XX:SurvivorRatio=5 (Not very important I think)
> -XX:MaxTenuringThreshold=5
> -XX:ParallelGCThreads=8
> -XX:CMSInitiatingOccupancyFraction=75
>
> Since applying these settings, the one time I saw the same type of
> behavior as before, the following appeared in the GC log:
>
> Total time for which application threads were stopped: 0.6830080 seconds
> 1368.201: [GC 1368.201: [ParNew (promotion failed)
> Desired survivor size 38338560 bytes, new threshold 1 (max 5)
> - age 1: 55799736 bytes, 55799736 total
> : 449408K->449408K(449408K), 0.2618690 secs]1368.463: [CMS1372.459:
> [CMS-concurrent-mark: 7.930/9.109 secs] [Times: user=28.31 sys=0.66, real=9.11 secs]
> (concurrent mode failure): 9418431K->6267709K(11841536K), 26.4973750 secs]
> 9777393K->6267709K(12290944K), [CMS Perm : 20477K->20443K(34188K)],
> 26.7595510 secs] [Times: user=31.75 sys=0.00, real=26.76 secs]
> Total time for which application threads were stopped: 26.7617560 seconds
>
> Now, a full stop of the application was what I was seeing extensively
> before (100-200 times over the course of a major compaction, as reported
> by gossipers on other nodes). I have also just noticed that the previous
> instability (i.e. application stops) correlated with the compaction of a
> few column families characterized by fairly fat rows (10 MB mean size,
> max sizes of 150-200 MB, up to a million+ columns per row). My theory is
> that, with the old settings, each row being compacted was being promoted
> to the old generation, thereby running the heap out of space and causing
> a stop-the-world GC. With the new settings, rows being compacted
> typically remain in the young generation, allowing them to be cleaned up
> more quickly with less effort on the part of the garbage collector. Does
> this theory sound reasonable?
>
> Answering some of the other questions:
>
> > disk bound or CPU bound during compaction?
>
> ... Neither (?). Iowait is 10-20%, disk utilization rarely jumps above
> 60%, and CPU %idle is about 60%. I would have said I was memory bound,
> but now I think compaction is bounded by being single-threaded.
>
> > are you sure you're not swapping a bit?
>
> Only if JNA is not doing its job.
>
> > Number of cores on your system. How busy is the system?
>
> 8, load factors typically < 4, so not terribly busy I would say.
>
> On Mon, Jan 17, 2011 at 12:58 PM, Peter Schuller
> <peter.schul...@infidyne.com> wrote:
>
>> > very quickly from the young generation to the old generation".
>> > Furthermore, the CMSInitiatingOccupancyFraction of 75 (from a JVM
>> > default of 68) means "start gc in the old generation later", presumably
>> > to allow Cassandra to use more of the old generation heap without
>> > needlessly trying to free up used space (?). Please correct me if I am
>> > misinterpreting these settings.
>>
>> Note the use of -XX:+UseCMSInitiatingOccupancyOnly, which causes the JVM
>> to always trigger on that occupancy fraction rather than only do it for
>> the first trigger (or something along those lines) and then switch to
>> heuristics. Presumably (though I don't specifically know the history of
>> this particular option being added) it is more important to avoid doing
>> Full GCs at all than to super-optimally tweak the trigger for maximum
>> throughput.
>>
>> The heuristics tend to cut it pretty close, and setting a conservative
>> fixed occupancy trigger probably greatly lessens the chance of falling
>> back to a full GC in production.
>>
>> > One of the issues I have been having is extreme node instability when
>> > running a major compaction. After 20-30 seconds of operation, the node
>> > spends 30+ seconds in (what I believe to be) GC. Now I have tried
>> > halving all memtable thresholds to reduce overall heap memory usage but
>> > that has not seemed to help with the instability. After one of these
>> > blips, I often see log entries as follows:
>> >
>> > INFO [ScheduledTasks:1] 2011-01-17 10:41:21,961 GCInspector.java (line 133) GC for ParNew: 215 ms, 45084168 reclaimed leaving 11068700368 used; max is 12783583232
>> > INFO [ScheduledTasks:1] 2011-01-17 10:41:28,033 GCInspector.java (line 133) GC for ParNew: 234 ms, 40401120 reclaimed leaving 12144504848 used; max is 12783583232
>> > INFO [ScheduledTasks:1] 2011-01-17 10:42:15,911 GCInspector.java (line 133) GC for ConcurrentMarkSweep: 45828 ms, 3350764696 reclaimed leaving 9224048472 used; max is 12783583232
>>
>> 45 seconds is pretty significant even for a 12 gig heap unless you're
>> really CPU loaded so that there is heavy contention over the CPU. While
>> I don't see anything obviously extreme, are you sure you're not swapping
>> a bit?
>>
>> Also, what do you mean by node instability - does it *completely* stop
>> responding during these periods, or does it flap in and out of the
>> cluster but is still responding?
>>
>> Are your nodes disk bound or CPU bound during compaction?
>>
>> --
>> / Peter Schuller
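As an aside, the "Total time for which application threads were stopped" and "Desired survivor size ... new threshold" lines quoted above come from HotSpot's own GC logging. A sketch of the logging options that would produce that kind of output on a Java 6/7 era JVM (the log path here is only illustrative, not taken from this thread):

    -Xloggc:/var/log/cassandra/gc.log      # write GC events to a dedicated log file (path is illustrative)
    -XX:+PrintGCDetails                    # per-collection generation sizes and pause times
    -XX:+PrintGCTimeStamps                 # the "1368.201:" seconds-since-startup prefixes
    -XX:+PrintTenuringDistribution         # the "Desired survivor size ... age 1:" lines
    -XX:+PrintGCApplicationStoppedTime     # the "Total time for which application threads were stopped" lines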