On 1/17/11, Dan Hendry <dan.hendry.j...@gmail.com> wrote:
> Since applying these settings, the one time I saw the same type of behavior
> as before, the following appeared in the GC log.
>
> (concurrent mode failure): 9418431K->6267709K(11841536K), 26.4973750
> secs] 9777393K->6267709K(12290944K), [CMS Perm : 20477K->20443K(34188K)],
> 26.7595510 secs] [Times: user=31.75 sys=0.00, real=26.76 secs]

The symptoms described in both of your mails that mention pathological cases
sound like your heap may simply be too small for your actual working set.
Compaction adds memory pressure, and then you either OOM or the concurrent
mark sweep fails and you thrash.

It is also worth noting that major compaction reduces the effectiveness of
various caches (internal and OS level), and it is somewhat likely that your
node has backed-up internal thread pools during and immediately after
compaction. Some work has been done recently to improve these
characteristics, but I don't think those changes are in the 0.7.0 release.

> INFO [ScheduledTasks:1] 2011-01-17 10:42:15,911 GCInspector.java (line 133)
> GC for ConcurrentMarkSweep: 45828 ms, 3350764696 reclaimed leaving 9224048472
> used; max is 12783583232

As the amount of headroom available to the CMS collector decreases, it tends
to take longer and longer to reclaim less and less memory. Roughly 46 seconds
to recover about 3 GB (leaving about 9 GB used out of a max heap of about
12 GB) suggests that your working set has put you into the grey area where
collection eventually works but sucks really badly. This is just before the
state where it permanently locks up the JVM and/or OOMs.

Have you sized your memtables and caches so that you have meaningful heap
headroom when your caches are full?

=Rob
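
For anyone who wants to check the arithmetic on a GCInspector line like the one quoted above, here is a minimal Python sketch. It is not part of Cassandra; the regular expression is an assumption based only on the log format as it appears in this thread, and the `summarize` helper and its field names are hypothetical.

```python
import re

# Assumed format, taken from the GCInspector line quoted in this thread:
#   GC for <collector>: <ms> ms, <bytes> reclaimed leaving <bytes> used; max is <bytes>
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms, "
    r"(?P<reclaimed>\d+) reclaimed leaving (?P<used>\d+)\s+used; "
    r"max is (?P<max>\d+)"
)

GIB = 2 ** 30  # bytes per GiB


def summarize(line):
    """Return pause time and heap figures for one GCInspector line, or None."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    ms = int(m["ms"])
    reclaimed = int(m["reclaimed"])
    used = int(m["used"])
    max_heap = int(m["max"])
    return {
        "collector": m["collector"],
        "seconds": ms / 1000.0,
        "reclaimed_gib": reclaimed / GIB,
        "used_gib": used / GIB,
        "max_gib": max_heap / GIB,
        # Fraction of the max heap still free after this collection.
        "headroom_fraction": (max_heap - used) / max_heap,
    }


LINE = (
    "INFO [ScheduledTasks:1] 2011-01-17 10:42:15,911 GCInspector.java (line 133) "
    "GC for ConcurrentMarkSweep: 45828 ms, 3350764696 reclaimed leaving "
    "9224048472 used; max is 12783583232"
)

stats = summarize(LINE)
for key, value in stats.items():
    print(key, value)
```

If the free fraction left after a full CMS cycle keeps shrinking while the pause times grow, that is the pattern described above: the collector is working harder and harder for less and less reclaimed memory.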