On 1/17/11, Dan Hendry <dan.hendry.j...@gmail.com> wrote:
> Since applying these settings, the one time I saw the same type of behavior
> as before, the following appeared in the GC log.
>
>    (concurrent mode failure): 9418431K->6267709K(11841536K), 26.4973750
> secs] 9777393K->6267709K(12290944K), [CMS Perm : 20477K->20443K(34188K)],
> 26.7595510 secs] [Times: user=31.75 sys=0.00, real=26.76 secs]

The pathological cases described in both of your mails sound like
your heap may simply be too small for your actual working set.
Compaction adds memory pressure, and you then either OOM or hit a
concurrent mode failure (as in the GC log above) and thrash.

It is also worth noting that major compaction reduces the
effectiveness of various caches (internal and o/s level), and it is
somewhat likely that your node has internally backed up threadpools
during and immediately after compaction. Some work has been done
recently to improve these characteristics, but I don't think those
changes are in the 0.7.0 release.

> INFO [ScheduledTasks:1] 2011-01-17 10:42:15,911 GCInspector.java (line 133) 
> GC for ConcurrentMarkSweep: 45828 ms, 3350764696 reclaimed leaving 9224048472 
> used; max is 12783583232

As the amount of headroom available to the CMS collector decreases, it
tends to take longer and longer to reclaim less and less memory. 46
seconds to recover 3gb (leaving 9gb used out of a max heap of roughly
12gb) suggests that your working set has put you into this grey area
where it eventually works but sucks really badly. This is just before
the state where it permanently locks up the JVM and/or OOMs.
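
If you want to read those figures straight off the GCInspector line
rather than eyeballing it, something like the following works. This is
only a sketch in Python: the regex is inferred from the single log line
quoted above, and the name parse_gc_line is made up for illustration,
not anything Cassandra provides.

```python
import re

# Pattern inferred from the one GCInspector line quoted above; this is an
# assumption, not a documented log format.
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms, "
    r"(?P<reclaimed>\d+) reclaimed leaving (?P<used>\d+) used; "
    r"max is (?P<max>\d+)"
)

GIB = 2 ** 30


def parse_gc_line(line):
    """Hypothetical helper: turn one GCInspector message into readable stats."""
    match = GC_LINE.search(line)
    if match is None:
        raise ValueError("not a GCInspector line")
    ms = int(match.group("ms"))
    reclaimed = int(match.group("reclaimed"))
    used = int(match.group("used"))
    max_heap = int(match.group("max"))
    return {
        "collector": match.group("collector"),
        "pause_secs": ms / 1000.0,            # log reports milliseconds
        "reclaimed_gib": reclaimed / GIB,     # bytes freed by this collection
        "used_gib": used / GIB,               # bytes still live afterwards
        "max_gib": max_heap / GIB,            # configured max heap
        "used_fraction": used / max_heap,     # share of heap still occupied
    }


# The log line from the quoted mail, joined back onto one line.
line = (
    "GC for ConcurrentMarkSweep: 45828 ms, 3350764696 reclaimed "
    "leaving 9224048472 used; max is 12783583232"
)
stats = parse_gc_line(line)
for key, value in stats.items():
    print(key, value)
```

The number to watch is used_fraction: here roughly 72% of the heap is
still occupied immediately after a full collection, which leaves CMS
very little room to work with.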

Have you sized your memtables and caches so that you have meaningful
heap headroom when your caches are full?
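
One rough way to answer that is to add up the worst-case footprint of
your memtables and caches and see what is left of the heap. A
back-of-the-envelope sketch in Python follows. Only the max heap comes
from your log; the memtable and cache sizes are placeholders I made up,
the helper name heap_headroom is hypothetical, and real on-heap usage
will be higher than the configured sizes because of JVM object
overhead.

```python
MIB = 2 ** 20


def heap_headroom(max_heap_bytes, memtable_bytes, cache_bytes):
    """Hypothetical helper: heap left over once memtables and caches are full.

    Returns (headroom_bytes, headroom_fraction).
    """
    committed = sum(memtable_bytes) + sum(cache_bytes)
    headroom = max_heap_bytes - committed
    return headroom, headroom / max_heap_bytes


# Max heap as reported by GCInspector in the quoted log line.
max_heap = 12783583232

# Placeholder values only: substitute your own per-column-family
# memtable limits and your estimated cache sizes when full.
memtables = [512 * MIB] * 4
caches = [1024 * MIB, 2048 * MIB]

headroom, fraction = heap_headroom(max_heap, memtables, caches)
print("headroom bytes:", headroom)
print("headroom fraction:", round(fraction, 2))
```

If the fraction that comes out for your real settings is small, CMS is
going to struggle no matter how the collector itself is tuned.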

=Rob
