> But he's talking about "promotion failed" which is about heap
> fragmentation, not "concurrent mode failure" which would indicate CMS
> too late. So increasing young generation size + tenuring threshold is
> probably the way to go (especially in a read-heavy workload;
> increasing tenuring will just mean copying data in memtables around
> between survivor spaces for a write-heavy load).

Thanks for the catch. You're right.

For interested parties: This caused me to look into when 'promotion failed' and 'concurrent mode failure' are actually reported. Some background is here (from 2006, so potentially out of date):

http://blogs.sun.com/jonthecollector/entry/when_the_sum_of_the

I looked at a semi-recent openjdk7 (so it may have changed since 1.6).

"concurrent mode failure" seems to be logged in two cases; one is CMSCollector::do_mark_sweep_work(). The other is CMSCollector::acquire_control_and_collect(). The former is called by the latter if it is determined that compaction should happen, which seems to boil down to whether the incremental collection is "believed" to fail (my source navigation fu failed me and I'm for some reason unable to find the implementation of collection_attempt_is_safe() that applies...). The other concurrent mode failure is logged if acquire_control_and_collect() determines that a collection is already in progress. That seems consistent with the blog entry.

"promotion failed" seems to be reported when an actual next_gen->par_promote() call fails for a specific object.

So, my reading is that while 'promotion failed' can indeed be an indicator of promotion failure due to fragmentation alone (if a promotion were to fail in spite of there being plenty of free space left), it can also have a cause overlapping with concurrent mode failure, in case a young-gen collection was attempted under the belief that there would be enough space - only to then fail.

However, given the reported numbers (CMS: 1341669K->1142937K(2428928K)) it does seem clear that finding contiguous free space is indeed the problem. Running with -XX:PrintFLSStatistics=1 may yield interesting results, but of course won't actually help.

-- 
/ Peter Schuller
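
PS. For anyone who wants to check the arithmetic behind the CMS figures quoted above, here is a minimal sketch. It assumes the usual reading of that log fragment (used before -> used after (capacity), all in K); the function names are my own and purely illustrative, not part of any tool.

```python
import re

# GC log fragment quoted in the message above.
LOG_FRAGMENT = "CMS: 1341669K->1142937K(2428928K)"


def parse_cms_fragment(fragment):
    """Return (used_before_k, used_after_k, capacity_k).

    Assumes the 'CMS: <before>K-><after>K(<capacity>K)' layout.
    """
    match = re.search(r"CMS: (\d+)K->(\d+)K\((\d+)K\)", fragment)
    if match is None:
        raise ValueError("unrecognised fragment: %r" % fragment)
    before, after, capacity = (int(g) for g in match.groups())
    return before, after, capacity


def summarize(fragment):
    """Compute free space and occupancy before and after the collection."""
    before, after, capacity = parse_cms_fragment(fragment)
    return {
        "free_before_k": capacity - before,
        "free_after_k": capacity - after,
        "occupancy_before_pct": round(100.0 * before / capacity, 1),
        "occupancy_after_pct": round(100.0 * after / capacity, 1),
    }


if __name__ == "__main__":
    for key, value in summarize(LOG_FRAGMENT).items():
        print(key, value)
```

Under that reading the old generation was only about 55% full before the collection and about 47% full afterwards, i.e. more than 1 GB was nominally free in both cases. That is why a failure to promote points at a lack of contiguous free space rather than a lack of space overall.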