> Before I applied these changes the young gen and the survivor space
> were very spiky. Now they both seem very low all the time. As you see
> from my screen shot, before these changes my JVM memory would make
> large saw tooths, now all three pools young, eden, perm seem smoother.
I'm not sure what's going on in Mikio's original graph (why CMS-i would somehow cause lower average memory usage; I think something else is going on there).

With respect to your graph:

http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/tune_that_jvm

That just looks like a concurrent mark/sweep has not completed at all after the dip at around 14:00. Sooner or later it should either complete a concurrent mark/sweep, resulting in a dip, or it should fail to complete in time, causing a fallback to a full GC. Unless CMS-i does something *completely* different that I have completely missed, you should definitely expect a sudden dip once a mark/sweep finishes.

The minor saw-toothy behavior seen on the slope is mostly going to be dictated by the young generation size chosen by the collector. Possibly it chooses a smaller young generation when CMS-i is enabled (speculation).

Also, note that lack of saw-toothing is not a goal in and of itself, and may even be bad. With respect to the young generation, the situation is essentially:

(1) The larger the young generation, the more significant the saw-tooth.

(2) The larger the young generation, the more efficient the GC (if the application behaves according to the weak generational hypothesis - google it if you want a ref), because less data is promoted to the old gen and because the overhead of stop-the-world is lessened.

(3) The larger the young generation, the longer the pause times for collections of the young generation.

A few consequences of those include:

* Assuming a parallel collection of the young generation, more CPUs mean that the optimal size of the young generation given a certain pause time goal is higher. In other words, more CPUs -> more saw tooth.

* Lack of saw tooth may just indicate that the majority of data survives the young generation collections.
  This is not a good thing; at best it's neutral, because the application is simply such that it does not generate a lot of temporary garbage (i.e., it does NOT adhere to the weak generational hypothesis). At worst, it means GC will be more expensive overall, because the "per object" cost of collecting the old generation is significantly higher than the "per object" cost of collecting the young generation. That said, if most data is truly very transient in nature, a smaller young generation may still be "big enough".

* The previous point highlights the trade-off between low pause times and GC efficiency. One might force a smaller young generation in an attempt to achieve shorter pause times with CMS, but the trade-off is that a larger percentage of allocated data will survive into the old generation and be collected there - more expensively.

> I am worried that the cms descriptions talk about systems with 1-2
> processor machines, being my system shows up at 16 processors after
> hyper threading.

My assumption has been that this recommendation is due to the fact that the more processors you have, the less impact the CMS mark/sweep phase has on application throughput, provided that an appropriate number of threads is selected. For example, if you have an 8 core machine and have CMS use only a single thread for the mark/sweep phase, the very fact that it is only using 1 out of 8 cores should severely limit its impact. (Of course, cache coherency issues presumably negate this somewhat.) Under such circumstances, incremental CMS does not seem worth it.

On the other hand, suppose you're running on a single CPU system. Disregarding CPU cache issues, the concurrent mark/sweep phase would now effectively halve the CPU resources available to the application. A 50% decrease is significant, and under such circumstances the incremental mode is potentially interesting.
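To make the weak generational hypothesis concrete, here is a minimal sketch (the class, method names, and numbers are mine, purely for illustration - not from any real workload): nearly every object it allocates dies immediately, which is exactly the pattern that makes a large young generation cheap to collect, since a copying collector only touches the survivors.

```java
// Hypothetical illustration of the weak generational hypothesis:
// almost all objects allocated here become garbage immediately, so a
// copying young-generation collector reclaims them very cheaply and
// promotes almost nothing to the old generation.
public class YoungGenDemo {

    // Allocate n short-lived buffers; only a running checksum survives.
    static long churn(int n) {
        long checksum = 0;
        for (int i = 0; i < n; i++) {
            byte[] temp = new byte[128]; // dead as soon as the iteration ends
            temp[0] = (byte) i;
            checksum += temp[0];
        }
        return checksum;
    }

    public static void main(String[] args) {
        // Run with e.g. -verbose:gc to watch the young-generation
        // collections; the live set stays tiny throughout.
        System.out.println("checksum=" + churn(1_000_000));
    }
}
```

If an application's allocation profile looks like this, a bigger young generation mostly buys you cheaper collection at the price of a taller saw-tooth and longer minor pauses, per points (1)-(3) above.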
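For reference, a hedged sketch of the HotSpot flags behind the knobs discussed above (the sizes and thread counts are placeholders, not recommendations, and flag availability varies by JVM version; app.jar is a stand-in for your application):

```shell
# Illustrative only -- sizes/counts are placeholders, not recommendations.
#
# Plain CMS with an explicitly sized young generation.
#   -Xmn: young generation size (bigger -> more saw-tooth, cheaper
#         per-object collection, longer minor pauses).
#   -XX:ParallelCMSThreads: threads for the concurrent mark/sweep phase;
#         using 1-2 of 16 cores limits the impact on the application.
java -XX:+UseConcMarkSweepGC -Xmn512m -XX:ParallelCMSThreads=2 \
     -verbose:gc -jar app.jar

# Incremental CMS, mainly of interest on 1-2 CPU machines:
java -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -jar app.jar
```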
A trade-off, presumably (again, I don't know a lot about how incremental mode is implemented, but I doubt they've avoided this), is that the total time needed for the mark/sweep once it does run is higher, such that you retain more floating garbage than might otherwise have been collected.

--
/ Peter Schuller