On Thu, Apr 7, 2011 at 2:27 PM, Erik Onnen <eon...@gmail.com> wrote:
> 1) Does this seem like a sane amount of garbage (512MB) to generate
> when flushing a 64MB table to disk?

Sort of -- that's just about exactly the amount of space you'd expect 64MB of serialized data to take, in memory. (Not very efficient, I know.) So, you would expect that much to be available to GC, after a flush.

Also, flush creates a buffer equal to in_memory_compaction_limit. So that will also generate a spike. I think you upgraded from 0.6 -- if the converter turned row size warning limit into i_m_c_l then it could be much larger.

Otherwise, not sure why flush would consume that much *extra* though. Smells like something unexpected in the flush code to me. I don't see anything obvious though. SSTableWriter serializes directly to the outputstream without (m)any other allocations.

> 2) Is this possibly a case of the MaxTenuringThreshold=1 working
> against cassandra? The flush seems to create a lot of garbage very
> quickly such that normal CMS isn't even possible. I'm sure there was a
> reason to introduce this setting but I'm not sure it's universally
> beneficial. Is there any history on the decision to opt for immediate
> promotion rather than using an adaptable number of survivor
> generations?

The history is that, way back in the early days, we used to max it out the other way (MTT=128) but observed behavior is that objects that survive 1 new gen collection are very likely to survive "forever." This fits with what we expect theoretically: read requests and ephemera from write requests will happen in a small number of ms, but memtable data is not GC-able until flush. (Rowcache data of course is effectively unbounded in tenure.)

Keeping long-lived data in a survivor space just makes new gen collections take longer since you are copying that data back and forth over and over.

(We have advised some read-heavy customers to ramp up to MTT=16, so it's not a hard-and-fast rule, but it still feels like a reasonable starting point to me.)

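To put rough numbers on the "copying that data back and forth" point, here is a minimal sketch of the survivor-space copying cost under a deliberately simplified model. The class name, method name, and the figure of 200 young collections before a flush are all hypothetical, and the model is an assumption rather than HotSpot's exact accounting: it ignores survivor-space capacity, premature promotion, and the final copy into the old generation.

```java
public class SurvivorCopyCost {
    /**
     * Simplified model (an assumption, not HotSpot's exact accounting):
     * data that stays live is copied between survivor spaces once per
     * young collection until its age reaches the tenuring threshold,
     * after which it is promoted and no longer copied.
     */
    public static long survivorBytesCopied(long liveBytes,
                                           int maxTenuringThreshold,
                                           int youngCollections) {
        int copies = Math.min(maxTenuringThreshold, youngCollections);
        return liveBytes * copies;
    }

    public static void main(String[] args) {
        long liveBytes = 64L * 1024 * 1024; // 64MB of long-lived data
        int collectionsBeforeFlush = 200;   // hypothetical count of young GCs before flush

        // The three thresholds mentioned in this thread.
        for (int mtt : new int[] {1, 16, 128}) {
            long copied = survivorBytesCopied(liveBytes, mtt, collectionsBeforeFlush);
            System.out.printf("MTT=%d -> %d MB copied through survivor spaces%n",
                    mtt, copied / (1024 * 1024));
        }
    }
}
```

In this model the copying cost grows linearly with the threshold for any data that outlives it, which is the case for memtable contents that stay live until flush.
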
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com