I'll capture what we're seeing here for anyone else who may look into this in more detail later.
Our normal heap growth is ~300K between collections, with ParNew collections happening on average about every 4 seconds. All very healthy.

The memtable flush, which is where we see almost all of our CMS activity, seems to have a ballooning effect: despite a 64MB memtable size, it causes over 512MB of heap to be consumed in half a second. Beyond the sheer amount of garbage this creates, the MaxTenuringThreshold=1 setting means most of it seems to spill immediately into the tenured generation, which quickly fills and triggers a CMS. Garbage overflows into tenured faster than the concurrent mark can keep up, so the CMS cycle is almost always interrupted and ends in a concurrent mode failure, falling back to a full collection. Even so, the tenured collection is usually hugely effective, recovering over half the total heap.

Two questions for the group, then:

1) Does this seem like a sane amount of garbage (512MB) to generate when flushing a 64MB memtable to disk?

2) Could MaxTenuringThreshold=1 be working against Cassandra here? The flush seems to create so much garbage so quickly that a normal CMS cycle can't even complete. I'm sure there was a reason to introduce this setting, but I'm not sure it's universally beneficial. Is there any history behind the decision to opt for immediate promotion rather than letting objects survive several young collections under an adaptive tenuring threshold?
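To put rough numbers on question 1, here is the arithmetic using only the figures reported above. Both results are lower bounds, since the flush consumed *over* 512MB:

```latex
\[
\frac{512\ \text{MB garbage}}{64\ \text{MB memtable}} = 8
\qquad
\frac{512\ \text{MB}}{0.5\ \text{s}} = 1024\ \text{MB/s} \approx 1\ \text{GB/s}
\]
```

In other words, each flush allocates at least 8x the memtable size, at a rate of roughly 1 GB/s or more while it runs.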