Hi Peter,

Thanks again for your time and thoughts on this problem.

We think we've got a bit ahead of the problem by scaling back (quite savagely) the rate at which we hit the cluster. Previously, with a surplus of optimism, we were throwing very big Hadoop jobs at Cassandra, including what I understand to be a worst-case usage pattern (random reads). Now we're throttling right back on the number of parallel jobs that we fire from Hadoop, and we're seeing better performance, at least in that the boxes generally stay up as far as nodetool and other interactive sessions are concerned.

As discussed, we've tried quite a number of different approaches with GC. At the moment we've returned to:

    JVM_OPTS=" \
        -ea \
        -Xms2G \
        -Xmx3G \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC \
        -XX:+CMSParallelRemarkEnabled \
        -XX:SurvivorRatio=8 \
        -XX:MaxTenuringThreshold=1 \
        -XX:+HeapDumpOnOutOfMemoryError \
        -Dcom.sun.management.jmxremote.port=8080 \
        -Dcom.sun.management.jmxremote.ssl=false \
        -Dcom.sun.management.jmxremote.authenticate=false"

... which is much closer to the default as shipped. The notable change is the heap size, which out of the box comes as 1G.

There are some words on the 'Net (the recent pages on Riptano's site, in fact) that strongly encourage scaling left and right rather than beefing up the boxes, and certainly we're seeing far less bother from GC using a much smaller heap. Previously we'd been going up to 16GB, or even higher; that was based on my previous positive experiences of getting better performance from memory-hog apps (e.g. Java) by giving them more memory. In any case, it seems that using large amounts of memory on EC2 is just asking for trouble. And because it's Amazon, more smaller machines generally work out as the same CPU grunt per dollar, of course .. although the management costs go up.

To answer your last question there - we'd been using some pretty beefy EC2 boxes, but now we think we'll head back to the 2-core 7GB medium-ish sized machines.

All IO still runs like a dog no matter how much money you spend, sadly.

cheers,
Jedd.
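
P.S. For anyone following along: the throttling described above boils down to capping how many reads are in flight against the cluster at once. Here's a minimal sketch of that idea in Python. It is not our actual Hadoop setup; run_throttled, fake_read and the max_in_flight value are made-up names for illustration, and the sleep is just a stand-in for a real Cassandra read.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_throttled(tasks, max_in_flight):
    """Run callables with at most max_in_flight executing at once.

    Returns (results in task order, peak concurrency observed).
    """
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def tracked(task):
        # Count this task as active and record the high-water mark.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            return task()
        finally:
            with lock:
                state["active"] -= 1

    # The pool size is the throttle: no more than max_in_flight
    # worker threads ever exist, so no more tasks can run at once.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        results = list(pool.map(tracked, tasks))
    return results, state["peak"]


def fake_read(key):
    """Hypothetical stand-in for one random read against the cluster."""
    def task():
        time.sleep(0.01)  # pretend to wait on IO
        return key * 2
    return task


if __name__ == "__main__":
    reads = [fake_read(k) for k in range(20)]
    results, peak = run_throttled(reads, max_in_flight=4)
    print(len(results), "reads done, peak concurrency", peak)
```

Dropping max_in_flight is the one knob to turn here, and the peak value that comes back makes it easy to check that the cap is actually holding.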