Hi Peter,

 Thanks again for your time and thoughts on this problem.

 We think we've got a bit ahead of the problem by just
 scaling back (quite savagely) on the rate at which we try
 to hit the cluster.  Previously, with a surplus of optimism,
 we were throwing very big Hadoop jobs at Cassandra,
 including what I understand to be a worst-case usage
 (random reads).

 Now we're throttling right back on the number of parallel
 jobs that we fire from Hadoop, and we're seeing better
 performance, in terms of the boxes generally staying up
 as far as nodetool and other interactive sessions are
 concerned.
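
 As a rough illustration of that kind of throttling (a sketch
 only, not our actual launcher): assuming a hypothetical text
 file with one job-submission command per line, something like
 this caps how many run at once.  The function name and file
 name are made up for the example, and it relies on an xargs
 that supports -P (GNU and BSD both do).

```shell
#!/bin/sh
# Sketch only: cap how many job-submission commands run at once.
# jobs_file is a hypothetical text file with one command per line.
run_throttled() {
    max_parallel=$1
    jobs_file=$2
    # -L 1: one input line per invocation
    # -P N: run at most N invocations at the same time
    xargs -L 1 -P "$max_parallel" sh -c '"$@"' _ < "$jobs_file"
}

# Example (hypothetical file name):
# run_throttled 2 hadoop-jobs.txt
```

 Lowering the first argument is the "throttle right back"
 knob; each line's command only starts once a slot frees up.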

 As discussed, we've adopted quite a number of different
 approaches with GC - at the moment we've returned to:

 JVM_OPTS=" \
        -ea \
        -Xms2G \
        -Xmx3G \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC \
        -XX:+CMSParallelRemarkEnabled \
        -XX:SurvivorRatio=8 \
        -XX:MaxTenuringThreshold=1 \
        -XX:+HeapDumpOnOutOfMemoryError \
        -Dcom.sun.management.jmxremote.port=8080 \
        -Dcom.sun.management.jmxremote.ssl=false \
        -Dcom.sun.management.jmxremote.authenticate=false"

 ... which is much closer to the default as shipped - the
 notable change is the heap size, which out of the box comes
 as 1G.
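
 For comparing GC behaviour between heap sizes, one option is
 to append the standard HotSpot GC logging flags to the same
 variable.  This is a sketch under assumptions: the log path
 is a made-up example, and these flags apply to the older
 HotSpot JVMs (Java 6 through 8), not to newer releases that
 use unified logging.

```shell
# Sketch only: add GC logging on top of whatever JVM_OPTS already holds.
# The log path below is an assumed example location, not a shipped default.
JVM_OPTS="$JVM_OPTS \
        -verbose:gc \
        -XX:+PrintGCDetails \
        -XX:+PrintGCTimeStamps \
        -Xloggc:/var/log/cassandra/gc.log"
```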

 There are some words on the 'Net - the recent pages on
 Riptano's site, in fact - that strongly encourage scaling
 left and right, rather than beefing up the boxes, and
 certainly we're seeing far less bother from GC using a much
 smaller heap.  Previously we'd been going up to 16GB, or
 even higher, based on my earlier positive experiences of
 getting better performance from memory hog apps (eg. Java)
 by giving them more memory.  In any case, it seems that
 using large amounts of memory on EC2 is just asking for
 trouble.

 And because it's Amazon, a larger number of smaller machines
 generally works out as the same CPU grunt per dollar, of
 course .. although the management costs go up.

 To answer your last question there - we'd been using some
 pretty beefy EC2 boxes, but now we think we'll head back
 to the 2-core 7GB medium-ish sized machines.

 All IO still runs like a dog no matter how much money you
 spend, sadly.

 cheers,
 Jedd.
