Hello Cassandra users and experts,

Hopefully someone can point me in the right direction. We have Cassandra 0.6.1 working on our test servers, and we *thought* everything was ready to move to production. We are currently running a ring of 4 large EC2 instances (http://aws.amazon.com/ec2/instance-types/) in production, with a replication factor of 3 and a QUORUM consistency level.

We ran a test on 1% of our users, and for the first 3 hours all reads from and writes to Cassandra worked well. After that point, CPU usage spiked to 100% and stayed there, on essentially all 4 machines at once. This smells like a GC issue to me, and I'm looking into it with jconsole right now. If anyone can help me debug this and get Cassandra running without the CPU spiking, I would be forever in their debt.

I suspect that anyone else running Cassandra on large EC2 instances could answer all my questions by telling me what JVM args they are successfully using in production, whether they upgraded from Cassandra 0.6.1 to 0.6.2, and whether they switched to batched writes because of bug 1014 (https://issues.apache.org/jira/browse/CASSANDRA-1014).

Is there anyone on the list who is using large EC2 instances in production? Would you be kind enough to share your JVM arguments and any other tips?

Thanks for any help,
Curt

--
Curt, ZipZapPlay Inc., www.PlayCrafter.com, http://apps.facebook.com/happyhabitat