ConsistencyLevel greater than ONE + node failure = non-responsive Cassandra 0.6.5 cluster

Markus Klems Mon, 21 Mar 2011 10:53:25 -0700

Hi guys,

We are currently benchmarking various configurations of an EC2-based
Cassandra cluster. This is our setup:


1) 8 nodes where each node is an m1.xlarge EC2 instance
2) Cassandra version 0.6.5
3) Replication Factor = 3
4) Throughput of ~7K to 10K ops/sec on a 50% GET / 50% INSERT workload,
depending on the consistency level
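For reference, here is a minimal sketch of how many replicas each of the three consistency levels has to hear from with Replication Factor = 3. It assumes the usual majority rule for QUORUM (floor(RF / 2) + 1); `required_replicas` is a hypothetical helper for illustration, not part of Cassandra's API.

```python
# Hypothetical helper (not a Cassandra API): number of replica responses a
# request must collect at a given consistency level.
def required_replicas(level: str, replication_factor: int) -> int:
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # Majority of the replicas: floor(RF / 2) + 1
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


if __name__ == "__main__":
    # With RF = 3: ONE needs 1 replica, QUORUM needs 2, ALL needs 3.
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, required_replicas(level, 3))
```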

We have been benchmarking the cluster with YCSB, varying the consistency
level between ONE, QUORUM, and ALL, ceteris paribus. This works fine
while all nodes are alive. Next, we wanted to benchmark how the cluster
performs when one node goes down. So we killed one node and tested the
cluster at consistency level ONE, which delivered a reasonable
throughput of several thousand ops/sec. Then we wanted to test QUORUM
and ALL. However, with one node down and the benchmark's consistency
level set to QUORUM or ALL, cluster throughput drops sharply to a few
ops/sec, and then the cluster stops responding to the YCSB client
altogether. For ALL, this behavior would (kind of) make sense for read
requests, but we are puzzled that even QUORUM doesn't work. A 100%
write workload at consistency level ALL doesn't work either.
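To make the expectation behind the QUORUM puzzle concrete, here is a rough sketch that counts, for each consistency level, how many key ranges can still be served after one of the 8 nodes dies. It assumes evenly balanced tokens and RackUnawareStrategy-style placement (each range is replicated on its owning node plus the next two nodes clockwise on the ring); the ring model and helper names are hypothetical, and it ignores timeouts, hinted handoff, and failure-detection delays.

```python
# Assumptions: 8 nodes with evenly spaced tokens, RF = 3, and replicas placed
# on the owning node plus the next RF - 1 nodes clockwise (rack-unaware style).
NUM_NODES = 8
REPLICATION_FACTOR = 3
REQUIRED = {"ONE": 1, "QUORUM": REPLICATION_FACTOR // 2 + 1, "ALL": REPLICATION_FACTOR}


def replicas_for_range(range_index: int) -> list[int]:
    """Nodes holding copies of the token range owned by node `range_index`."""
    return [(range_index + i) % NUM_NODES for i in range(REPLICATION_FACTOR)]


def satisfiable_ranges(dead_nodes: set[int]) -> dict[str, int]:
    """For each consistency level, count the token ranges that still have
    enough live replicas to satisfy that level."""
    counts = {level: 0 for level in REQUIRED}
    for r in range(NUM_NODES):
        live = sum(1 for node in replicas_for_range(r) if node not in dead_nodes)
        for level, needed in REQUIRED.items():
            if live >= needed:
                counts[level] += 1
    return counts


if __name__ == "__main__":
    print(satisfiable_ranges(dead_nodes={0}))  # {'ONE': 8, 'QUORUM': 8, 'ALL': 5}
```

Under these assumptions, killing one node still leaves every range with at least two live replicas, so QUORUM (2 of 3) should remain satisfiable for all keys, while ALL becomes unsatisfiable for the three ranges the dead node replicates.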

Any ideas why the cluster stops responding for QUORUM and ALL?

Thanks,

Markus
