Hi guys, we are currently benchmarking various configurations of an EC2-based Cassandra cluster. This is our current setup:
1) 8 nodes where each node is an m1.xlarge EC2 instance
2) Cassandra version 0.6.5
3) Replication Factor = 3
4) this delivers ~7K to 10K ops/sec with 50% GET and 50% INSERT depending on the consistency level

We have been benchmarking the cluster with YCSB, while altering the consistency levels ONE, QUORUM, and ALL, ceteris paribus. This works fine if all nodes are alive.

Then, we wanted to benchmark the cluster performance behavior when one node goes down. So, we killed one node and tested the cluster with consistency level ONE, which delivered reasonable throughput of multiple thousand ops/sec. Then, we wanted to test QUORUM and ALL. However, when one node is down, the cluster throughput sharply drops to a few operations and then stops responding to the YCSB client if the consistency level of operations in the benchmark is set to QUORUM or ALL.

For ALL, this behavior would (kind of) make sense for read requests but we are puzzled that even QUORUM won't work. And for 100% write operations in consistency level ALL it won't work either.

Any ideas why the cluster stops responding for QUORUM and ALL?

Thanks,
Markus
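
P.S. To make our expectation concrete, here is the replica arithmetic as a minimal sketch. It assumes a quorum is floor(RF / 2) + 1 replicas and that killing one node takes away at most one replica of any given key. The function names are made up for illustration and are not part of Cassandra or YCSB.

```python
def required_replicas(consistency_level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for an operation to succeed.

    Assumption: QUORUM means a strict majority, floor(RF / 2) + 1.
    """
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {consistency_level}")


def is_satisfiable(consistency_level: str,
                   replication_factor: int,
                   replicas_down: int) -> bool:
    """True if enough replicas of a key are still alive for this level."""
    live_replicas = replication_factor - replicas_down
    return live_replicas >= required_replicas(consistency_level,
                                              replication_factor)


if __name__ == "__main__":
    # Our setup: RF = 3, one node killed, so at most one replica per key is down.
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, is_satisfiable(level, replication_factor=3, replicas_down=1))
```

By this arithmetic, QUORUM needs 2 of 3 replicas and should still be satisfiable with one node down, which is why the stall surprises us. ALL needs all 3, so it cannot be met for keys that had a replica on the killed node.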