Hi there,

we are currently benchmarking a Cassandra 0.6.5 cluster of 3 High-Mem Quadruple Extra Large EC2 nodes (http://aws.amazon.com/ec2/#instance) using Yahoo's YCSB tool (replication factor 3, random partitioner). We assigned 32 GB of RAM to the JVM and left 32 GB of RAM for the Ubuntu Linux filesystem buffer. We also raised the per-user process limit to a very large number via ulimit -u 999999.
Our goal is to achieve maximum throughput by increasing YCSB's threadcount parameter (i.e. the number of parallel benchmarking client threads). However, this only improves Cassandra throughput for low numbers of threads. If we move to higher threadcounts, throughput does not increase and sometimes even decreases. Do you have any idea why this is happening, and any suggestions for scaling throughput to much higher numbers? Why is throughput hitting a wall anyway? And where does the latency/throughput tradeoff come from?

Here is our YCSB configuration:

  recordcount=300000
  operationcount=1000000
  workload=com.yahoo.ycsb.workloads.CoreWorkload
  readallfields=true
  readproportion=0.5
  updateproportion=0.5
  scanproportion=0
  insertproportion=0
  threadcount=500
  target=10000
  hosts=EC2-1,EC2-2,EC2-3
  requestdistribution=uniform

These are typical results for threadcount=1:

  Loading workload...
  Starting test.
  0 sec: 0 operations;
  10 sec: 11733 operations; 1168.28 current ops/sec; [UPDATE AverageLatency(ms)=0.64] [READ AverageLatency(ms)=1.03]
  20 sec: 24246 operations; 1251.68 current ops/sec; [UPDATE AverageLatency(ms)=0.48] [READ AverageLatency(ms)=1.11]

These are typical results for threadcount=10:

  10 sec: 30428 operations; 3029.77 current ops/sec; [UPDATE AverageLatency(ms)=2.11] [READ AverageLatency(ms)=4.32]
  20 sec: 60838 operations; 3041.91 current ops/sec; [UPDATE AverageLatency(ms)=2.15] [READ AverageLatency(ms)=4.37]

These are typical results for threadcount=100:

  10 sec: 29070 operations; 2895.42 current ops/sec; [UPDATE AverageLatency(ms)=20.53] [READ AverageLatency(ms)=44.91]
  20 sec: 53621 operations; 2455.84 current ops/sec; [UPDATE AverageLatency(ms)=23.11] [READ AverageLatency(ms)=55.39]

These are typical results for threadcount=500:

  10 sec: 30655 operations; 3053.59 current ops/sec; [UPDATE AverageLatency(ms)=72.71] [READ AverageLatency(ms)=187.19]
  20 sec: 68846 operations; 3814.14 current ops/sec; [UPDATE AverageLatency(ms)=65.36] [READ AverageLatency(ms)=191.75]

We never measured more than ~6000 ops/sec. Are there ways to tune Cassandra that we are not aware of?

We made some modifications to the Cassandra 0.6.5 core for experimental reasons, so it is not easy to switch to 0.7.x or 0.8.x. However, if this might solve the scaling issues, we would consider porting our modifications to a newer Cassandra version...

Thanks,
Markus Klems
Karlsruhe Institute of Technology, Germany
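
P.S. A quick sanity check on the numbers above, in case it helps with the latency/throughput question. If each YCSB client thread waits for its reply before sending the next request (a closed loop, which is our assumption here), Little's Law says that the number of requests in flight equals throughput times mean latency. The Python sketch below applies this to the first 10-second interval of each run. The 50/50 weighting of UPDATE and READ latency follows the configured proportions and is only an approximation of the real mix in that interval; the function names are made up for illustration.

```python
# Closed-loop sanity check with Little's Law: N = X * R
# (requests in flight = throughput * mean latency).

# (threadcount, ops/sec, UPDATE avg latency ms, READ avg latency ms),
# copied from the first 10-second interval of each run reported above.
RUNS = [
    (1, 1168.28, 0.64, 1.03),
    (10, 3029.77, 2.11, 4.32),
    (100, 2895.42, 20.53, 44.91),
    (500, 3053.59, 72.71, 187.19),
]

# Assumption: the observed mix matches the configured proportions.
READ_SHARE = 0.5    # readproportion
UPDATE_SHARE = 0.5  # updateproportion


def mean_latency_ms(update_ms, read_ms):
    """Mean latency weighted by the configured operation mix."""
    return UPDATE_SHARE * update_ms + READ_SHARE * read_ms


def implied_in_flight(ops_per_sec, latency_ms):
    """Little's Law: in-flight requests = throughput * latency (in seconds)."""
    return ops_per_sec * latency_ms / 1000.0


def report(runs=RUNS):
    """Return (threads, ops/sec, mean latency ms, implied in-flight) per run."""
    rows = []
    for threads, ops, update_ms, read_ms in runs:
        latency = mean_latency_ms(update_ms, read_ms)
        rows.append((threads, ops, latency, implied_in_flight(ops, latency)))
    return rows


if __name__ == "__main__":
    for threads, ops, latency, in_flight in report():
        print(f"threads={threads:4d}  ops/sec={ops:8.2f}  "
              f"mean latency={latency:7.2f} ms  implied in flight={in_flight:7.1f}")
```

For threadcount 1, 10 and 100 the implied number of in-flight requests comes out close to the thread count, which is what a closed loop would predict: once throughput stops growing at around 3000 ops/sec, additional threads seem to show up only as additional latency. For threadcount=500 the implied value is noticeably below 500, so part of the time may be spent outside the measured request latency. This does not explain where the ceiling itself comes from, which is still our actual question.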