Good point. When we looked at the EC2 nodes, we measured roughly 120% CPU utilization. We interpreted this as a misleading figure rather than a faithful picture of CPU utilization on a multi-core machine; our EC2 nodes have 8 virtual cores each.
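For what it's worth, the usual Linux tools sum the per-core percentages, so on an 8-core box 120% would mean little more than one busy core out of a possible 800%. A quick way to check the per-core breakdown (assuming the sysstat package is available on the Ubuntu image; otherwise plain top works too) would be something like:

    # per-core utilization, sampled every 2 seconds, 5 samples
    mpstat -P ALL 2 5

    # or: run top and press '1' to toggle the per-core view
    top

If one or two cores were pegged while the rest sat idle, the aggregate figure would be hiding a single-threaded hot spot rather than real CPU saturation.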
Maybe Cassandra 0.6.5 does not make good use of multi-core systems?

On 15.02.2011, at 20:59, Thibaut Britz <thibaut.br...@trendiction.com> wrote:

> Cassandra is very CPU hungry, so you might be hitting a CPU bottleneck.
> What's your CPU usage during these tests?
>
>
> On Tue, Feb 15, 2011 at 8:45 PM, Markus Klems <mar...@klems.eu> wrote:
>> Hi there,
>>
>> we are currently benchmarking a Cassandra 0.6.5 cluster with 3
>> High-Mem Quadruple Extra Large EC2 nodes
>> (http://aws.amazon.com/ec2/#instance) using Yahoo's YCSB tool
>> (replication factor is 3, random partitioner). We assigned 32 GB RAM
>> to the JVM and left 32 GB RAM for the Ubuntu Linux filesystem buffer.
>> We also raised the maximum number of user processes to a very large
>> value via ulimit -u 999999.
>>
>> Our goal is to achieve maximum throughput by increasing YCSB's
>> threadcount parameter (i.e. the number of parallel benchmarking
>> client threads). However, this only improves Cassandra throughput for
>> low thread counts. At higher thread counts, throughput does not
>> increase and even decreases. Do you have any idea why this is
>> happening, and perhaps suggestions on how to scale throughput to much
>> higher numbers? Why is throughput hitting a wall, anyway? And where
>> does the latency/throughput tradeoff come from?
>>
>> Here is our YCSB configuration:
>> recordcount=300000
>> operationcount=1000000
>> workload=com.yahoo.ycsb.workloads.CoreWorkload
>> readallfields=true
>> readproportion=0.5
>> updateproportion=0.5
>> scanproportion=0
>> insertproportion=0
>> threadcount=500
>> target=10000
>> hosts=EC2-1,EC2-2,EC2-3
>> requestdistribution=uniform
>>
>> These are typical results for threadcount=1:
>> Loading workload...
>> Starting test.
>> 0 sec: 0 operations;
>> 10 sec: 11733 operations; 1168.28 current ops/sec; [UPDATE AverageLatency(ms)=0.64] [READ AverageLatency(ms)=1.03]
>> 20 sec: 24246 operations; 1251.68 current ops/sec; [UPDATE AverageLatency(ms)=0.48] [READ AverageLatency(ms)=1.11]
>>
>> These are typical results for threadcount=10:
>> 10 sec: 30428 operations; 3029.77 current ops/sec; [UPDATE AverageLatency(ms)=2.11] [READ AverageLatency(ms)=4.32]
>> 20 sec: 60838 operations; 3041.91 current ops/sec; [UPDATE AverageLatency(ms)=2.15] [READ AverageLatency(ms)=4.37]
>>
>> These are typical results for threadcount=100:
>> 10 sec: 29070 operations; 2895.42 current ops/sec; [UPDATE AverageLatency(ms)=20.53] [READ AverageLatency(ms)=44.91]
>> 20 sec: 53621 operations; 2455.84 current ops/sec; [UPDATE AverageLatency(ms)=23.11] [READ AverageLatency(ms)=55.39]
>>
>> These are typical results for threadcount=500:
>> 10 sec: 30655 operations; 3053.59 current ops/sec; [UPDATE AverageLatency(ms)=72.71] [READ AverageLatency(ms)=187.19]
>> 20 sec: 68846 operations; 3814.14 current ops/sec; [UPDATE AverageLatency(ms)=65.36] [READ AverageLatency(ms)=191.75]
>>
>> We never measured more than ~6000 ops/sec. Are there ways to tune
>> Cassandra that we are not aware of? We made some modifications to the
>> Cassandra 0.6.5 core for experimental reasons, so it is not easy to
>> switch to 0.7.x or 0.8.x. However, if that might solve the scaling
>> issues, we might consider porting our modifications to a newer
>> Cassandra version...
>>
>> Thanks,
>>
>> Markus Klems
>>
>> Karlsruhe Institute of Technology, Germany
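PS: Regarding where the wall might come from: since YCSB is a closed-loop benchmark, throughput is bounded by roughly threads / average latency, and with 500 threads at ~130 ms mean latency that works out to about 500 / 0.13 s ≈ 3800 ops/sec, which matches what we see; the extra latency per request must be queueing up somewhere. To narrow down which stage saturates, we will also look at the per-stage thread pool statistics on each node, along the lines of the following (syntax from memory; if I remember correctly, 8080 is the default JMX port on 0.6.x):

    # per-stage thread pool statistics; a steadily growing "Pending"
    # column points at the stage where requests are queueing up
    bin/nodetool -host localhost -port 8080 tpstats

    # basic node status, including heap usage
    bin/nodetool -host localhost -port 8080 info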