Hi,
CLUSTER SETUP
I'm using Cassandra 2.2.3 running on a small private cloud infrastructure
(supported by ganeti+KVM).
I have an initial Cassandra cluster of 8 nodes and a keyspace with
SimpleStrategy and replication factor 3, which is loaded with 2 GB of
data (2 GB * 3 replicas = ~6 GB of data in total).
On every Cassandra node I'm running ganglia to collect measurements of
various metrics like incoming load, throughput, response latency and CPU
usage.
Every Cassandra node has 2 vCPUs, an 80 GB HDD (commitlog NOT on a separate
disk) and 6 GB RAM.
CLIENTS SETUP
I'm using the YCSB benchmark to produce READ load, using CQL clients
(v2.1.8) with asynchronous calls.
21 YCSB client threads produce ~780 read requests/second each (~16380
req/sec in total).
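As a quick check of the load arithmetic above, here is a minimal Python sketch. The thread count and per-thread rate come from this post; the even spread of requests across nodes is only an assumption used to show what ideal scaling would look like, and the function name is hypothetical.

```python
# Figures from the post; the uniform per-node split is an assumption.
THREADS = 21
REQS_PER_THREAD = 780  # approximate reads/sec produced by each YCSB thread

# Total offered load in reads/sec.
total_load = THREADS * REQS_PER_THREAD


def ideal_per_node_load(total, nodes):
    """Reads/sec each node would receive if load were spread perfectly evenly."""
    return total / nodes


if __name__ == "__main__":
    print(f"total offered load: {total_load} req/sec")
    for nodes in range(8, 13):
        share = ideal_per_node_load(total_load, nodes)
        print(f"{nodes} nodes: {share:.1f} req/sec per node (ideal)")
```

Under that assumption the per-node share should shrink with every added node, which is why I expected CPU usage and latency to drop.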
EXPERIMENT 1
I'm keeping the load steady at ~16380 req/sec and periodically adding 1
node. After adding a new node, and once bootstrapping is over, I expect
both the response latency and the CPU usage to decrease. However, this is
not the case: on every node addition CPU usage increases, and response
latency either stays steady or increases too.
MEASUREMENT LOGS
# nodes  LOAD (req/sec)  THROUGHPUT (req/sec)  LATENCY (ms)  CPU (%)
8        15668.81        15679.81              56.09         86.67
9        16177.45        16185.05              62.96         88.61
10       16353.36        16343.27              75.22         89.48
11       15723.14        15682.06              65.84         90.0
12       15348.97        15327.27              103.13        90.40
I moved from
8 to 9 nodes after 10 minutes,
9 to 10 after 15 minutes,
10 to 11 after 25 minutes,
11 to 12 after 25 minutes.
Bootstrapping took about 1.5 minutes for every addition. I didn't run
nodetool cleanup at all.
The measurements in the table are averages.
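To make the trend in the table easier to see, here is a small Python sketch that derives the average throughput per node and the latency change relative to the 8-node run. The numbers are copied from the table; the helper names are hypothetical, and dividing throughput by node count assumes the load is spread evenly, which I have not verified.

```python
# (nodes, load req/sec, throughput req/sec, latency ms, CPU %) from the table.
MEASUREMENTS = [
    (8, 15668.81, 15679.81, 56.09, 86.67),
    (9, 16177.45, 16185.05, 62.96, 88.61),
    (10, 16353.36, 16343.27, 75.22, 89.48),
    (11, 15723.14, 15682.06, 65.84, 90.0),
    (12, 15348.97, 15327.27, 103.13, 90.40),
]


def per_node_throughput(row):
    """Average throughput per node, assuming an even spread across nodes."""
    nodes, _load, throughput, _latency, _cpu = row
    return throughput / nodes


def latency_change_pct(row, baseline):
    """Latency change of `row` relative to `baseline`, in percent."""
    return (row[3] - baseline[3]) / baseline[3] * 100.0


if __name__ == "__main__":
    baseline = MEASUREMENTS[0]
    for row in MEASUREMENTS:
        print(
            f"{row[0]:>2} nodes: "
            f"{per_node_throughput(row):8.2f} req/sec per node, "
            f"latency {latency_change_pct(row, baseline):+6.1f}% vs 8 nodes, "
            f"CPU {row[4]:.2f}%"
        )
```

The per-node throughput falls as nodes are added, yet the reported CPU usage does not fall with it, which is the part I cannot explain.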
EXPERIMENT 2
I have also tried starting with 7 nodes (up to 12) and a lower incoming
load of ~9600 req/sec. Both latency and CPU usage stay at the same level
no matter the number of nodes (3-4 ms latency and 75% CPU load). I also ran
nodetool cleanup, and there was still no decrease.
I've read somewhere that the benefits of adding nodes in Cassandra
are linear. Am I missing something?
Thanks a lot!
Thanasis