Hi,

CLUSTER SETUP
I'm using Cassandra 2.2.3 running on a small private cloud infrastructure (backed by Ganeti + KVM). I have an initial Cassandra cluster of 8 nodes and a keyspace with SimpleStrategy and replication factor 3, loaded with 2 GB of data (2 GB × 3 replicas = ~6 GB of data in total). On every Cassandra node I'm running Ganglia to collect metrics such as incoming load, throughput, response latency and CPU usage. Every Cassandra node has 2 vCPUs, an 80 GB HDD (the commitlog is NOT on a separate disk) and 6 GB of RAM.
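A rough sketch of how much data each node should hold as the cluster grows. It assumes evenly balanced token ranges under SimpleStrategy, so each node holds about (raw data × RF) / N; it ignores compaction overhead and any leftover data on old nodes that nodetool cleanup would remove.

```python
# Back-of-the-envelope data placement for the setup above.
# Assumption: evenly balanced token ranges, so each node holds (raw * RF) / N.
# Ignores compaction overhead and data not yet removed by nodetool cleanup.

RAW_DATA_GB = 2.0
REPLICATION_FACTOR = 3


def data_per_node_gb(num_nodes: int) -> float:
    """Approximate on-disk data per node for a given cluster size."""
    return RAW_DATA_GB * REPLICATION_FACTOR / num_nodes


for n in range(8, 13):
    print(f"{n:2d} nodes: ~{data_per_node_gb(n):.2f} GB per node")
```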

CLIENTS SETUP
I'm using the YCSB benchmark to generate READ load, using CQL clients (v2.1.8) with asynchronous calls. 21 YCSB client threads each produce ~780 read requests/sec (~16380 req/sec in total).
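The total load and the share each node should see as a coordinator, as a quick check. This assumes the client driver spreads requests evenly over all nodes; it says nothing about replica reads, which depend on the consistency level (not stated here).

```python
# Expected coordinator load per node under ideal, even load balancing.
# Assumption: the driver distributes requests evenly across all nodes.

THREADS = 21
REQ_PER_THREAD = 780  # approximate rate per YCSB client thread (req/sec)


def total_load() -> int:
    """Aggregate request rate produced by all client threads."""
    return THREADS * REQ_PER_THREAD


def coordinator_load_per_node(num_nodes: int) -> float:
    """Requests/sec each node coordinates if load is spread evenly."""
    return total_load() / num_nodes


print(f"total: {total_load()} req/sec")
for n in range(8, 13):
    print(f"{n:2d} nodes: ~{coordinator_load_per_node(n):.0f} req/sec per node")
```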

EXPERIMENT 1
I keep the load steady at ~16380 req/sec and add 1 node periodically. After adding a new node and once bootstrapping is over, I expect both the response latency and the CPU usage to decrease; however, this is not the case. With every node addition, CPU usage increases and response latency either stays steady or increases too.

MEASUREMENT LOGS
# nodes   LOAD (req/sec)   THROUGHPUT (req/sec)   LATENCY (ms)   CPU (%)
8         15668.81         15679.81               56.09          86.67
9         16177.45         16185.05               62.96          88.61
10        16353.36         16343.27               75.22          89.48
11        15723.14         15682.06               65.84          90.00
12        15348.97         15327.27               103.13         90.40
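A small sketch comparing the observed CPU usage in the table with what ideal linear scaling would predict from the 8-node baseline. It assumes per-node CPU is proportional to the requests each node serves (normalised by the measured throughput) and ignores fixed background costs such as compaction, gossip, streaming and Ganglia itself.

```python
# Observed averages from the table: nodes -> (throughput req/sec, latency ms, CPU %).
OBSERVED = {
    8: (15679.81, 56.09, 86.67),
    9: (16185.05, 62.96, 88.61),
    10: (16343.27, 75.22, 89.48),
    11: (15682.06, 65.84, 90.00),
    12: (15327.27, 103.13, 90.40),
}

BASELINE_NODES = 8


def ideal_cpu(num_nodes: int) -> float:
    """CPU % expected under perfect linear scaling from the 8-node baseline.

    Assumption: CPU is proportional to requests per node, i.e. to
    throughput / num_nodes; fixed background work is ignored.
    """
    base_tput, _, base_cpu = OBSERVED[BASELINE_NODES]
    tput, _, _ = OBSERVED[num_nodes]
    return base_cpu * (BASELINE_NODES / num_nodes) * (tput / base_tput)


for n, (tput, lat, cpu) in sorted(OBSERVED.items()):
    print(f"{n:2d} nodes: observed CPU {cpu:5.2f}%, "
          f"ideal ~{ideal_cpu(n):5.2f}%, latency {lat:6.2f} ms")
```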


I moved from 8 to 9 nodes after 10 minutes, from 9 to 10 after 15 minutes, from 10 to 11 after 25 minutes, and from 11 to 12 after 25 minutes.
Bootstrapping took about 1.5 minutes for each addition. I didn't run nodetool cleanup at all.
The values in the table are averages.

EXPERIMENT 2
I have also tried going from 7 nodes up to 12 with a lower incoming load of ~9600 req/sec. Both latency and CPU usage stay at the same level regardless of the number of nodes (3-4 ms latency and 75% CPU load). I also ran nodetool cleanup, but there was still no decrease.
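A rough comparison of the two experiments using Little's law (L = λW: average requests in flight = throughput × latency). This assumes steady state and that the reported latencies are per-request means over the same interval as the throughput; for experiment 2 the sketch takes the bounds of the 3-4 ms range.

```python
# Little's law: average requests in flight = throughput (req/sec) * latency (sec).
# Assumption: steady state; latencies are mean per-request values.


def in_flight(throughput_rps: float, latency_ms: float) -> float:
    """Average number of requests outstanding at any moment."""
    return throughput_rps * latency_ms / 1000.0


# Experiment 1, 8 nodes (values from the table).
exp1 = in_flight(15679.81, 56.09)
# Experiment 2: ~9600 req/sec at 3-4 ms latency.
exp2_low = in_flight(9600, 3.0)
exp2_high = in_flight(9600, 4.0)

print(f"Experiment 1 (8 nodes): ~{exp1:.0f} requests in flight")
print(f"Experiment 2: ~{exp2_low:.0f}-{exp2_high:.0f} requests in flight")
```

With these numbers, experiment 1 has roughly 880 requests outstanding versus roughly 29-38 in experiment 2.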

I've read that Cassandra's performance scales linearly as nodes are added. Am I missing something?

Thanks a lot!
Thanasis