I have a 3-node cluster. Its only keyspace has a replication factor of 3, and the application reads and writes UUID-keyed data. I use CQL (cassandra-python); most writes go through execute_async, and most reads use consistency level ONE. Overall performance in this setup is better than I expected.
Then I tested 6-node and 9-node clusters, and performance (both read and write) got progressively worse. Roughly speaking, the 6-node cluster is about 2 to 3 times slower than the 3-node cluster, and the 9-node cluster is about 5 to 6 times slower. All tests were run multiple times with the same data set, the same test program, and the same client machines. I'm running Cassandra 2.1.2 with the default configuration. What I observed is that with 6 and 9 nodes the Cassandra servers were doing fine on I/O, but CPU utilization was about 60% to 70% higher than with 3 nodes. I'd like suggestions on how to troubleshoot this, since it contradicts what I have read: that Cassandra scales linearly.
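
One difference between the three setups can be put into numbers. With a replication factor of 3 on 3 nodes, every node holds a replica of every key, so whichever node coordinates a request can serve a consistency-level-ONE read from its own data. On larger clusters that is no longer guaranteed. The sketch below is only an illustration under assumptions that are not confirmed by the tests above: coordinators are picked uniformly (for example round-robin rather than token-aware routing), token ownership is even, and there is a single datacenter. The function name `local_replica_fraction` is hypothetical.

```python
from fractions import Fraction


def local_replica_fraction(nodes: int, replication_factor: int = 3) -> Fraction:
    """Fraction of requests whose coordinator also holds a replica of the key.

    Assumptions (not measured): the coordinator is chosen uniformly among
    all nodes, token ownership is even, and there is one datacenter.
    """
    if nodes < 1 or replication_factor < 1:
        raise ValueError("nodes and replication_factor must be positive")
    # A key lives on min(RF, N) of the N nodes.
    return Fraction(min(replication_factor, nodes), nodes)


for n in (3, 6, 9):
    frac = local_replica_fraction(n)
    print(f"{n} nodes: coordinator is a replica for {float(frac):.0%} of requests")
```

Under these assumptions the share of requests that need no extra network hop drops from 100% on 3 nodes to 50% on 6 and about 33% on 9. This may account for part of the added CPU and latency, but it is unlikely to explain a 5 to 6 times slowdown by itself. If the client is not already using it, the Python driver's `TokenAwarePolicy` load-balancing policy is one thing that might be worth checking.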