Hi Cassandra community,

we are currently experimenting with different Cassandra scaling strategies. We observed that Cassandra performance decreases drastically when we insert more data into the cluster (say, going from 60GB to 600GB in a 3-node cluster), so we want to find out how to deal with this problem. One scaling strategy looks interesting, but we don't fully understand yet what is going on. The strategy works like this: add new nodes to a Cassandra cluster with "auto_bootstrap = false" to avoid streaming data to the new nodes. We were a bit surprised that this strategy improved performance considerably and that it worked much better than the other strategies we tried before, both in terms of scaling speed and performance impact during scaling.
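For concreteness, this is roughly how we bring up each additional node (a sketch; paths and start commands of course depend on the installation):

    # on each new node, before the first start
    $ echo "auto_bootstrap: false" >> conf/cassandra.yaml    # join the ring without streaming existing data
    $ bin/cassandra                                          # then start the node as usual

With this setting, the new node takes responsibility for its token range right away, but it starts out without any of the existing SSTables for that range.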
Let me share our little experiment with you.

Setup S1: We have 4 nodes, each similar to the Amazon EC2 large instance type, i.e., 4 cores, 15GB memory, 700GB free disk space; the Cassandra replication factor is 2. Each node is loaded with 10 million 1KB rows in a single column family, i.e., ~20 GB of data per node, using the Yahoo Cloud Serving Benchmark (YCSB) tool. All Cassandra settings are left at their defaults. The workload is a 95/5 read/update mix with a Zipfian request distribution (= YCSB workload B). In setup S1 we achieved an average throughput of ~800 ops/s.

Setup S2: We then added two empty nodes to our 4-node cluster with auto_bootstrap set to false. The throughput we observed thereafter tripled from 800 ops/s to 2,400 ops/s. We looked at various nodetool outputs to understand this effect. On the new nodes, "$ nodetool info" tells us that the key cache is empty, while "$ nodetool cfstats" clearly shows read and write requests coming in. The memtable column count and data size are several times larger than on the other four nodes.

We are wondering: what exactly gets stored on the two new nodes in setup S2, and where (cache, memtable, disk)? Would it be necessary (in a production environment) to stream the old SSTables from the other four nodes at some point in time, or can we simply be happy with the performance improvement and leave it like this? Are we missing something here? Can you point us to specific monitoring data that would help us better understand the observed effect?

Thanks,
Markus Klems
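P.S. For reference, these are the nodetool commands we have been comparing across the old and the new nodes (a rough sketch; <host>, <keyspace> and <cf> are placeholders for our hosts and the YCSB column family):

    $ nodetool -h <host> ring                          # token ownership after adding the two nodes
    $ nodetool -h <host> info                          # load, heap usage, key cache
    $ nodetool -h <host> cfstats                       # read/write counts, memtable columns/data size, SSTable count
    $ nodetool -h <host> cfhistograms <keyspace> <cf>  # read latency and SSTables touched per read
    $ nodetool -h <host> tpstats                       # pending/blocked thread pool stages

If there are other metrics (JMX or otherwise) that would make the effect more visible, we would be happy to collect them.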