Adding a node to cluster keeping 100% data replicated on all nodes

jivko donev Fri, 07 Feb 2014 04:40:02 -0800

Hi,

Our environment will consist of clusters of 2 to 4 nodes each (all nodes 
located in the same DC). We want to ensure that every node in the cluster 
will own 100% of the data. The procedure for adding (or removing) a node 
will be automated, so we want to ensure we're taking the right steps. Let's 
say we have node 'A' up and running and want to add another node 'B' to make 
a cluster. Node A's configuration will be: 
seed: "IP of A"
listen_address: "IP of A"
num_tokens: 256
rpc_address: 0.0.0.0
The keyspace uses SimpleStrategy with RF: 1.


To add node 'B' to the cluster we do the following:
1. Stop cassandra on B.
2. Update cassandra.yaml - change seed to point to "IP of A"
3. Update cassandra-topology.properties - add node A's IP to it and make it 
the default one.
4. rm -rf /var/lib/cassandra/*
5. Start cassandra on B.
6. Wait until nodetool status reports that node B is up.
7. Update the RF of the keyspace to 2.
8. Run nodetool repair on B and wait for it to finish.
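
As a rough sketch of how our automation might handle steps 6 and 7: the 
function names and the keyspace name below are ours, made up for 
illustration, and the parsing assumes that nodetool status prints one row per 
node starting with a two-letter state code (UN for Up/Normal) followed by the 
node address. Setting RF equal to the number of nodes is our reading of 
"every node owns 100% of the data".

```python
def node_is_up(status_output, ip):
    """Return True if the given `nodetool status` text lists `ip` as UN.

    Assumption: node rows look like "UN  10.0.0.1  ...", i.e. the state code
    is the first whitespace-separated field and the address is the second.
    """
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == ip:
            return fields[0] == "UN"
    # The node is not listed at all, so it has not joined yet.
    return False


def alter_rf_statement(keyspace, node_count):
    """Build the CQL for step 7, with RF equal to the number of nodes."""
    return (
        "ALTER KEYSPACE %s WITH replication = "
        "{'class': 'SimpleStrategy', 'replication_factor': %d};"
        % (keyspace, node_count)
    )
```

The script would poll nodetool status until node_is_up() returns True for 
B, run the generated statement through cqlsh, and then start the repair.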

Can we update the RF on A before starting Cassandra on B in order to skip 
steps 7 and 8?


Now that the data is in sync on both nodes, we want to make node B a seed node.
9. Update the seed property on A and B to include the IP of node B.
10. Restart cassandra on both nodes.

When adding more nodes to the cluster, the steps will be the same, except that 
the seed property will contain all existing nodes in the cluster.
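
For the seed property, the automation would build the value roughly like 
this. It is only a sketch: seeds_value is a hypothetical helper name, and it 
assumes the seeds setting in cassandra.yaml takes a comma-separated list of 
addresses.

```python
def seeds_value(existing_node_ips):
    """Join the IPs of all existing nodes into a comma-separated seeds string."""
    unique_ips = []
    for ip in existing_node_ips:
        # Keep the first occurrence of each address and preserve the order.
        if ip not in unique_ips:
            unique_ips.append(ip)
    return ",".join(unique_ips)
```

With nodes A and B already in the cluster, the value written to the 
configuration of a new node C would be "IP of A,IP of B".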

So are these steps everything we need to do? 
Is there anything more we need to do?
Is there an easier way to do what we want, or are all the steps above mandatory?
