I'm about to extend my current two-node production cluster into a five-node cluster and I'd like to be sure that my plan is correct.
Currently the cluster has two nodes with RF=2. The target is to add four nodes, increase RF to 3 and drop one of the old nodes. My current plan is:

1) Add one node with RF=3, but keep the clients connecting only to the two old nodes. As I'm doing many reads with ConsistencyLevel.ONE, this should prevent the clients from getting exceptions about missing keys.
2) Restart both old nodes with a configuration that has RF=3. Subsequent inserts should now be propagated to the new third node.
3) Execute "nodetool repair" on the new node. After this, all three nodes should have all the data.
4) Tell the clients that they can now also connect to the new node.
5) Add the three remaining nodes, one at a time, waiting for each bootstrap to complete. Also add these nodes to the client connection list.
6) Execute "nodetool decommission" on the old node that is to be dropped.
7) Execute "nodetool loadbalance" on nodes if needed.

Can somebody spot any big problems with the plan?

I'm also thinking about adding one node in another data center to act as a live backup node. The idea is that every key should have a copy on the backup machine. If I'm correct, this can be done with RackAwareStrategy, as stated on the Operations wiki page. No clients would be doing reads from this backup machine. Is this even possible, and if it is, would it be wise? Or should I just do backups by snapshotting the cluster files, as suggested on the Operations wiki page? I'm currently using RackUnawareStrategy and I'm not even sure whether it can be changed without cluster downtime.

- Juho Mäkinen
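
P.S. To make the concern behind steps 1 to 4 concrete, here is a toy Python model. It is not Cassandra code: the dictionaries, `read_one` and `repair` are made-up stand-ins used only for illustration, and it assumes that a read at ConsistencyLevel.ONE is answered by a single replica. Under that assumption, a read served by the new, still-empty node would miss a key until the repair has copied the data over.

```python
# Toy model only; none of these names exist in Cassandra.
# Each "node" is a plain dict mapping key -> value.
old_a = {"k1": "v1", "k2": "v2"}
old_b = {"k1": "v1", "k2": "v2"}
new_c = {}  # freshly added third node, not yet repaired


def read_one(replica, key):
    """Assumed behaviour of a CL.ONE read: consult exactly one replica."""
    return replica.get(key)


def repair(target, sources):
    """Crude stand-in for 'nodetool repair': copy keys the target lacks."""
    for src in sources:
        for key, value in src.items():
            target.setdefault(key, value)


before = read_one(new_c, "k1")   # new node has nothing yet
repair(new_c, [old_a, old_b])    # step 3 of the plan
after = read_one(new_c, "k1")    # new node can now answer

print(before, after)
```

This is why the plan keeps clients away from the new node until step 4; whether the real cluster behaves this way is exactly what I'd like confirmed.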