I am finalizing our backup and restore procedures for a Cassandra cluster running on EC2. Based on the wiki, I understand that to replace a single node I don't actually need to put data on that node; I just need to bootstrap the new node into the cluster and it will get its data from the other nodes. However, would it speed up the process if the new node already had the data from the node it is replacing? Also, what do I do if the entire cluster goes down? I am planning to snapshot the data on each node every night. Should I also save the system keyspace snapshots? Is it problematic to bring the cluster back up with new IPs on each node but the same tokens as before?
Lee Parker