Hi,

Backup / Restore can be done in distinct ways.
The process is about taking a snapshot of all the nodes, roughly at the same time, for the backup, and setting up a new environment based on it for the restore. Usually a Cassandra backup is made through 'nodetool snapshot' on each node, then all the snapshots are moved out of the nodes to a safe place (there is a rough sketch of this step below). Restoring is different depending on whether one node was lost or the entire cluster went down.

*If one node went down*, I would probably not use the restore at all, but just have another node replace the failed one. This way there is no data gap: all the data should make its way to the new node through the streaming process (the standard node replacement in Cassandra).

*If the data on all the nodes is wrong* due to a user action (typically a bad delete, a big "PEBKAC" of some kind), then it is about stopping all the nodes, cleaning '/data/ks/table/*', putting the latest correct snapshot taken from each node back in there, and restarting all the nodes. This implies a downtime and some data loss of course.

*Also, if for some reason the old nodes are not accessible* (a major hardware outage), it is good to have the schema (from one node, as all the nodes should have the same schema) and the snapshots from the old cluster stored outside of the cluster. If the new nodes use the same tokens as the previous nodes, distribute the SSTables by copying the data from each old node to the new node owning the same tokens. If that is not straightforward (when using vnodes), then using 'sstableloader' is possible, but probably slower, inducing a bigger downtime.

Some tools and documentation help with this topic:
- Datastax doc: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsBackupRestore.html
- Tablesnap / Cassback are old tools that were meant to make backup / restore easier. I never used them, so I'll let you judge these tools.

Finally, I had to work on backup / restore recently on AWS, using EBS. The option we took is to regularly snapshot the whole '/data' volume using AWS snapshots, so we have a copy of each volume, following our backup policy. Snapshots on AWS are incremental, which allows relatively cheap frequent backups, even though AWS snapshots are probably a bit expensive overall. With this technique the restore is really straightforward: we create a new node (or use the existing one that is to be restored), create a new EBS volume from the snapshot taken from the old node, and attach this new volume to the newly created node (a short boto3 sketch of these calls is below as well). When the node starts, it has all the data, *including the system keyspace*. So the node knows about the tokens it owns (even when using vnodes), about the schema, about the other nodes' token ranges, etc. We tested this solution quite successfully with Cassandra 2.1, using the same DC / cluster name in the 2 clusters but distinct seeds (actually, in our test the 2 clusters were on 2 distinct networks and not able to talk anyway).

Maybe this latest solution can be applied outside of AWS: just snapshot the whole '/data' folder, then put it back on the same node after wiping out the previous content (or directly on another node), and just start Cassandra. I never read about the technique described above, so even though our tests were successful, I encourage you to act carefully if going down this path. Maybe you should consider performing tests as well, copying the entire Cassandra '/data' folder to a new cluster configured like the first one, just using distinct seeds, and starting the nodes like this. If this works, it would solve all the schema and token range ownership considerations, as the token information for each node is shipped alongside the data, in '/data/system/'.
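To make the 'nodetool snapshot' part more concrete, here is a rough, untested Python sketch of the per-node snapshot step. The keyspace name, data directory and backup destination are placeholders I made up, so adapt them to your own layout and to however you move files off the node:

# Hypothetical per-node backup step: take a snapshot and copy it out of the
# data directory. 'nodetool' is assumed to be on the PATH and the data
# directory to be the default '/var/lib/cassandra/data'.
import glob
import os
import shutil
import subprocess
import time

SNAPSHOT_TAG = "backup-" + time.strftime("%Y%m%d-%H%M%S")
DATA_DIR = "/var/lib/cassandra/data"           # assumption: default data location
BACKUP_DIR = "/mnt/backups/" + SNAPSHOT_TAG    # assumption: staging area / mounted backup volume

def take_snapshot(keyspace):
    # 'nodetool snapshot -t <tag> <keyspace>' creates hard links under
    # <data_dir>/<keyspace>/<table>/snapshots/<tag>/ on this node only.
    subprocess.check_call(["nodetool", "snapshot", "-t", SNAPSHOT_TAG, keyspace])

def copy_snapshot_out(keyspace):
    # Copy each table's snapshot directory to the backup destination
    # (in practice this is often an rsync or an upload to object storage).
    pattern = os.path.join(DATA_DIR, keyspace, "*", "snapshots", SNAPSHOT_TAG)
    for snapshot_path in glob.glob(pattern):
        table_dir = os.path.basename(os.path.dirname(os.path.dirname(snapshot_path)))
        shutil.copytree(snapshot_path, os.path.join(BACKUP_DIR, keyspace, table_dir))

if __name__ == "__main__":
    take_snapshot("my_keyspace")       # hypothetical keyspace name
    copy_snapshot_out("my_keyspace")

The restore for the "bad delete" case is then roughly the reverse: stop the nodes, clean the table directories, copy the SSTables from the snapshots back in and restart.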
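For the EBS option, here is a minimal boto3 sketch of the two operations involved: snapshotting the '/data' volume, and later creating a volume from that snapshot and attaching it to the node being restored. The region, IDs and device name are placeholders, not values from our setup:

# Minimal sketch of the EBS-based backup / restore, assuming boto3 and AWS
# credentials are configured. All identifiers below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")   # assumption: region

def snapshot_data_volume(volume_id, node_name):
    # Snapshot the EBS volume holding the whole '/data' directory.
    # EBS snapshots are incremental, which keeps frequent backups relatively cheap.
    response = ec2.create_snapshot(
        VolumeId=volume_id,
        Description="cassandra-data-" + node_name,
    )
    return response["SnapshotId"]

def restore_data_volume(snapshot_id, availability_zone, instance_id):
    # Create a new volume from the snapshot and attach it to the replacement
    # node. Once mounted as '/data', Cassandra starts with all its data,
    # including the system keyspace, so it keeps its tokens and schema.
    volume_id = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(
        VolumeId=volume_id,
        InstanceId=instance_id,
        Device="/dev/xvdf",   # assumption: a free device name on the instance
    )
    return volume_id

The key point is that the volume carries the entire '/data' directory, system keyspace included, so the restored node comes back owning its tokens (even with vnodes) without extra steps, which is why the restore felt so straightforward in our tests.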
Good luck with this topic,

C*heers,
-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2017-11-22 13:25 GMT+00:00 Akshit Jain <akshit13...@iiitd.ac.in>:

> What is the correct process to backup and restore in cassandra?
> Should we do backup node by node like first schema backup from all the
> nodes then all other stuff?
> In restore the schema should be restored on one node or all the nodes
> again? It will give Already Exists Exception but still what's the correct
> process which is followed in production?
>