Hi,
we are running a 7-node cluster with a replication factor (RF) of 5. Each
node holds about 70% of the data, and we are now wondering about the backup
process.
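For context, here is a rough sketch of where the 70% figure comes from and how many per-node backups would be needed to cover every row at least once. It assumes a single datacenter, evenly balanced token ownership, and that each token range lives on RF distinct nodes. The function names are made up for illustration and are not part of any Cassandra tooling.

```python
def fraction_per_node(nodes: int, rf: int) -> float:
    """Approximate share of the full data set stored on each node.

    Assumes evenly balanced tokens in a single datacenter.
    """
    return min(rf, nodes) / nodes


def node_backups_for_full_copy(nodes: int, rf: int) -> int:
    """Smallest k such that ANY k node backups together hold every range.

    Each range is missing from (nodes - rf) nodes, so any group of
    (nodes - rf + 1) nodes must include at least one replica of it.
    """
    return nodes - min(rf, nodes) + 1


if __name__ == "__main__":
    n, rf = 7, 5
    print(f"share per node: {fraction_per_node(n, rf):.2f}")
    print(f"node backups needed: {node_backups_for_full_copy(n, rf)}")
```

So under these assumptions each node holds roughly 5/7 of the data, and backups from any 3 nodes would together contain at least one replica of every token range.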
1. Is there a best-practice procedure or a tool that we can use to get
one backup that holds 100% of the data, or is it necessary for us to
take multiple backups?
2. If we have to use multiple backups, is there a way to combine them?
We would like to be able to start up a single-node cluster that holds 100%
of the data if necessary. Can we just copy all the SSTables into the data
directory and let Cassandra figure out the rest?
3. How do we handle the commitlog files from all of our nodes? Say we
would like to restore to a certain point in time and we have all the
commitlogs. Can we put commitlogs from multiple nodes into the
commitlog folder, and will Cassandra pick them up and replay them correctly?
4. If all of the above works, could we, in an emergency, set up a
massive single-node cluster that holds 100% of the data and repair the rest
of our cluster based on it? E.g. have the one node run with the correct
data, then hook it into our existing cluster and run repair on it
to restore the data on the rest of our nodes?
Thanks for your help!
Cheers,
Artur