If you are thinking about using Amazon S3 for storage, I wrote a tool that takes snapshots and backups across multiple nodes. Backups are stored compressed on S3.

https://github.com/tbarbugli/cassandra_snapshotter
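As a rough per-node illustration of the snapshot-and-compress idea, here is a minimal sketch. It is not cassandra_snapshotter's code: the function names, the snapshot directory, and the archive path are hypothetical, and the only assumption about Cassandra itself is the standard `nodetool snapshot -t <tag> <keyspace>` command.

```python
import subprocess
import tarfile
from pathlib import Path


def snapshot_command(tag, keyspace):
    """Build the nodetool call that snapshots one keyspace under a tag."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def compress_snapshot(snapshot_dir, archive_path):
    """Pack a snapshot directory into a gzip-compressed tarball."""
    snapshot_dir = Path(snapshot_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store files under the directory's own name, not its absolute path.
        tar.add(snapshot_dir, arcname=snapshot_dir.name)
    return archive_path


def backup_node(tag, keyspace, snapshot_dir, archive_path):
    """Take a snapshot on this node, then compress it for upload."""
    # Requires nodetool on PATH and a running Cassandra node.
    subprocess.run(snapshot_command(tag, keyspace), check=True)
    return compress_snapshot(snapshot_dir, archive_path)
```

The upload to S3 is left out of the sketch. Each node would run the same steps, since no single node holds all of the data.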
Cheers,
Tommaso

2014-05-02 10:42 GMT+02:00 Artur Kronenberg <artur.kronenb...@openmarket.com>:

> Hi,
>
> we are running a 7 node cluster with an RF of 5. Each node holds about 70%
> of the data and we are now wondering about the backup process.
>
> 1. Is there a best practice procedure or a tool that we can use to have
> one backup that holds 100% of the data, or is it necessary for us to take
> multiple backups?
>
> 2. If we have to use multiple backups, is there a way to combine them? We
> would like to be able to start up a 1 node cluster that holds 100% of data
> if necessary. Can we just chug all sstables into the data directory and
> cassandra will figure out the rest?
>
> 3. How do we handle the commitlog files from all of our nodes? Given we'd
> like to restore to a certain point in time and we have all the commitlogs,
> can we have commitlogs from multiple locations in the commitlog folder and
> cassandra will pick and execute the right thing?
>
> 4. If all of the above would work, could we in case of emergency set up a
> massive 1-node cluster that holds 100% of the data and repair the rest of
> our cluster based off this? E.g. have the 1 node run with the correct data,
> and then hook it into our existing cluster and call repair on it to restore
> data on the rest of our nodes?
>
> Thanks for your help!
>
> Cheers,
>
> Artur