> Snapshot runs on a local node. How do I ensure I have a 'point in
> time' snapshot of the full cluster? Do I have to stop the writes on
> the full cluster and then snapshot all the nodes individually?
You don't. By backing up individual nodes you can do a full-cluster recovery that is eventually consistent, in the sense that whatever is the latest backed-up version of some data gets propagated (after loading the sstables, running repair, etc.), but you don't get a point-in-time snapshot.

Note that generally, unless you have some external co-ordination participating in the backup process, a snapshot backup should not be a desired goal, since you are asking for stricter consistency than is available on a live cluster to begin with.

If your goal is not strict consistency, but rather to practically limit the time span over which different subsets of the cluster get restored, I'm not aware of an out-of-the-box way to do so. Supporting commit log archival/offsite shipping would help with something like that (I think there's a JIRA ticket for it somewhere). It occurs to me that starting up a cluster with a cut-off time for timestamps in the future would help here too in a very simple way... will try to file a JIRA for that.

Without WAL archival you're in a position similar to having a regular dump of your typical non-distributed database (you have a dump from some point in time), except that the "effective" point in time will vary with row key and be subject to consistency level concerns.

Depending on your use case, a restore-from-backup after a catastrophic failure may or may not be a violation of your usual consistency guarantee. If, for example, your application relies on a QUORUM write followed by another QUORUM write to a different row key to avoid inconsistent data, a restore from backup where the former row key gets restored to a point in time prior to that of the latter row key may cause the latter write to become visible even though the former write is lost.

--
/ Peter Schuller
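To make that last scenario concrete, here is a tiny toy model (plain Python, not Cassandra code; the node names, replica placement, write times and snapshot times are all made up) of how per-node snapshot times can lose the earlier write while keeping the later one:

# Per-node snapshot times (hypothetical wall-clock values).
snapshot_time = {"node1": 100, "node2": 100, "node3": 250}

# Application writes: row "former" at t=150, then row "latter" at t=200.
# Each row lists the (hypothetical) nodes holding its replicas.
writes = [
    ("former", 150, ["node1", "node2"]),
    ("latter", 200, ["node3"]),
]

def survives_restore(write_time, replicas):
    # After restoring the per-node backups and repairing, a write is
    # recoverable only if at least one of its replicas was snapshotted
    # after the write was applied.
    return any(snapshot_time[n] >= write_time for n in replicas)

for row, t, replicas in writes:
    print(row, "restored" if survives_restore(t, replicas) else "lost")

# Prints:
#   former lost       (both replicas snapshotted at t=100, before the write)
#   latter restored   (node3 snapshotted at t=250, after the write)

So the later write to "latter" survives the restore even though the earlier write to "former" does not, which is exactly the ordering violation described above.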