> Snapshot runs on a local node. How do I ensure I have a 'point in
> time' snapshot of the full cluster ? Do I have to stop the writes on
> the full cluster and then snapshot all the nodes individually ?

You don't. By backing up individual nodes you can do a full-cluster
recovery that is eventually consistent, in the sense that whatever
is the latest version of some piece of data gets propagated (after
loading up the sstables, running repair, etc.), but you do not get
a point-in-time snapshot.
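
For example, a per-node backup pass might look roughly like the
sketch below (the host list is made up for illustration, and
nodetool options vary between versions):

    #!/usr/bin/env python
    # Minimal sketch of a per-node backup pass; hosts are hypothetical.
    import subprocess

    HOSTS = ["cass1.example.com", "cass2.example.com", "cass3.example.com"]

    for host in HOSTS:
        # Each snapshot is local to that node and is taken at a
        # slightly different moment -- there is no cluster-wide
        # point in time.
        subprocess.check_call(["nodetool", "-h", host, "snapshot"])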

Note that generally, unless you have some external co-ordination
participating in the backup process, a point-in-time snapshot
backup should not be the goal anyway, since you would be asking for
stricter consistency than what is available on a live cluster to
begin with.

If your goal is not strict consistency, but rather to practically
limit the time span over which different subsets of the cluster get
restored, I'm not aware of an out-of-the-box way to do it.
Supporting commit log archival/off-site shipping would help with
something like that (I think there's a JIRA ticket for it
somewhere). It occurs to me that starting up a cluster with a
cut-off time for timestamps, ignoring anything written after the
cut-off, would help here too in a very simple way... will try to
file a JIRA for that.
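
To illustrate the cut-off idea (purely hypothetical, not an
existing feature; the names below are made up), the node would
simply drop any column whose timestamp falls after the chosen
point in time:

    # Conceptual sketch only: "columns" and "cutoff_micros" are
    # hypothetical names, not Cassandra APIs.
    def filter_columns(columns, cutoff_micros):
        # Keep only writes stamped at or before the cut-off; later
        # writes are treated as if they never happened.
        return [c for c in columns if c.timestamp <= cutoff_micros]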

Without WAL archival you're in a position similar to a regular dump of
your typical non-distributed database (you have a dump from some point
in time), except that the "effective" point in time will vary with row
key and be subject to consistency level concerns.

Depending on your use case, a restore-from-backup after a
catastrophic failure may or may not violate your usual consistency
guarantees. If, for example, your application relies on a QUORUM
write followed by another QUORUM write to a different row key to
avoid inconsistent data, then a restore from backup in which the
former row key is restored to a point in time earlier than that of
the latter row key may cause the latter write to become visible
even though the former write is lost.
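
A toy timeline of that scenario (row keys, values and times are
made up, just to make the ordering concrete):

    # Hypothetical timeline showing how a restore can invert write
    # ordering. Assume both writes succeeded at QUORUM.
    writes = [
        ("rowA", "payment recorded", 100),  # written first
        ("rowB", "receipt issued",   200),  # written second, depends on rowA
    ]

    # Suppose the nodes holding rowA were backed up at t=50 and the
    # nodes holding rowB at t=250. After restoring from those backups:
    restored = {
        "rowA": None,              # the t=100 write is lost (backup too old)
        "rowB": "receipt issued",  # the t=200 write survived
    }
    # The dependent write (rowB) is visible even though the write it
    # relied on (rowA) is gone.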

-- 
/ Peter Schuller
