> 1. Is it feasible to run directly against a Cassandra data directory restored
> from an EBS snapshot? (as opposed to nodetool snapshots restored from an EBS
> snapshot).

I don't have experience with EBS snapshots, but I've never been a fan of OS-level snapshots that are not coordinated with the DB layer.

> 2. Noting the wiki's consistent Cassandra backups advice; if I schedule
> nodetool snapshots across the cluster, should the relative age of the
> 'sibling' snapshots be a concern? How far apart can they be before it's a
> problem? (seconds? minutes? hours?)

Consider the cluster-wide snapshot to be from the time of the first one taken.

Previous discussion on AWS backup:
http://www.mail-archive.com/user@cassandra.apache.org/msg12831.html

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23 Jun 2011, at 10:48, Thoku Hansen wrote:

> I have a couple of questions regarding the coordination of Cassandra nodetool
> snapshots with Amazon EBS snapshots as part of a Cassandra backup/restore
> strategy.
>
> Background: I have a cluster running in EC2. Its nodes are configured like so:
>
> * Instance type: m1.xlarge
> * Cassandra commit log writing to RAID-0 ephemeral storage
> * Cassandra data writing to an EBS volume.
>
> Note: there is a lot of conflicting information/advice about using Cassandra
> in EC2 w.r.t ephemeral vs. EBS. The above configuration seems to work well
> for my application. I only described this to provide context for my EBS
> snapshotting question. With respect, I hope not to debate Cassandra
> performance for ephemeral vs. EBS in this thread!
>
> I am setting up a process that performs regular EBS (->S3) snapshots for the
> purpose of backing up Cassandra plus other data.
> I presume this will need to be coordinated with regular Cassandra (nodetool)
> snapshots also.
>
> My questions:
> 1. Is it feasible to run directly against a Cassandra data directory restored
> from an EBS snapshot? (as opposed to nodetool snapshots restored from an EBS
> snapshot).
> 2. Noting the wiki's consistent Cassandra backups advice; if I schedule
> nodetool snapshots across the cluster, should the relative age of the
> 'sibling' snapshots be a concern? How far apart can they be before it's a
> problem? (seconds? minutes? hours?)
>
> My motivation for these two questions: I'm trying to figure out how much
> effort needs to be put into:
> * Time-coordinated scheduling of nodetool snapshots across the cluster
> * Automation of the process of determining the most appropriate set of
> nodetool snapshots to use when restoring a cluster.
>
> Thanks!
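
A rough sketch of the rule from the reply, that a set of 'sibling' snapshots should be treated as dating from the first one taken. It is only an illustration: the function name `summarize_snapshot_set`, the node names, and the timestamps are all hypothetical, and how you collect the per-node snapshot times is left open. It reports the effective time of the set and how far apart the snapshots were, which may help when choosing a set to restore from.

```python
from datetime import datetime


def summarize_snapshot_set(snapshot_times):
    """Return (effective_time, spread) for a set of per-node snapshots.

    snapshot_times maps a node name to the datetime its snapshot was taken.
    The effective time is the earliest snapshot, following the advice to
    consider the set to be from the time of the first one.
    """
    if not snapshot_times:
        raise ValueError("no snapshots supplied")
    earliest = min(snapshot_times.values())
    latest = max(snapshot_times.values())
    return earliest, latest - earliest


# Hypothetical snapshot times for a three-node cluster.
times = {
    "node1": datetime(2011, 6, 23, 2, 0, 5),
    "node2": datetime(2011, 6, 23, 2, 0, 0),
    "node3": datetime(2011, 6, 23, 2, 1, 30),
}

effective, spread = summarize_snapshot_set(times)
print("effective snapshot time:", effective.isoformat())
print("spread between first and last:", spread)
```

Writes accepted after the effective time may be present on some nodes and absent on others, so a smaller spread leaves less for repair to reconcile after a restore.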