> 1. Is it feasible to run directly against a Cassandra data directory restored 
> from an EBS snapshot? (as opposed to nodetool snapshots restored from an EBS 
> snapshot).

I don't have experience with EBS snapshots, but I've never been a fan of
OS-level snapshots that are not coordinated with the DB layer.

> 2. Noting the wiki's consistent Cassandra backups advice; if I schedule 
> nodetool snapshots across the cluster, should the relative age of the 
> 'sibling' snapshots be a concern? How far apart can they be before it's a 
> problem? (seconds? minutes? hours?)

Consider the set of snapshots to be from the time of the first one taken.
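
As a rough illustration of that rule, here is a minimal Python sketch. It
assumes you record a completion time for each node's snapshot yourself; the
function name, node names and timestamps are all hypothetical and nothing
here comes from nodetool. It treats the earliest time as the age of the
whole set and reports how far apart the siblings ended up.

```python
from datetime import datetime, timedelta

def effective_snapshot_time(node_times):
    """Return (effective_time, spread) for one round of per-node snapshots.

    node_times maps a node name to the time its snapshot was taken.
    The set is treated as being as old as its earliest member.
    """
    if not node_times:
        raise ValueError("no snapshot times given")
    first = min(node_times.values())
    last = max(node_times.values())
    return first, last - first

# Hypothetical times for a three-node cluster (assumed values, not real data).
times = {
    "node1": datetime(2011, 6, 23, 2, 0, 5),
    "node2": datetime(2011, 6, 23, 2, 0, 40),
    "node3": datetime(2011, 6, 23, 2, 3, 10),
}

effective, spread = effective_snapshot_time(times)
print("treat backup as taken at:", effective.isoformat())
print("spread between siblings:", spread)
```

The spread is only informational here; what you do with a large value is up
to your own restore policy.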

Previous discussion on AWS backup:
http://www.mail-archive.com/user@cassandra.apache.org/msg12831.html

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23 Jun 2011, at 10:48, Thoku Hansen wrote:

> I have a couple of questions regarding the coordination of Cassandra nodetool 
> snapshots with Amazon EBS snapshots as part of a Cassandra backup/restore 
> strategy.
> 
> Background: I have a cluster running in EC2. Its nodes are configured like so:
> 
> * Instance type: m1.xlarge
> * Cassandra commit log writing to RAID-0 ephemeral storage
> * Cassandra data writing to an EBS volume.
> 
> Note: there is a lot of conflicting information/advice about using Cassandra 
> in EC2 w.r.t ephemeral vs. EBS. The above configuration seems to work well 
> for my application. I only described this to provide context for my EBS 
> snapshotting question. With respect, I hope not to debate Cassandra 
> performance for ephemeral vs. EBS in this thread!
> 
> I am setting up a process that performs regular EBS (->S3) snapshots for the 
> purpose of backing up Cassandra plus other data.
> I presume this will need to be coordinated with regular Cassandra (nodetool) 
> snapshots also.
> 
> My questions:
> 1. Is it feasible to run directly against a Cassandra data directory restored 
> from an EBS snapshot? (as opposed to nodetool snapshots restored from an EBS 
> snapshot).
> 2. Noting the wiki's consistent Cassandra backups advice; if I schedule 
> nodetool snapshots across the cluster, should the relative age of the 
> 'sibling' snapshots be a concern? How far apart can they be before it's a 
> problem? (seconds? minutes? hours?)
> 
> My motivation for these two questions: I'm trying to figure out how much 
> effort needs to be put into:
> * Time-coordinated scheduling of nodetool snapshots across the cluster
> * Automation of the process of determining the most appropriate set of 
> nodetool snapshots to use when restoring a cluster.
> 
> Thanks!
