Good point. We plan to test restoring the cluster on a regular basis, and we 
might also spin up a snapshot of the cluster for testing.

I also wonder how much time compression will save when it comes to restores.  
I'll have to run some tests on that.  Thanks for posting.
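
A rough way to run that comparison, as a minimal sketch: the Python below packs 
a directory into a plain tar and a gzip-compressed tar, then times extracting 
each one. The function names (make_archive, time_restore, compare) are made up 
for illustration. It measures local extraction only and leaves out the S3 
transfer, which is where a smaller archive would likely matter most.

```python
import os
import tarfile
import tempfile
import time


def make_archive(src_dir, archive_path, compressed):
    """Pack src_dir into a tar archive, gzip-compressed or plain."""
    mode = "w:gz" if compressed else "w"
    with tarfile.open(archive_path, mode) as tar:
        tar.add(src_dir, arcname="data")


def time_restore(archive_path, dest_dir):
    """Extract the archive into dest_dir and return the elapsed seconds."""
    start = time.perf_counter()
    # "r:*" lets tarfile detect whether the archive is compressed.
    with tarfile.open(archive_path, "r:*") as tar:
        tar.extractall(dest_dir)
    return time.perf_counter() - start


def compare(src_dir):
    """Return {label: (archive_bytes, restore_seconds)} for both variants."""
    results = {}
    with tempfile.TemporaryDirectory() as work:
        for label, compressed in (("plain", False), ("gzip", True)):
            archive = os.path.join(work, label + ".tar")
            dest = os.path.join(work, label + "_restored")
            os.mkdir(dest)
            make_archive(src_dir, archive, compressed)
            size = os.path.getsize(archive)
            results[label] = (size, time_restore(archive, dest))
    return results
```

Calling compare() on a copy of a snapshot directory gives the archive size and 
extraction time for each variant; the numbers will depend heavily on the data 
and the disks, so it is worth repeating on real SSTables.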


On Apr 28, 2011, at 4:15 PM, Adrian Cockcroft wrote:

> Netflix has also gone down this path: we run a regular full backup to
> S3 of a compressed tar, and we have scripts that restore everything
> into the right place on a different cluster (it needs the same node
> count). We also pick up the SSTables as they are created, and drop
> them in S3.
> Whatever you do, make sure you have a regular process to restore the
> data and verify that it contains what you think it should...
> Adrian
> On Thu, Apr 28, 2011 at 1:35 PM, Jeremy Hanna wrote:
>> One thing we're looking at doing is watching the cassandra data directory 
>> and backing up the sstables to S3 when they are created.  Some guys at 
>> simplegeo started tablesnap, which does this.
>> For every sstable it pushes to S3, it also copies a JSON file listing the 
>> current files in the directory, so you know what to restore if you ever 
>> need to (as far as I understand).
>> On Apr 28, 2011, at 2:53 PM, William Oberman wrote:
>>> Even with N-nodes for redundancy, I still want to have backups.  I'm an 
>>> amazon person, so naturally I'm thinking S3.  Reading over the docs, and 
>>> messing with nodetool, it looks like each new snapshot contains the 
>>> previous snapshot as a subset (and I've read how cassandra uses hard links 
>>> to avoid excessive disk use).  When does that pattern break down?
>>> I'm basically debating if I can do a "rsync" like backup, or if I should do 
>>> a compressed tar backup.  And I obviously want multiple points in time.  S3 
>>> does allow file versioning, if a file or file name is changed/reused over 
>>> time (only matters in the rsync case).  My only concerns with compressed 
>>> tars are that I'll have to have free space to create the archive and I get no 
>>> "delta" space savings on the backup (the former is solved by not allowing 
>>> the disk space to get so low and/or adding more nodes to bring down the 
>>> space, the latter is solved by S3 being really cheap anyways).
>>> --
>>> Will Oberman
>>> Civic Science, Inc.
>>> 3030 Penn Avenue., First Floor
>>> Pittsburgh, PA 15201
>>> (M) 412-480-7835
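
The tablesnap-style listing described above can be sketched roughly as follows. 
This is not tablesnap's actual code or file format: build_manifest and 
missing_files are hypothetical names, and the {"files": [...]} layout is an 
assumption. The idea is only that a JSON listing saved alongside each upload 
lets a restore check that every expected file came back.

```python
import json
import os


def build_manifest(data_dir):
    """Return a JSON string listing the regular files currently in data_dir."""
    names = sorted(
        name for name in os.listdir(data_dir)
        if os.path.isfile(os.path.join(data_dir, name))
    )
    # Assumed layout: a single "files" key holding the sorted file names.
    return json.dumps({"files": names})


def missing_files(manifest_json, restored_dir):
    """Return the manifest entries that are absent from restored_dir."""
    expected = json.loads(manifest_json)["files"]
    present = set(os.listdir(restored_dir))
    return [name for name in expected if name not in present]
```

An empty list from missing_files() means every file named in the listing is 
present after the restore; it says nothing about whether the contents are 
intact, so a real check would likely compare sizes or checksums as well.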
