Netflix has also gone down this path. We run a regular full backup to
S3 as a compressed tar, and we have scripts that restore everything
into the right place on a different cluster (it needs the same node
count). We also pick up the SSTables as they are created and drop
them in S3.
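
For anyone who wants to try the "pick up SSTables as they are created"
part, here is a minimal sketch that assumes a simple polling loop over
the data directory. It is not our actual script: the function names and
the `upload` callback are made up for illustration, and a real version
would call an S3 client where the callback is.

```python
import os

def list_files(data_dir):
    """Return the set of file paths (relative to data_dir) currently on disk."""
    found = set()
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            found.add(os.path.relpath(os.path.join(root, name), data_dir))
    return found

def ship_new_files(data_dir, seen, upload):
    """Call upload(full_path, relative_key) for each file not seen before.

    `upload` is a hypothetical callback standing in for an S3 put.
    Returns the updated set of seen files, to be passed to the next poll.
    """
    current = list_files(data_dir)
    for rel_path in sorted(current - seen):
        upload(os.path.join(data_dir, rel_path), rel_path)
    return seen | current
```

Calling `ship_new_files` on a timer, feeding each call the set returned
by the previous one, uploads every file once. SSTables are immutable
once written, so a file that has already been shipped never needs to be
sent again.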

Whatever you do, make sure you have a regular process to restore the
data and verify that it contains what you think it should...
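
One cheap first check is to compare file checksums before backup and
after restore. The sketch below assumes you record a manifest at backup
time; the function names are illustrative, not from any existing tool.
It only proves the files came back intact, so you still want to read
some known rows through Cassandra afterwards.

```python
import hashlib
import os

def checksum_manifest(directory):
    """Map each file's relative path to the SHA-256 of its contents."""
    manifest = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large SSTables do not fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(path, directory)] = digest.hexdigest()
    return manifest

def verify_restore(expected, restored_dir):
    """Return sorted relative paths that are missing or differ after restore."""
    actual = checksum_manifest(restored_dir)
    return sorted(p for p, d in expected.items() if actual.get(p) != d)
```

An empty list from `verify_restore` means every file in the backup-time
manifest was restored byte for byte.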

Adrian

On Thu, Apr 28, 2011 at 1:35 PM, Jeremy Hanna
<jeremy.hanna1...@gmail.com> wrote:
> one thing we're looking at doing is watching the cassandra data directory and 
> backing up the sstables to s3 when they are created.  Some guys at simplegeo 
> started tablesnap that does this:
> https://github.com/simplegeo/tablesnap
>
> What it does is for every sstable that is pushed to s3, it also copies a json 
> file with the current files in the directory, so you can know what to restore 
> in that event (as far as I understand).
>
> On Apr 28, 2011, at 2:53 PM, William Oberman wrote:
>
>> Even with N-nodes for redundancy, I still want to have backups.  I'm an 
>> amazon person, so naturally I'm thinking S3.  Reading over the docs, and 
>> messing with nodetool, it looks like each new snapshot contains the previous 
>> snapshot as a subset (and I've read how cassandra uses hard links to avoid 
>> excessive disk use).  When does that pattern break down?
>>
>> I'm basically debating if I can do a "rsync" like backup, or if I should do 
>> a compressed tar backup.  And I obviously want multiple points in time.  S3 
>> does allow file versioning, if a file or file name is changed/reused over 
>> time (only matters in the rsync case).  My only concerns with compressed 
>> tars are that I'll have to have free space to create the archive and I get no 
>> "delta" space savings on the backup (the former is solved by not allowing 
>> the disk space to get so low and/or adding more nodes to bring down the 
>> space, the latter is solved by S3 being really cheap anyways).
>>
>> --
>> Will Oberman
>> Civic Science, Inc.
>> 3030 Penn Avenue., First Floor
>> Pittsburgh, PA 15201
>> (M) 412-480-7835
>> (E) ober...@civicscience.com
>
>
