Netflix has also gone down this path. We run a regular full backup to S3 as a compressed tar, and we have scripts that restore everything into the right place on a different cluster (which needs the same node count). We also pick up the SSTables as they are created and drop them in S3.
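A minimal sketch of the tar-then-verify cycle, assuming a single local data directory and Python's standard library only. This is not the actual Netflix script: the function names (`checksums`, `make_backup`, `verify_restore`) are hypothetical, and the upload to S3 is left out entirely.

```python
from __future__ import annotations

import hashlib
import tarfile
import tempfile
from pathlib import Path


def checksums(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def make_backup(data_dir: Path, archive: Path) -> dict[str, str]:
    """Write data_dir into a gzip-compressed tar and return its checksums.

    Assumption: archive lives outside data_dir, so it does not include itself.
    """
    expected = checksums(data_dir)
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(data_dir, arcname=".")
    return expected


def verify_restore(archive: Path, expected: dict[str, str]) -> bool:
    """Restore the archive into a scratch directory and compare checksums."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(scratch)
        return checksums(Path(scratch)) == expected
```

Keeping the checksum map next to the archive gives a later restore test something concrete to compare against, rather than only confirming that the tar unpacks.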
Whatever you do, make sure you have a regular process to restore the data and verify that it contains what you think it should...

Adrian

On Thu, Apr 28, 2011 at 1:35 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:
> one thing we're looking at doing is watching the cassandra data directory and
> backing up the sstables to s3 when they are created. Some guys at simplegeo
> started tablesnap that does this:
> https://github.com/simplegeo/tablesnap
>
> What it does is for every sstable that is pushed to s3, it also copies a json
> file with the current files in the directory, so you can know what to restore
> in that event (as far as I understand).
>
> On Apr 28, 2011, at 2:53 PM, William Oberman wrote:
>
>> Even with N-nodes for redundancy, I still want to have backups. I'm an
>> amazon person, so naturally I'm thinking S3. Reading over the docs, and
>> messing with nodetool, it looks like each new snapshot contains the previous
>> snapshot as a subset (and I've read how cassandra uses hard links to avoid
>> excessive disk use). When does that pattern break down?
>>
>> I'm basically debating if I can do a "rsync" like backup, or if I should do
>> a compressed tar backup. And I obviously want multiple points in time. S3
>> does allow file versioning, if a file or file name is changed/reused over
>> time (only matters in the rsync case). My only concerns with compressed
>> tars are that I'll have to have free space to create the archive and I get no
>> "delta" space savings on the backup (the former is solved by not allowing
>> the disk space to get so low and/or adding more nodes to bring down the
>> space, the latter is solved by S3 being really cheap anyways).
>>
>> --
>> Will Oberman
>> Civic Science, Inc.
>> 3030 Penn Avenue., First Floor
>> Pittsburgh, PA 15201
>> (M) 412-480-7835
>> (E) ober...@civicscience.com
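
The per-sstable listing idea that Jeremy describes in the quoted message (push each new sstable, plus a JSON file naming the files currently in the directory) could be sketched roughly as follows. This is not tablesnap's code: it polls the directory instead of watching it, the `upload` callback stands in for an S3 client, and the `-listing.json` suffix and the `files` key are made up for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Set


def snapshot_new_files(data_dir: Path, seen: Set[str],
                       upload: Callable[[str, bytes], None]) -> None:
    """Upload each file not seen before, plus a JSON listing of the directory."""
    current = sorted(p.name for p in data_dir.iterdir() if p.is_file())
    for name in current:
        if name in seen:
            continue
        upload(name, (data_dir / name).read_bytes())
        # The listing records which files made up the directory at this point,
        # so a restore knows the full set of files to fetch.
        listing = json.dumps({"files": current}).encode("utf-8")
        upload(name + "-listing.json", listing)
        seen.add(name)
```

Calling this on a timer with the same `seen` set uploads only files that appeared since the previous pass; a restore would read the newest listing and fetch exactly the files it names.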