Good point - we plan to do regular restore testing of the cluster. We might also spin up a copy of the cluster from a snapshot for testing.
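Along those lines, here's a minimal sketch of what an automated post-restore check could look like: it compares a restored data directory against a JSON manifest of file names and MD5 checksums recorded at backup time. The manifest format and the MD5-based check are just assumptions for the example, not something Cassandra or tablesnap provides out of the box.

# verify_restore.py - sanity-check a restored Cassandra data directory
# against a manifest recorded at backup time.
# Assumed manifest format: {"filename": "md5hex", ...} -- hypothetical,
# not a format any existing tool writes.
import hashlib
import json
import os
import sys

def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

def verify(data_dir, manifest_path):
    """Report files that are missing or whose checksums don't match."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    problems = []
    for name, expected_md5 in manifest.items():
        path = os.path.join(data_dir, name)
        if not os.path.exists(path):
            problems.append('MISSING: %s' % name)
        elif md5sum(path) != expected_md5:
            problems.append('CHECKSUM MISMATCH: %s' % name)
    return problems

if __name__ == '__main__':
    issues = verify(sys.argv[1], sys.argv[2])
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)

A fuller check would also read some known rows back out of the restored cluster, but even a file-level comparison catches missing or truncated files early.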
Also I wonder how much time compression will save when it comes to restores. I'll have to run some tests on that.

Thanks for posting.

Jeremy

On Apr 28, 2011, at 4:15 PM, Adrian Cockcroft wrote:

> Netflix has also gone down this path: we run a regular full backup to
> S3 of a compressed tar, and we have scripts that restore everything
> into the right place on a different cluster (it needs the same node
> count). We also pick up the SSTables as they are created and drop
> them in S3.
>
> Whatever you do, make sure you have a regular process to restore the
> data and verify that it contains what you think it should...
>
> Adrian
>
> On Thu, Apr 28, 2011 at 1:35 PM, Jeremy Hanna
> <jeremy.hanna1...@gmail.com> wrote:
>> One thing we're looking at doing is watching the Cassandra data directory
>> and backing up the SSTables to S3 as they are created. Some guys at
>> SimpleGeo started tablesnap, which does this:
>> https://github.com/simplegeo/tablesnap
>>
>> For every SSTable it pushes to S3, it also uploads a JSON file listing
>> the current files in the directory, so you know which files to restore
>> together in that event (as far as I understand).
>>
>> On Apr 28, 2011, at 2:53 PM, William Oberman wrote:
>>
>>> Even with N nodes for redundancy, I still want to have backups. I'm an
>>> Amazon person, so naturally I'm thinking S3. Reading over the docs and
>>> messing with nodetool, it looks like each new snapshot contains the
>>> previous snapshot as a subset (and I've read how Cassandra uses hard links
>>> to avoid excessive disk use). When does that pattern break down?
>>>
>>> I'm basically debating whether I can do an rsync-like backup or whether I
>>> should do a compressed tar backup, and I obviously want multiple points in
>>> time. S3 does allow file versioning if a file or file name is changed/reused
>>> over time (which only matters in the rsync case). My only concerns with
>>> compressed tars are that I'll need free space to create the archive and that
>>> I get no "delta" space savings on the backup (the former is solved by not
>>> letting disk space get so low and/or adding more nodes to bring the space
>>> down; the latter is solved by S3 being really cheap anyway).
>>>
>>> --
>>> Will Oberman
>>> Civic Science, Inc.
>>> 3030 Penn Avenue, First Floor
>>> Pittsburgh, PA 15201
>>> (M) 412-480-7835
>>> (E) ober...@civicscience.com
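As a rough illustration of the tablesnap-style pattern Jeremy describes above (push each new SSTable to S3 together with a JSON listing of the data directory), here is a minimal sketch using the classic boto S3 API. The bucket name, key layout, and manifest format here are assumptions for the example; see the tablesnap repo for the real implementation.

# sstable_push.py - rough sketch of the tablesnap-style idea:
# for each new SSTable, upload it to S3 plus a JSON listing of the
# files currently in the data directory, so a restore knows which
# set of files belonged together.
# Bucket name and key layout are made up for the example.
import json
import os
import sys

import boto
from boto.s3.key import Key

BUCKET_NAME = 'my-cassandra-backups'   # assumption, not a real bucket

def push_sstable(sstable_path):
    data_dir = os.path.dirname(sstable_path)
    conn = boto.connect_s3()            # AWS creds from env or boto config
    bucket = conn.get_bucket(BUCKET_NAME)

    # 1. Upload the SSTable itself, keyed by hostname + full path so
    #    files from different nodes don't collide.
    key = Key(bucket)
    key.key = '%s:%s' % (os.uname()[1], sstable_path)
    key.set_contents_from_filename(sstable_path)

    # 2. Upload a JSON manifest of everything currently in the data dir,
    #    so you know which files to pull back together on restore.
    listing = sorted(os.listdir(data_dir))
    manifest = Key(bucket)
    manifest.key = key.key + '-listdir.json'
    manifest.set_contents_from_string(json.dumps({data_dir: listing}))

if __name__ == '__main__':
    push_sstable(sys.argv[1])

In practice you would trigger this from something that watches the data directory for new files (tablesnap watches with inotify, as far as I know) rather than running it by hand.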