On Sun, Sep 6, 2015 at 12:32 AM, Gene <gh5...@gmail.com> wrote:

> I've seen quite a few blog posts here and there about various back up
> strategies. I'm wondering if anyone on this list would be willing to share
> theirs.
>
https://github.com/JeremyGrosser/tablesnap

> Things I'm curious about:
>
> 1. Data size
>

Up to hundreds of gigs per node.

> 2. Frequency for full snapshots
>

Never/always (depends on your perspective).

> 3. Frequency for copying snapshots off of the Cassandra nodes
>

As SSTables are flushed.

> 4. Do you use the incremental backups feature
>

No.

> 5. Do you use commitlog archiving
>

No.

> 6. What method you use to copy data off of the cluster (e.g. NFS, rsync,
> rsync+ssh, etc)
>

S3 upload.

> 7. Do you compress your backups, if so how soon (e.g. compress backups
> older than N days)
>

My SSTables are already snappy compressed, so I am skeptical of benefit
from re-compression.

> 8. Do you use any Off the Shelf scripts for your backups (e.g. tablesnap,
> cassandra_snapshotter, etc)
>

tablesnap

> 9. Do you utilise AWS for your backups, or do you keep it local (or
> offsite on your own hardware)
>

AWS.

tl;dr - tablesnap works. There are awkward aspects to its use, but if you
are operating Cassandra in AWS it's probably the best off the shelf
off-node backup.
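To make "as SSTables are flushed" a bit more concrete, here is a rough Python sketch of the idea: notice new SSTable files in the data directory and ship each one off-node exactly once. This is not tablesnap's code. As I understand it, tablesnap reacts to inotify events, whereas this sketch swaps in a plain directory scan to stay dependency-free. The function names, the `upload` callable, and the `-tmp-` filename check for in-progress files are all assumptions made for illustration.

```python
import os


def find_new_sstable_files(data_dir, seen):
    """Return file paths under data_dir that are not yet in `seen`."""
    new_files = []
    for root, _dirs, files in os.walk(data_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # Assumption: files still being written carry "-tmp-" in
            # their name, so they are skipped until they are finalized.
            if "-tmp-" in name or path in seen:
                continue
            new_files.append(path)
    return new_files


def backup_pass(data_dir, seen, upload):
    """Upload each newly visible file once and remember it.

    `upload` is a hypothetical callable taking a local path; in practice
    it might wrap an S3 client's file-upload call.
    """
    uploaded = []
    for path in find_new_sstable_files(data_dir, seen):
        upload(path)
        seen.add(path)  # mark only after the upload call returned
        uploaded.append(path)
    return uploaded
```

Calling `backup_pass` in a loop with a short sleep gives the "never/always" behaviour described above: there is no separate full snapshot, since every finalized SSTable is copied once, shortly after it appears.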