Hello everyone,

I'm new to this mailing list, and still fairly new to Cassandra. I'm a systems administrator and have had a 3-node Cassandra cluster with a replication factor of 3 running in production for about a year now. We currently have about 200 GB of data per node.

Up until recently I have just been performing snapshots and clearing them out as needed. I recently implemented an automated process to perform snapshots of our data and copy them off of our cluster via rsync+ssh. Pretty soon I'll also be utilising the incremental backup feature for sstables (cassandra.yaml:incremental_backups), and will be taking a look at archiving for commitlog as well (commitlog_archiving.properties).

I've seen quite a few blog posts here and there about various backup strategies. I'm wondering if anyone on this list would be willing to share theirs.

Things I'm curious about:

1. Data size
2. Frequency for full snapshots
3. Frequency for copying snapshots off of the Cassandra nodes
4. Do you use the incremental backups feature?
5. Do you use commitlog archiving?
6. What method do you use to copy data off of the cluster (e.g. NFS, rsync, rsync+ssh, etc.)?
7. Do you compress your backups, and if so, how soon (e.g. compress backups older than N days)?
8. Do you use any off-the-shelf scripts for your backups (e.g. tablesnap, cassandra_snapshotter, etc.)?
9. Do you utilise AWS for your backups, or do you keep them local (or offsite on your own hardware)?
10. Anything else you'd like to add, especially if I missed something important

I'm not asking for the best, perfect method for Cassandra backups. I'd just like to see what others are doing and hopefully use some ideas to improve our processes.

Thanks in advance for any responses, and sorry for the wall of text.

-Gene