When planning a disaster recovery (DR) strategy, which option most consistently takes the least disk space, is fastest to recover from, has the least complicated recovery, etc.?
I've read through the Operations documents, and this is my take so far. If I have specific column families I want to snapshot across the cluster, then sstable2json would make the most sense. However, if I want to back up individual nodes so that I can recover from a node failure better and more quickly, would snapshots make more sense?

Regularly backing up the data on a large cluster with a high replication factor is redundant. But if you have an RF <= 2 and are located in a single rack / datacenter, it might make sense to implement something like this to back up and store data offsite. I'm trying to figure out what a good, viable, and storage-efficient plan would look like.

-- 
*David McNelis*
Lead Software Engineer
Agentis Energy
www.agentisenergy.com
o: 630.359.6395
c: 219.384.5143

*A Smart Grid technology company focused on helping consumers of energy control an often under-managed resource.*