You should look at this: https://github.com/amorton/cassback
I don't believe it's set up to work with 1.2.10 and above, but I believe it needs just small tweaks to get it running.
Thanks
Rahul

On Fri, Dec 6, 2013 at 7:09 PM, Michael Theroux <mthero...@yahoo.com> wrote:

> Hi Marcelo,
>
> Cassandra provides an eventually consistent model for backups. You can
> do staggered backups of data, with the idea that if you restore a node, and
> then do a repair, your data will be once again consistent. Cassandra will
> not automatically copy the data to other nodes (other than via hinted
> handoff). You should manually run repair after restoring a node.
>
> You should take snapshots when doing a backup, as it keeps the data you
> are backing up relevant to a single point in time. Otherwise compaction
> could add/delete files on you mid-backup, or worse, I imagine, attempt to
> access an SSTable mid-write. Snapshots work by using hard links, and don't
> take additional storage to perform. In our process we create the snapshot,
> perform the backup, and then clear the snapshot.
>
> One thing to keep in mind in your S3 cost analysis is that, even though
> storage is cheap, reads/writes to S3 are not (especially writes). If you
> are using LeveledCompaction, or otherwise have a ton of SSTables, some
> people have encountered increased costs moving the data to S3.
>
> Ourselves, we maintain backup EBS volumes that we regularly snapshot/rsync
> data to. Thus far this has worked very well for us.
>
> -Mike
>
>
> On Friday, December 6, 2013 8:14 AM, Marcelo Elias Del Valle <
> marc...@s1mbi0se.com.br> wrote:
> Hello everyone,
>
> I am trying to create backups of my data on AWS. My goal is to store
> the backups on S3 or Glacier, as it's cheap to store this kind of data. So,
> if I have a cluster with N nodes, I would like to copy data from all N
> nodes to S3 and be able to restore later. I know Priam does that (we were
> using it), but I am using the latest Cassandra version and we plan to use
> DSE some time, so I am not sure Priam fits this case.
> I took a look at the docs:
> http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/operations/../../cassandra/operations/ops_backup_takes_snapshot_t.html
>
> And I am trying to understand if it's really necessary to take a snapshot
> to create my backup. Suppose I do a flush and copy the SSTables from each
> node, one by one, to S3. Not all at the same time, but one by one.
> When I try to restore my backup, data from node 1 will be older than
> data from node 2. Will this cause problems? AFAIK, if I am using a
> replication factor of 2, for instance, and Cassandra sees data from node X
> only, it will automatically copy it to other nodes, right? Is there any
> chance of Cassandra nodes becoming corrupt somehow if I do my backups this
> way?
>
> Best regards,
> Marcelo Valle.
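
For reference, the snapshot / backup / clear-snapshot cycle Mike describes could be sketched roughly as below. This is only a sketch under assumptions, not what Mike or cassback actually run: the function names, the `upload` callback (S3, rsync to an EBS volume, whatever you use) and the injectable `run` parameter are made up for illustration, and it assumes snapshots land under <data_dir>/<keyspace>/<table>/snapshots/<tag>/.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def snapshot_dirs(data_dir: Path, keyspace: str, tag: str) -> List[Path]:
    """Return the per-table snapshot directories for `tag`.

    Assumes the layout <data_dir>/<keyspace>/<table>/snapshots/<tag>/.
    """
    pattern = f"*/snapshots/{tag}"
    return sorted(p for p in (data_dir / keyspace).glob(pattern) if p.is_dir())


def backup_keyspace(
    data_dir: Path,
    keyspace: str,
    tag: str,
    upload: Callable[[Path], None],
    run: Callable[[Sequence[str]], object] = subprocess.check_call,
) -> List[Path]:
    """Snapshot one keyspace on this node, hand every file to `upload`, then clear.

    `upload` is a hypothetical callback supplied by the caller; `run` is
    injectable only so the sketch can be exercised without a live node.
    """
    # 1. Take the snapshot, so the files stay fixed while we copy them.
    run(["nodetool", "snapshot", "-t", tag, keyspace])
    try:
        copied = []
        # 2. Copy every file out of each table's snapshot directory.
        for snap in snapshot_dirs(data_dir, keyspace, tag):
            for f in sorted(snap.iterdir()):
                if f.is_file():
                    upload(f)
                    copied.append(f)
        return copied
    finally:
        # 3. Clear the snapshot even if the copy failed part-way.
        run(["nodetool", "clearsnapshot", "-t", tag, keyspace])
```

You would run something like this on each node in turn (staggered is fine, per Mike's note), and after restoring a node from such a backup you still need to run repair yourself.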