> There are advantages and disadvantages in both approaches. What are people
> doing in their production systems?

Generally a mix of snapshots + rsync, or https://github.com/synack/tablesnap, to get the data off the node.
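A minimal sketch of the snapshots + rsync route, assuming Python 3 with `nodetool` and `rsync` on the node's PATH, and the `<data_dir>/<keyspace>/<table>/snapshots/<tag>` directory layout. The data directory `/var/lib/cassandra/data`, the keyspace `myks`, the tag and the destination `backup.example.com:/backups/node1/` are hypothetical placeholders. By default it is a dry run that only returns the commands it would execute. tablesnap takes a different route: it watches the data directory for new SSTables and uploads each one to S3 as it appears.

```python
import glob
import os
import subprocess
from typing import List


def snapshot_cmd(keyspace: str, tag: str) -> List[str]:
    # nodetool snapshot hard-links the keyspace's current SSTables into
    # snapshots/<tag>; it only acts on the local node.
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def find_snapshot_dirs(data_dir: str, keyspace: str, tag: str) -> List[str]:
    # Assumed layout: <data_dir>/<keyspace>/<table>/snapshots/<tag>
    pattern = os.path.join(data_dir, keyspace, "*", "snapshots", tag)
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))


def rsync_cmd(data_dir: str, snapshot_dir: str, dest: str) -> List[str]:
    # With -R (--relative), the "/./" marker makes rsync recreate only
    # <keyspace>/<table>/snapshots/<tag> under dest.
    rel = os.path.relpath(snapshot_dir, data_dir)
    return ["rsync", "-aR", f"{data_dir.rstrip('/')}/./{rel}", dest]


def backup(keyspace: str, tag: str, data_dir: str, dest: str,
           dry_run: bool = True) -> List[List[str]]:
    """Snapshot one keyspace on this node and copy it off-node.

    With dry_run=True (the default) nothing is executed; the nodetool
    command plus rsync commands for any snapshot directories that
    already exist are returned instead.
    """
    cmds = [snapshot_cmd(keyspace, tag)]
    if not dry_run:
        subprocess.run(cmds[0], check=True)
    # Snapshot directories only exist once nodetool has run.
    for snap_dir in find_snapshot_dirs(data_dir, keyspace, tag):
        cmds.append(rsync_cmd(data_dir, snap_dir, dest))
        if not dry_run:
            subprocess.run(cmds[-1], check=True)
    return cmds


if __name__ == "__main__":
    # Hypothetical keyspace, tag and destination.
    for cmd in backup("myks", "daily-20130323", "/var/lib/cassandra/data",
                      "backup.example.com:/backups/node1/"):
        print(" ".join(cmd))
```

Because `nodetool snapshot` only acts on the local node, something like cron has to run this on every node at roughly the same time. Snapshots are hard links, so they are cheap to take, but they keep the linked SSTables on disk until removed with `nodetool clearsnapshot`.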
Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/03/2013, at 4:37 AM, Jabbar Azam <aja...@gmail.com> wrote:

> Hello,
>
> I've been experimenting with Cassandra for quite a while now.
>
> It's time for me to look at backups, but I'm not sure what the best practice
> is. I want to be able to recover the data to a point in time before any user
> or software errors.
>
> We will have two datacentres with 4 servers and RF=3.
>
> Each datacentre will have at most 1.6 TB of data (includes replication, LZ4
> compression, using test data). That is ten years of data, after which we
> will start purging. This amounts to about 400MB of data generation per day.
>
> I've read about users doing snapshots of individual nodes to S3 (Netflix), and
> I've read about creating virtual datacentres
> (http://www.datastax.com/dev/blog/multi-datacenter-replication) where each
> virtual datacentre contains a backup node.
>
> There are advantages and disadvantages in both approaches. What are people
> doing in their production systems?
>
> --
> Thanks
>
> Jabbar Azam