Re: Backup strategies in a multi DC cluster

aaron morton Tue, 26 Mar 2013 10:51:46 -0700

> Assume you have four nodes and a snapshot is taken.  The following day if a 
> node goes down and data is corrupt through user error then how do you use the 
> previous night's snapshots? 
> 
Not sure what is corrupt here: the snapshot/backup itself, or the live data 
made incorrect through application error.

> Would you replace the faulty node first and then restore last night's 
> snapshot?  What happens if you don't have a replacement node? You won't be 
> able to restore last night's snapshot.
> 

You would need to stop the entire cluster, and restore the snapshots on all 
nodes. 

If you restored the snapshot on just one node, new or old HW, it would have 
some data with an older timestamp than the other nodes. Cassandra would see 
this as an inconsistency, as if the restored node had missed some writes, and 
resolve the inconsistency by using the most recent values. The restored data 
would then be overwritten by the newer values held on the other nodes.
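
As a rough illustration of why a single-node restore does not stick, here is a 
toy model of timestamp-based reconciliation. It is a sketch only, not 
Cassandra's actual code; the function name, values and timestamps are made up 
for the example.

```python
# Illustrative only: a toy model of "most recent timestamp wins",
# not Cassandra's real reconciliation code.

def reconcile(replica_cells):
    """Return the (value, timestamp) cell with the highest timestamp."""
    return max(replica_cells, key=lambda cell: cell[1])

# Hypothetical cells for one column, as (value, write timestamp).
restored_node = ("value-from-snapshot", 100)  # restored from last night
other_node_a = ("value-written-today", 200)   # possibly the bad write
other_node_b = ("value-written-today", 200)

winner = reconcile([restored_node, other_node_a, other_node_b])
print(winner)
```

The snapshot value loses because its timestamp is older, which is why all 
nodes need to be restored together.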

> However if a virtual datacenter consisting of a backup node is used then the 
> backup node could be used regardless of the number of nodes in the datacentre.
> 

It depends on the failure scenario and what you are trying to protect against. 

If you have 4 nodes and one node fails, the best thing to do is start a new node 
and let Cassandra stream the data from the other nodes. The new node could have 
the same token as the failed node. So long as the 
/var/lib/cassandra/data/system dir is empty (and the node is not a seed) it 
will join the cluster and ask the others for data. 

If you want to ensure availability then consider bigger clusters, e.g. 6 nodes 
with RF 3 allows you to lose up to 2 nodes and stay up. Or use a higher RF. (See 
http://thelastpickle.com/2011/06/13/Down-For-Me/)
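
How many node failures a cluster survives depends on the consistency level in 
use and on which nodes are lost. The sketch below is one way to count this. It 
assumes one token per node and simple ring placement (each range is stored on 
RF consecutive nodes); the function names are hypothetical and it ignores 
racks and multiple data centres.

```python
from itertools import combinations

def replicas_for_range(i, n_nodes, rf):
    """Nodes holding range i under simple ring placement (an assumption)."""
    return {(i + k) % n_nodes for k in range(rf)}

def cluster_is_up(down, n_nodes, rf, required):
    """True if every range still has `required` live replicas."""
    down = set(down)
    for i in range(n_nodes):
        live = replicas_for_range(i, n_nodes, rf) - down
        if len(live) < required:
            return False
    return True

def count_survivable(n_nodes, rf, required, n_down):
    """Return (failure combinations that stay up, total combinations)."""
    combos = list(combinations(range(n_nodes), n_down))
    ok = sum(cluster_is_up(c, n_nodes, rf, required) for c in combos)
    return ok, len(combos)

quorum = 3 // 2 + 1  # 2 replicas for RF 3

print(count_survivable(6, 3, 1, 2))       # requests at ONE, 2 nodes down
print(count_survivable(6, 3, quorum, 2))  # requests at QUORUM, 2 nodes down
print(count_survivable(6, 3, quorum, 1))  # requests at QUORUM, 1 node down
```

Under these assumptions every pair of failures is survivable at ONE, while at 
QUORUM it depends on whether the two lost nodes share a replica set.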

It's tricky to protect against application errors creating bad data using just 
backups. You may need to look at how you can replay events in your system, and 
consider which parts of your data model should be directly mutated and which 
should be indirectly mutated by recording changes in another part of the model. 
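
A minimal sketch of the "record changes, then derive state" idea, assuming 
changes are stored as an append-only list of events. The event shape, field 
names and account example are hypothetical, and a real system would keep the 
events in their own column family rather than in memory.

```python
# Toy sketch: state is never edited directly, it is derived by replaying
# recorded change events. All names here are made up for illustration.

def apply_event(state, event):
    """Apply one recorded change and return the new derived state."""
    new_state = dict(state)
    account = event["account"]
    new_state[account] = new_state.get(account, 0) + event["delta"]
    return new_state

def replay(events, skip_ids=()):
    """Rebuild state from scratch, ignoring events known to be bad."""
    state = {}
    for event in events:
        if event["id"] in skip_ids:
            continue
        state = apply_event(state, event)
    return state

events = [
    {"id": 1, "account": "alice", "delta": 50},
    {"id": 2, "account": "bob", "delta": 20},
    {"id": 3, "account": "alice", "delta": -500},  # written by a buggy release
    {"id": 4, "account": "alice", "delta": 10},
]

bad = replay(events)                  # includes the bad write
fixed = replay(events, skip_ids={3})  # same history without event 3
print(bad, fixed)
```

Because the bad write is just another recorded event, it can be excluded and 
the derived data rebuilt, without rolling the whole cluster back to a snapshot.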

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 25/03/2013, at 8:19 AM, Jabbar Azam <aja...@gmail.com> wrote:

> Thanks Aaron. I have a hypothetical question.
> 
> Assume you have four nodes and a snapshot is taken.  The following day if a 
> node goes down and data is corrupt through user error then how do you use the 
> previous night's snapshots? 
> 
> Would you replace the faulty node first and then restore last night's 
> snapshot?  What happens if you don't have a replacement node? You won't be 
> able to restore last night's snapshot.
> 
> However if a virtual datacenter consisting of a backup node is used then the 
> backup node could be used regardless of the number of nodes in the 
> datacentre. Would there be any disadvantages to this approach?  Sorry for 
> the questions, I want to understand all the options.
> 
> On 24 Mar 2013 17:45, "aaron morton" <aa...@thelastpickle.com> wrote:
>> There are advantages and disadvantages in both approaches. What are people 
>> doing in their production systems?
> Generally a mix of snapshots+rsync or https://github.com/synack/tablesnap to 
> get things off node. 
> 
> Cheers
> 
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 23/03/2013, at 4:37 AM, Jabbar Azam <aja...@gmail.com> wrote:
> 
>> Hello,
>> 
>> I've been experimenting with Cassandra for quite a while now.
>> 
>> It's time for me to look at backups but I'm not sure what the best practice 
>> is. I want to be able to recover the data to a point in time before any user 
>> or software errors.
>> 
>> We will have two datacentres with 4 servers and RF=3.
>> 
>> Each datacentre will have at most 1.6 TB (includes replication, LZ4 
>> compression, using test data) of data. That is ten years of data, after which 
>> we will start purging. This amounts to about 400 MB of data generation per 
>> day.
>> 
>> I've read about users doing snapshots of individual nodes to S3 (Netflix) and 
>> I've read about creating virtual datacentres 
>> (http://www.datastax.com/dev/blog/multi-datacenter-replication) where each 
>> virtual datacentre contains a backup node.
>> 
>> There are advantages and disadvantages in both approaches. What are people 
>> doing in their production systems?
>> 
>> 
>> 
>> 
>> -- 
>> Thanks
>> 
>> Jabbar Azam
>
