Hi Jeff,

You're dead on with that article. That is a very good explanation of the problem I'm facing. You're also right that, fascinating though that research is, letting it anywhere near my production data is not something I'd think about.
Basically, I want EACH_QUORUM, but I'm not willing to pay for it. My system needs to be reasonably close to a real-time system (let's say a soft real-time system). Waiting for each write to make its way across a continent is not something I can live with (to say nothing of what happens if the WAN temporarily fails).

I guess what I'm hearing is that the best way to create a clone of a Cassandra cluster in another DC is to snapshot and restore. I've put rough sketches of both the write-path tradeoff and the snapshot/restore steps at the bottom of this mail.

Thanks!

-Phil

On Mon, Dec 14, 2015 at 3:18 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:

> There is research into causal consistency and Cassandra (http://da-data.blogspot.com/2013/02/caring-about-causality-now-in-cassandra.html, for example), though you’ll note that it uses a fork (https://github.com/wlloyd/eiger), which is unlikely to be something you’d ever want to consider in production. Let’s pretend like it doesn’t exist, and won’t in the near future.
>
> The typical approach here is to have multiple active datacenters and EACH_QUORUM writes, which gives you the ability to survive a full DC failure without impact. This also solves your fail-back problem, because when the primary DC is restored, you simply run a repair. What part of EACH_QUORUM is insufficient for your needs? The failure scenarios when the WAN link breaks and it impacts local writes?
>
> Short of that, your ‘occasional snapshots and restore in case of emergency’ is going to be your next best thing.
>
> From: Philip Persad
> Reply-To: "user@cassandra.apache.org"
> Date: Monday, December 14, 2015 at 3:11 PM
> To: Cassandra Users
> Subject: Re: Replicating Data Between Separate Data Centres
>
> Hi Jim,
>
> Thanks for taking the time to answer. By Causal Consistency, what I mean is that I need strict ordering of all related events which might have a causal relationship. For example (albeit slightly contrived), if we are recording an event stream, it is very important that the event creating a user be visible before the event which assigns permissions to that user. However, I don't care at all about the ordering of the creation of two different users. This is what I mean by Causal Consistency.
>
> The reason why LOCAL_QUORUM replication does not work for me is that, while I can get guarantees about the order in which writes will become visible in the Primary DC, I cannot get those guarantees about the Secondary DC. As a result (to use another slightly contrived example), if a user is created and then takes an action shortly before the failure of the Primary DC, there are four possible situations with respect to what will be visible in the Secondary DC:
>
> 1) Both events are visible in the Secondary DC
> 2) Neither event is visible in the Secondary DC
> 3) The creation event is visible in the Secondary DC, but the action event is not
> 4) The action event is visible in the Secondary DC, but the creation event is not
>
> States 1, 2, and 3 are all acceptable. State 4 is not. However, if I understand Cassandra's asynchronous DC replication correctly, I do not believe I get any guarantee that situation 4 will not happen. Eventual Consistency promises to "eventually" settle into State 1. However, "eventually" does me very little good if Godzilla steps on my Primary DC. I'm willing to accept the loss of data created shortly before a disaster (States 2 and 3), but I cannot accept the inconsistent history of events in State 4.
> I have a mechanism outside of normal Cassandra replication which can give me the consistency I need. My problem is effectively with setting up a new recovery DC after the failure of the primary. How do I go about getting all of my data into a new cluster?
>
> Thanks,
>
> -Phil
>
> On Mon, Dec 14, 2015 at 1:06 PM, Jim Ancona <j...@anconafamily.com> wrote:
>
>> Could you define what you mean by Causal Consistency and explain why you think you won't have that when using LOCAL_QUORUM? I ask because LOCAL_QUORUM and multiple data centers are the way many of us handle DR, so I'd like to understand why it doesn't work for you.
>>
>> I'm afraid I don't understand your scenario. Are you planning on building out a new recovery DC *after* the primary has failed, or keeping two DCs in sync so that you can switch over after a failure?
>>
>> Jim
>>
>> On Mon, Dec 14, 2015 at 2:59 PM, Philip Persad <philip.per...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'm currently looking at Cassandra in the context of Disaster Recovery. I have 2 Data Centres; one is the Primary and the other acts as a Standby. There is a Cassandra cluster in each Data Centre. For the time being I'm running Cassandra 2.0.9. Unfortunately, due to the nature of my data, the consistency levels that I would get out of LOCAL_QUORUM writes followed by asynchronous replication to the secondary data centre are insufficient. In the event of a failure, it is acceptable to lose some data, but I need Causal Consistency to be maintained. Since I don't have the luxury of performing nodetool repairs after Godzilla steps on my primary data centre, I use more strictly ordered means of transporting events between the Data Centres (Kafka, for anyone who cares about that detail).
>>>
>>> What I'm not sure about is how to go about copying all the data in one Cassandra cluster to a new cluster, either to bring up a new Standby Data Centre or as part of failing back to the Primary after I pick up the pieces. I'm thinking that I should either:
>>>
>>> 1. Take a snapshot (https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_takes_snapshot_t.html), and then restore that snapshot on my new cluster (https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html)
>>>
>>> 2. Join the new data centre to the existing cluster (https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html). Then separate the two data centres into two individual clusters by doing . . . something???
>>>
>>> Does anyone have any advice about how to tackle this problem?
>>>
>>> Many thanks,
>>>
>>> -Phil
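
P.S. To make the EACH_QUORUM vs. LOCAL_QUORUM tradeoff concrete, here is a minimal sketch using the DataStax Python driver. The contact point, keyspace, and table names are made up for illustration; this is just to show where the consistency level is chosen per write.

# Minimal sketch of the two write paths, assuming the DataStax Python driver.
# Contact point, keyspace, and table below are hypothetical.
import uuid

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(['10.0.0.1'])      # a node in the local (primary) DC
session = cluster.connect('events')  # hypothetical keyspace

event_id = uuid.uuid4()
insert_cql = "INSERT INTO event_log (id, payload) VALUES (%s, %s)"

# LOCAL_QUORUM: acknowledged once a quorum of replicas in the *local* DC
# respond. Replication to the remote DC is asynchronous, so there is no
# guarantee about which writes have reached it at any given moment.
local_write = SimpleStatement(insert_cql,
                              consistency_level=ConsistencyLevel.LOCAL_QUORUM)
session.execute(local_write, (event_id, 'user created'))

# EACH_QUORUM: acknowledged only after a quorum of replicas in *every* DC
# responds, so every write pays the WAN round trip and fails outright if
# the link between the data centres is down.
global_write = SimpleStatement(insert_cql,
                               consistency_level=ConsistencyLevel.EACH_QUORUM)
session.execute(global_write, (event_id, 'permissions assigned'))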
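
And a rough sketch of the snapshot-then-restore path (option 1 from my original mail), just stringing together the nodetool and sstableloader steps. Keyspace/table names, paths, and hosts are assumptions, the exact on-disk layout differs between Cassandra versions, and the same steps would need to run on every node in the source cluster.

# Rough sketch of cloning data into a new cluster via snapshot + sstableloader.
# Paths, names, and hosts are hypothetical; the target cluster must already
# have the keyspace and table schema created.
import shutil
import subprocess
from pathlib import Path

KEYSPACE = "events"                         # hypothetical keyspace
TABLE = "event_log"                         # hypothetical table
DATA_DIR = Path("/var/lib/cassandra/data")  # default data directory
STAGING = Path("/tmp/restore") / KEYSPACE / TABLE
NEW_CLUSTER_NODE = "10.1.0.1"               # a node in the new cluster

# 1. Take a named snapshot on this node (repeat on every node in the cluster).
subprocess.run(["nodetool", "snapshot", "-t", "dr-clone", KEYSPACE], check=True)

# 2. Copy the snapshot's SSTables into a keyspace/table-shaped staging
#    directory, which is the layout sstableloader expects.
snapshot_dir = DATA_DIR / KEYSPACE / TABLE / "snapshots" / "dr-clone"
STAGING.mkdir(parents=True, exist_ok=True)
for sstable_file in snapshot_dir.iterdir():
    if sstable_file.is_file():
        shutil.copy2(sstable_file, STAGING)

# 3. Stream the staged SSTables into the new cluster.
subprocess.run(["sstableloader", "-d", NEW_CLUSTER_NODE, str(STAGING)],
               check=True)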