There is research into causal consistency and cassandra ( http://da-data.blogspot.com/2013/02/caring-about-causality-now-in-cassandra.html , for example), though you’ll note that it uses a fork ( https://github.com/wlloyd/eiger ) which is unlikely something you’d ever want to consider in production. Let’s pretend like it doesn’t exist, and won’t in the near future.
The typical approach here is to have multiple active datacenters and EACH_QUORUM writes, which gives you the ability to have a full DC failure without impact. This also solves your fail-back problem, because when the primary DC is restored, you simply run a repair. What part of EACH_QUORUM is insufficient for your needs? The failure scenarios when the WAN link breaks and it impacts local writes? Short of that, your ‘occasional snapshots and restore in case of emergency’ is going to be your next-best-thing. From: Philip Persad Reply-To: "user@cassandra.apache.org" Date: Monday, December 14, 2015 at 3:11 PM To: Cassandra Users Subject: Re: Replicating Data Between Separate Data Centres Hi Jim, Thanks for taking the time to answer. By Causal Consistency, what I mean is that I need strict ordering of all related events which might have a causal relationship. For example (albeit slightly contrived), if we are looking at recording an event stream, it is very important that the event creating a user be visible before the event which assigns a permissions to a user. However, I don't care at all about the ordering of the creation of two different users. This is what I mean by Causal Consistency. This reason why LOCAL_QUORUM replication does not work for me, is because, while I can get ordering guarantees about the order in which writes will become visible in the Primary DC, I cannot get those guarantees about the Secondary DC. As a result (to user another slightly contrived example), if a user is created and then takes an action shortly before the failure of the Primary DC, there are four possible situations with respect to what will be visible in the Secondary DC: 1) Both events are visible in the Secondary DC 2) Neither event will be visible in the Secondary DC 3) The creation event is visible in the Secondary DC, but the action event is not 4) The action event is visible Secondary DC, but the creation event is not States 1, 2, and 3 are all acceptable. State 4 is not. However, if I understand Cassandra asynchronous DC replication correctly, I do not believe I get any guarantees that situation 4 will not happen. Eventual Consistency promises to "eventually" settle into State 1. However "eventually" does me very little good if Godzilla steps on my Primary DC. I'm willing to accept loss of data which was created near to a disaster (States 2 and 3), but I cannot accept the inconsistent history of events in State 4. I have a mechanism outside of normal Cassandra replication which can give me the consistency I need. My problem is effectively with setting up a new recovery DC after the failure of the primary. How do I go about getting all of my data into a new, cluster? Thanks, -Phil On Mon, Dec 14, 2015 at 1:06 PM, Jim Ancona <j...@anconafamily.com> wrote: Could you define what you mean by Casual Consistency and explain why you think you won't have that when using LOCAL_QUORUM? I ask because LOCAL_QUORUM and multiple data centers are the way many of us handle DR, so I'd like to understand why it doesn't work for you. I'm afraid I don't understand your scenario. Are you planning on building out a new recovery DC *after* the primary has failed, or keeping two DCs in sync so that you can switch over after a failure? Jim On Mon, Dec 14, 2015 at 2:59 PM, Philip Persad <philip.per...@gmail.com> wrote: Hi, I'm currently looking at Cassandra in the context of Disaster Recovery. I have 2 Data Centres, one is the Primary and the other acts as a Standby. There is a Cassandra cluster in each Data Centre. For the time being I'm running Cassandra 2.0.9. Unfortunately, due to the nature of my data, the consistency levels that I would get out of LOCAL_QUORUM writes followed by asynchronous replication to the secondary data centre are insufficient. In the event of a failure, it is acceptable to lose some data, but I need Casual Consistency to be maintained. Since I don't have the luxury of performing nodetool repairs after Godzilla steps on my primary data centre, I use more strictly ordered means of transporting events between the Data Centres (Kafka for anyone who cares about that detail). What I'm not sure about, is how to go about copying all the data in one Cassandra cluster to a new cluster, either to bring up a new Standby Data Centre or as part of failing back to the Primary after I pick up the pieces. I'm thinking that I should either: 1. Do a snapshot (https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_takes_snapshot_t.html), and then restore that snapshot on my new cluster (https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html) 2. Join the new data centre to the existing cluster (https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html). Then separate the two data centres into two individual clusters by doing . . . something??? Does anyone have any advice about how to tackle this problem? Many thanks, -Phil
smime.p7s
Description: S/MIME cryptographic signature