>
> On its face my answer is "not... really"? What do you view yourself as
> getting with this technique versus using built-in replication? As an
> example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM
> consistency-level operations.


Doing replication manually sounds like a recipe for the DCs eventually
getting subtly out of sync with each other.  If a connection goes down
between DCs while you are taking writes in both, how will you catch each
other up?  C* already offers that resolution for you (hinted handoff, read
repair, anti-entropy repair), and you'd have to work pretty hard to
reproduce it for no obvious benefit that I can see.

For minimum effort, definitely rely on Cassandra's well-tested codebase for
this.
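
For reference, the built-in path is just a keyspace definition: with
NetworkTopologyStrategy you declare a replica count per DC once and
Cassandra handles the cross-DC shipping and reconciliation. A sketch
(keyspace and DC names below are made up):

```cql
-- Replicas per logical datacenter; Cassandra replicates and
-- reconciles across DCs on its own.
CREATE KEYSPACE events WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': 3,
  'DC2': 3,
  'DC3': 3
};
```

And you keep the per-DC consistency knobs Rob mentions below:
LOCAL_QUORUM (quorum within the coordinator's DC) vs EACH_QUORUM
(quorum in every DC) are then chosen per request by the client.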




On Wed, Jun 19, 2013 at 2:27 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Wed, Jun 19, 2013 at 10:50 AM, Faraaz Sareshwala
> <fsareshw...@quantcast.com> wrote:
> > Each datacenter will have a cassandra cluster with a separate set of
> > seeds specific to that datacenter. However, the cluster name will be
> > the same.
> >
> > Question 1: is this enough to guarantee that the three datacenters will
> > have distinct cassandra clusters as well? Or will one node in datacenter
> > A still somehow be able to join datacenter B's ring?
>
> If they have network connectivity and the same cluster name, they are
> the same logical cluster. However, if your nodes share tokens and you
> have auto_bootstrap: true (the implicit default), the second node you
> attempt to start will refuse to start, because you are trying to
> bootstrap it into the range of a live node.
>
> > For now, we are planning on using our own relay mechanism to transfer
> > data changes from one datacenter to another.
>
> Are you planning to use the streaming commitlog functionality for
> this? I'm not sure how you would capture all changes otherwise, short
> of having your app write the same thing to multiple places. Also be
> aware that unless data timestamps are identical between clusters,
> otherwise-identical data will not merge properly, because Cassandra
> uses data timestamps to merge.
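
To make the timestamp point above concrete, here is a minimal
last-write-wins sketch in plain Python (values and timestamps are made
up; ties are broken arbitrarily here, whereas Cassandra breaks them by
comparing the values themselves):

```python
# Sketch of Cassandra-style last-write-wins cell reconciliation.
# Each cell is a (value, timestamp) pair; the higher timestamp wins.

def merge(cell_a, cell_b):
    """Return the cell Cassandra would keep after reconciliation."""
    return cell_a if cell_a[1] >= cell_b[1] else cell_b

# The "same" logical write, relayed to two clusters at different times:
dc_a = ("alice@example.com", 1371670000000000)  # microsecond timestamps
dc_b = ("alice@example.com", 1371670000000042)  # relayed copy, later ts

winner = merge(dc_a, dc_b)
# Harmless here because the values match -- but a conflicting concurrent
# update would silently win or lose on timestamp alone.
```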
>
> > Question 2: is this a sane strategy?
>
> On its face my answer is "not... really"? What do you view yourself as
> getting with this technique versus using built-in replication? As an
> example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM
> consistency-level operations.
>
> > Question 3: eventually, we want to turn all these cassandra clusters
> > into one large multi-datacenter cluster. What's the best practice to do
> > this? Should I just add nodes from all datacenters to the list of seeds
> > and let cassandra resolve differences? Is there another way I don't
> > know about?
>
> If you are using NetworkTopologyStrategy and have the same cluster
> name for your isolated clusters, all you need to do is:
>
> 1) configure NTS to store replicas on a per-datacenter basis
> 2) ensure that your nodes are in different logical data centers (by
> default, all nodes are in DC1/rack1)
> 3) ensure that clusters are able to reach each other
> 4) ensure that tokens do not overlap between clusters (the common
> technique with manual token assignment is to offset each cluster's
> tokens by one)
> 5) ensure that all nodes' seed lists contain (recommended) 3 seeds from
> each DC
> 6) rolling restart (so the new seed list is picked up)
> 7) repair ("should" only be required if writes have not replicated via
> your out of band mechanism)
>
> Vnodes change the picture slightly because the chance of your clusters
> having conflicting tokens increases with the number of token ranges
> you have.
>
> =Rob
>
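
Concretely, Rob's steps 2 and 5-7 above map onto a few config fragments
(a sketch only: the snitch choice, DC/rack names, and seed addresses
below are placeholders, not from the original mail):

```shell
# Step 2: put each node in a logical DC via cassandra-rackdc.properties
# (assumes GossipingPropertyFileSnitch; names are examples):
#     dc=DC1
#     rack=rack1
#
# Step 5: in cassandra.yaml on every node, list ~3 seeds from each DC:
#     - seeds: "10.0.1.1,10.0.1.2,10.0.1.3,10.1.1.1,10.1.1.2,10.1.1.3"
#
# Step 6: rolling restart, one node at a time, so the seed list is
# picked up.
# Step 7: repair each node to reconcile anything the relay missed:
nodetool repair
```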
