On Wed, Jun 19, 2013 at 10:50 AM, Faraaz Sareshwala
<fsareshw...@quantcast.com> wrote:
> Each datacenter will have a cassandra cluster with a separate set of seeds
> specific to that datacenter. However, the cluster name will be the same.
>
> Question 1: is this enough to guarantee that the three datacenters will have
> distinct cassandra clusters as well? Or will one node in datacenter A still
> somehow be able to join datacenter B's ring?

If they have network connectivity and the same cluster name, they are
the same logical cluster. However, if your nodes share tokens and you
have auto_bootstrap=yes (the implicit default), the second node you
attempt to start will refuse to start, because you would be trying to
bootstrap it into the range of a live node.
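
To make the isolation concrete, here is a minimal cassandra.yaml sketch
of the per-DC setup you describe; the cluster name and seed addresses
are placeholders, not a recommendation:

    # on a node in datacenter A (illustrative values only)
    cluster_name: 'MyCluster'        # identical in every datacenter
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              # only seeds that live in datacenter A
              - seeds: "10.0.1.1,10.0.1.2"
    # auto_bootstrap is not listed; omitting it means "true"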

> For now, we are planning on using our own relay mechanism to transfer
> data changes from one datacenter to another.

Are you planning to use the streaming commitlog functionality for
this? I'm not sure how you would capture all changes otherwise, short
of having your app write the same thing to multiple places. Also note
that unless data timestamps are identical between clusters,
otherwise-identical data will not merge properly, because Cassandra
merges on data timestamps.
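
If you do go the relay route, one way to keep the copies convergent is
to supply the write timestamp yourself, so both clusters record the
same timestamp for the same logical write. A rough CQL sketch, with an
invented keyspace/table:

    -- replay the same mutation in each DC with an identical
    -- client-supplied timestamp (microseconds since epoch)
    INSERT INTO my_ks.events (id, payload)
    VALUES (42, 'example')
    USING TIMESTAMP 1371660000000000;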

> Question 2: is this a sane strategy?

On its face, my answer is "not... really"? What do you see yourself
gaining with this technique versus using the built-in replication? As
one example, you lose the ability to do LOCAL_QUORUM vs. EACH_QUORUM
consistency level operations.
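
For example, with Cassandra doing the cross-DC replication you can pick
the consistency level per operation. In cqlsh that looks roughly like
this (keyspace/table invented); LOCAL_QUORUM only waits on replicas in
the coordinator's datacenter, while EACH_QUORUM requires a quorum in
every datacenter:

    CONSISTENCY LOCAL_QUORUM
    SELECT * FROM my_ks.events WHERE id = 42;

    CONSISTENCY EACH_QUORUM
    INSERT INTO my_ks.events (id, payload) VALUES (43, 'example');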

> Question 3: eventually, we want to turn all these cassandra clusters into one
> large multi-datacenter cluster. What's the best practice to do this? Should I
> just add nodes from all datacenters to the list of seeds and let cassandra
> resolve differences? Is there another way I don't know about?

If you are using NetworkTopologyStrategy and have the same cluster
name for your isolated clusters, all you need to do is:

1) configure NTS to store replicas on a per-datacenter basis (see the
CQL sketch after this list)
2) ensure that your nodes are in different logical datacenters (by
default, all nodes are in DC1/rack1)
3) ensure that the clusters are able to reach each other
4) ensure that tokens do not overlap between clusters (with manual
token assignment, the common technique is to offset each datacenter's
tokens by one)
5) ensure that every node's seed list contains seeds from each DC (3
per DC is the usual recommendation)
6) rolling restart (so the new seed list is picked up)
7) repair ("should" only be required if writes have not replicated via
your out-of-band mechanism)
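
For step 1, the replication change is a keyspace-level setting once the
datacenters can see each other. A hedged CQL sketch with made-up
keyspace and datacenter names; the DC names must match whatever your
snitch reports (e.g. GossipingPropertyFileSnitch reads them from
cassandra-rackdc.properties):

    ALTER KEYSPACE my_ks WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'DC_A': 3, 'DC_B': 3, 'DC_C': 3
    };

Follow that with the rolling restart and repair from steps 6 and 7 so
the newly responsible replicas actually receive the data.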

Vnodes change the picture slightly because the chance of your clusters
having conflicting tokens increases with the number of token ranges
you have.

=Rob
