My company is planning to deploy Cassandra to three separate datacenters.
Each datacenter will have its own Cassandra cluster with a separate set of
seeds specific to that datacenter. However, the cluster name will be the same.

Question 1: is this enough to guarantee that the three datacenters will have
distinct Cassandra clusters as well? Or will a node in datacenter A still
somehow be able to join datacenter B's ring?

Cassandra has cross-datacenter replication, and we plan to use that in the
future. For now, we are planning to use our own relay mechanism to transfer
data changes from one datacenter to another. The Cassandra cluster in each
datacenter will have the same keyspaces and column families with the same
schema. Datacenter A will send mutations over this relay to datacenter B, which
will replay each mutation in its own Cassandra cluster. As a result, datacenter
A's cluster should end up with the same data as datacenter B's cluster, but not
through the cross-datacenter replication that Cassandra offers.
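
To make the relay idea concrete, here is a rough sketch of the replay side in
Python. It is only an illustration under assumptions, not our actual code: the
Mutation and ReplayTarget names are made up for this example, the target is an
in-memory stand-in rather than a real Cassandra client, and I am assuming the
relay carries each write's original timestamp so that the newer write wins when
mutations arrive duplicated or out of order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    """One column write captured in datacenter A (hypothetical shape)."""
    keyspace: str
    column_family: str
    row_key: str
    column: str
    value: str
    timestamp_micros: int  # assumed: the original write timestamp travels with it


class ReplayTarget:
    """In-memory stand-in for datacenter B's cluster, not a real client."""

    def __init__(self):
        # (keyspace, column_family, row_key, column) -> (timestamp, value)
        self.cells = {}

    def apply(self, m):
        cell = (m.keyspace, m.column_family, m.row_key, m.column)
        current = self.cells.get(cell)
        # Keep the write with the higher timestamp, so replaying a mutation
        # twice or receiving an older one late does not change the result.
        if current is None or m.timestamp_micros > current[0]:
            self.cells[cell] = (m.timestamp_micros, m.value)


def relay(mutations, target):
    """Replay every mutation received from the other datacenter."""
    for m in mutations:
        target.apply(m)
```

With this shape, replaying the same batch twice leaves the target unchanged,
which is the property we would want from the real relay as well.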

Question 2: is this a sane strategy? We're trying to make the smallest possible
change when deploying Cassandra. Our plan is to slowly move our infrastructure
over to relying more on Cassandra once we can assess how it behaves with our
workload.

Question 3: eventually, we want to turn all these Cassandra clusters into one
large multi-datacenter cluster. What's the best practice for doing this? Should
I just add nodes from all datacenters to the list of seeds and let Cassandra
resolve the differences? Is there another way I don't know about?

Thank you,
Faraaz
