My company is planning to deploy Cassandra to three separate datacenters. Each datacenter will have its own Cassandra cluster, with a set of seeds specific to that datacenter. However, the cluster name will be the same in all three.
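To make the setup concrete, here is a rough sketch of the relevant part of cassandra.yaml for nodes in datacenter A. The cluster name and IP addresses are made-up placeholders, not our real values. Datacenters B and C would use the same `cluster_name` but list only their own nodes under `seeds`.

```yaml
# Sketch of cassandra.yaml for a node in datacenter A.
# All values below are hypothetical placeholders.
cluster_name: 'OurCluster'   # identical in all three datacenters
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # only datacenter A's own nodes are listed as seeds
      - seeds: "10.1.0.1,10.1.0.2"
```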
Question 1: Is this enough to guarantee that the three datacenters will have distinct Cassandra clusters as well? Or could a node in datacenter A still somehow join datacenter B's ring?

Cassandra has cross-datacenter replication, and we plan to use it in the future. For now, we plan to use our own relay mechanism to transfer data changes from one datacenter to another. The Cassandra cluster in each datacenter will have the same keyspaces and column families with the same schema. Datacenter A will send mutations over this relay to datacenter B, which will replay them in its own Cassandra cluster. Datacenter A's cluster will therefore look identical to datacenter B's, but not through the cross-datacenter replication that Cassandra offers.

Question 2: Is this a sane strategy? We're trying to make the smallest possible change when deploying Cassandra. Our plan is to slowly move our infrastructure over to relying more on Cassandra once we can assess how it behaves with our workload.

Question 3: Eventually, we want to turn all these Cassandra clusters into one large multi-datacenter cluster. What's the best practice for doing this? Should I just add nodes from all datacenters to the list of seeds and let Cassandra resolve the differences? Is there another way I don't know about?

Thank you,
Faraaz