Hello all,

We are on Cassandra 1.2.18 (running on Ubuntu 12.04) and recently tried to add a second DC to our demo environment before attempting it on live. The existing DC, DC1, has two nodes that hold approximately 10G of data (RF=2). To add the second DC, DC2, we followed this procedure:
On DC1 nodes:

1. Changed the snitch in cassandra.yaml from the default to GossipingPropertyFileSnitch.
2. Configured cassandra-rackdc.properties (DC1, RAC1).
3. Performed a rolling restart.
4. Updated the replication strategy for each keyspace, for example:
   ALTER KEYSPACE <keyspace> WITH REPLICATION = {'class':'NetworkTopologyStrategy','DC1':2};

On DC2 nodes:

5. Edited cassandra.yaml with: auto_bootstrap: false, seeds (one IP from DC1), cluster name matching the DC1 nodes, correct IP settings, num_tokens, initial_token left unset, and finally the snitch (GossipingPropertyFileSnitch, as in DC1).
6. Changed cassandra-rackdc.properties (DC2, RAC1).

On the application:

7. Changed the C# DataStax driver load balancing policy to DCAwareRoundRobinPolicy.
8. Changed the application consistency level from QUORUM to LOCAL_QUORUM.

On DC2 nodes:

9. After deleting the data, commitlog and saved_caches directories, we started Cassandra on both nodes in the new DC, DC2. According to the logs, at this point all nodes could see all other nodes, and nodetool status gave the correct/expected output.

On DC1 nodes:

10. Once Cassandra was running on DC2, we changed the keyspace RF to include the new DC as follows:
    ALTER KEYSPACE <keyspace> WITH REPLICATION = {'class':'NetworkTopologyStrategy','DC1':2, 'DC2':2};

On DC2 nodes:

11. As a last step, in order to stream the data across to the second DC, we ran this on node1 of DC2: nodetool rebuild DC1. After its successful completion, we were planning to run the same on node2 of DC2.

The problem is that nodetool rebuild seems to be stuck: nodetool netstats on node1 of DC2 appears to be stuck at 10%, streaming a 5G file from node2 at DC1. This doesn't tally with the output of nodetool netstats when run against either of the DC1 nodes; the DC1 nodes don't think they are streaming anything to DC2.
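For reference, the settings from steps 5 and 6 correspond roughly to the fragments below. This is a sketch of the relevant keys rather than a verbatim copy of our files: the values in angle brackets are placeholders, and anything not listed in step 5 is omitted.

```
# cassandra.yaml on each DC2 node (only the settings mentioned in step 5)
cluster_name: '<same name as on the DC1 nodes>'
auto_bootstrap: false
endpoint_snitch: GossipingPropertyFileSnitch
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "<IP of one DC1 node>"
# listen_address / rpc_address: set per node ("correct IP settings")
# num_tokens: set; initial_token: left unset
```

```
# cassandra-rackdc.properties on each DC2 node (step 6)
dc=DC2
rack=RAC1
```

The DC1 nodes use the same snitch, with dc=DC1 and rack=RAC1 in cassandra-rackdc.properties (step 2).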
It is worth pointing out that initially we tried to run 'nodetool rebuild DC1' on both nodes at DC2, given the small amount of data to be streamed in total (approximately 10G, as I explained above). We experienced the same problem, the only difference being that 'nodetool rebuild DC1' got stuck on both nodes at DC2 very soon after starting, whereas now it got stuck only after running for an hour or so. We thought the problem was that we had run nodetool rebuild against both nodes at the same time. So we deleted all the data, commitlog and caches on both nodes, started again from step (9), and ran it only against node1. nodetool rebuild has now been running against node1 at DC2 for more than 12 hours with no luck...

The weird thing is that the Cassandra logs appear to be clean and the VPN between the two DCs has no problems at all.

Any thoughts? Have we missed something in the steps I described? Is anything wrong in the procedure? Any help would be much appreciated.

Thanks,
Vasilis