Hi,

We're looking into adding a second datacenter to our cluster via a rebuild, and we're curious about how Cassandra determines which replica in the source datacenter to rebuild from. For a bit more context, we're using the Ec2Snitch with the dynamic snitch enabled, and NetworkTopologyStrategy with RF = 3 for all of our keyspaces.

Looking at the source code, it appears that the source is the closest replica in the source datacenter, as ordered by the snitch (https://github.com/apache/cassandra/blob/cassandra-3.11.11/src/java/org/apache/cassandra/dht/RangeStreamer.java#L226), which I think is generally fine. Is this correct, or am I misreading the code?

If it is correct, there appears to be an edge case around consistency that I would like to clarify. Even assuming identical topologies in both datacenters, there is no strict guarantee that each of the source replicas for a range is streamed to the destination datacenter. This is because proximity is determined by the snitch, which could have dropped a node from its list for being down, or the dynamic snitch could have assigned it a higher score. As a result, when rebuilding the nodes in each rack, it is entirely possible for every rack to receive the same data from the same source replica, and that replica, of course, may not be fully consistent.

Cheers,
Sam
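
P.S. To make the edge case concrete, here is a minimal toy model in Python of the behaviour I am describing. It is not Cassandra code, and it only reflects my reading of the selection logic: the node names, the score values, and the functions `pick_source` and `rebuild_sources` are all made up for illustration, and I am assuming that a lower score means "closer".

```python
# Toy model only: this is NOT Cassandra code. All names and scores are hypothetical.
# Assumption: a lower score means the replica is considered "closer".

def pick_source(replicas, scores, alive):
    """Return the live replica with the lowest score for one token range."""
    candidates = [r for r in replicas if r in alive]
    if not candidates:
        raise RuntimeError("no live source replica for this range")
    # sorted() is stable, so ties keep the original replica order.
    return sorted(candidates, key=lambda r: scores[r])[0]


def rebuild_sources(dest_nodes, replicas, scores_by_dest, alive):
    """Map each rebuilding node to the source replica it would stream from."""
    return {
        dest: pick_source(replicas, scores_by_dest[dest], alive)
        for dest in dest_nodes
    }


# One token range, RF = 3 in the source datacenter (one replica per rack).
source_replicas = ["src-a", "src-b", "src-c"]
dest_nodes = ["dst-a", "dst-b", "dst-c"]

# Case 1: each destination node happens to rank a different source first.
spread_scores = {
    "dst-a": {"src-a": 0.1, "src-b": 0.5, "src-c": 0.9},
    "dst-b": {"src-a": 0.5, "src-b": 0.1, "src-c": 0.9},
    "dst-c": {"src-a": 0.9, "src-b": 0.5, "src-c": 0.1},
}
spread = rebuild_sources(
    dest_nodes, source_replicas, spread_scores, set(source_replicas)
)

# Case 2: src-b is down and src-c scores badly from every destination node,
# so all three destination replicas are built from src-a alone.
skewed_scores = {
    dest: {"src-a": 0.1, "src-b": 0.2, "src-c": 0.9} for dest in dest_nodes
}
skewed = rebuild_sources(
    dest_nodes, source_replicas, skewed_scores, {"src-a", "src-c"}
)

print("spread:", spread)
print("skewed:", skewed)
```

In the second case every destination node picks `src-a`, so whatever `src-a` is missing would be missing from all three new replicas until a repair runs.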