Hi,

We're looking into adding a second datacenter to our cluster via a
rebuild, and we're curious about how Cassandra determines which
replica in the source datacenter to stream from. For a bit more
context, we're using the Ec2Snitch with the dynamic snitch enabled,
and all of our keyspaces use NetworkTopologyStrategy with RF = 3.

Looking at the source code, it appears that the source is the closest
replica in the source datacenter, as ordered by the snitch
(https://github.com/apache/cassandra/blob/cassandra-3.11.11/src/java/org/apache/cassandra/dht/RangeStreamer.java#L226),
which I think is generally fine. Is this correct, or am I misreading
the code?

If so, there appears to be an edge case surrounding consistency which
I would like to clarify:

Assuming identical topologies, there is no strict guarantee that every
source replica of a range is used as a stream source for the
destination datacenter. This is because we're using the snitch to
determine proximity, and the snitch could have removed a node from its
list for being down, or the dynamic snitch could have given it a
higher (worse) score.

As a result, when rebuilding the nodes in each rack, it is entirely
possible for all racks to receive the same data from the same source
replica. That replica, of course, may not be fully consistent. Is that
a fair reading?
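
To make the scenario concrete, here is a toy Python sketch of what I
mean. It is not Cassandra code: the node names, the scores, and the
pick_source helper are all made up, and it assumes every rebuilding
node sees the same snitch scores, which may not hold in practice.

```python
# Toy model of "stream from the closest live replica". Not Cassandra code;
# all names and scores below are hypothetical.

def pick_source(replicas, scores, down=frozenset()):
    """Return the live replica with the lowest (best) score."""
    live = [r for r in replicas if r not in down]
    if not live:
        raise RuntimeError("no live source replica for this range")
    return min(live, key=lambda r: scores[r])

# RF = 3 in the source DC: three replicas hold the same token range.
source_replicas = ["src-a", "src-b", "src-c"]

# Assumed dynamic-snitch-style scores, where lower means "closer".
scores = {"src-a": 0.1, "src-b": 0.4, "src-c": 0.9}

# Each new node (one per rack) rebuilds the same range independently.
dest_nodes = ["dst-a", "dst-b", "dst-c"]
choices = {dest: pick_source(source_replicas, scores) for dest in dest_nodes}

for dest, src in choices.items():
    print(f"{dest} streams the range from {src}")
```

In this model all three destination nodes pick src-a, so whatever
src-a is missing for that range is missing from every replica in the
new datacenter until a repair runs.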

Cheers,
Sam
