On Wed, Sep 9, 2015 at 7:52 AM, Richard Dawe <rich.d...@messagesystems.com> wrote:
> I am investigating various topology changes, and their effect on replica
> placement. As far as I can tell, replica placement is not changing after
> I've changed the topology and run nodetool repair + cleanup. I followed the
> procedure described at
> http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_switch_snitch.html

That's probably a good thing. I'm going to be modifying the warning in
cassandra.yaml to advise users that in practice the only change of snitch or
replication strategy one can safely do is one in which replica placement does
not change.

It currently says that you need to repair, but there are plenty of scenarios
where you lose all existing replicas for a given datum, and are therefore
unable to repair. The key is that you need at least one replica to stay the
same, or repair is worthless. And if you only have one replica staying the
same, you lose any consistency contract you might have been operating under.

One ALMOST NEVER ACTUALLY WANTS TO DO ANYTHING BUT A NO-OP HERE.

> Here is my test scenario:

<snip>

> 1. To determine the token range ownership, I used "nodetool ring
> <keyspace>" and "nodetool info -T <keyspace>". I saved the output of those
> commands with the original topology, after changing the topology, after
> repairing, after changing the replication strategy, and then again after
> repairing. In no cases did the tokens change. It looks like nodetool ring
> and nodetool info -T show the owner but not the replicas for a particular
> range.

The tokens and ranges shouldn't be changing; the replica placement should be.
AFAIK neither of those commands shows you replica placement; they show you
primary range ownership. Use nodetool getendpoints to determine replica
placement before and after.

> I was expecting the replica placement to change. Because the racks were
> assigned in groups (rather than alternating), I was expecting the original
> replica placement with SimpleStrategy to be non-optimal after switching to
> NetworkTopologyStrategy. E.g.: if some data was replicated to nodes 1, 2
> and 3, then after the topology change there would be 2 replicas in RAC1, 1
> in RAC2 and none in RAC3. And hence when the repair ran, it would remove
> one replica from RAC1 and make sure that there was a replica in RAC3.

I would expect this to be the case.

> However, when I did a query using cqlsh at consistency QUORUM, I saw that
> it was hitting two replicas in the same rack, and a replica in a different
> rack. This suggests that the replica placement did not change after the
> topology change.

Perhaps you are seeing the quirks of the current rack-aware implementation,
explicated here?

https://issues.apache.org/jira/browse/CASSANDRA-3810

> Is there some way I can see which nodes have a replica for a given token
> range?

Not for a range, but for a given key with nodetool getendpoints. I wonder if
there would be value in a per-range version... in the pre-vnode past I have
merely generated a key for each range. With the number of ranges increased so
dramatically by vnodes, it might be easier to have an endpoint that works on
ranges...

=Rob
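
P.S. Roughly what I mean by checking placement before and after with
getendpoints -- a crude sketch, and the keyspace/table/key names here are
made up, substitute your own:

  # pick a few representative partition keys and record their current replicas
  for key in key1 key2 key3; do
      echo "== $key =="
      nodetool getendpoints my_keyspace my_table "$key"
  done > placement_before.txt

  # ... change the snitch / replication strategy, run repair + cleanup ...

  # record the replicas for the same keys again and compare
  for key in key1 key2 key3; do
      echo "== $key =="
      nodetool getendpoints my_keyspace my_table "$key"
  done > placement_after.txt

  diff placement_before.txt placement_after.txt

An empty diff means placement didn't move for those keys; the more keys you
sample, the more of the ranges you cover.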