That makes sense. Thank you so much for pointing that out, Alex.
So, long story short: once I am up to the RF I actually want (RF 3 per DC)
and am just adding nodes for capacity, a node joining the ring will work
correctly and no inconsistencies will arise.
If I just change the RF, the nodes don't have the data yet, so a repair
needs to be run.
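
For the archives: the concrete step after such an RF change would be, run
on each node (just a sketch, assuming Cassandra 3.x where -full requests a
full rather than incremental repair; "my_keyspace" is only a placeholder):

$ nodetool repair -full my_keyspace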

Awesome - thanks so much.

Greetings, Daniel

On Thu, 3 Aug 2017 at 09:56 Oleksandr Shulgin <oleksandr.shul...@zalando.de>
wrote:

> On Thu, Aug 3, 2017 at 9:33 AM, Daniel Hölbling-Inzko <
> daniel.hoelbling-in...@bitmovin.com> wrote:
>
>> No, I set auto_bootstrap to true and the node was UN in nodetool status,
>> but when doing a select on the node with CL=ONE I got incomplete data.
>>
>
> What I think is happening here is not related to the new node being added.
>
> When you increase the Replication Factor, that does not automatically
> redistribute the existing data.  It just makes other nodes responsible for
> portions of the data they might not actually have yet.  So I would expect
> all of your nodes to show some inconsistencies before you run a full repair
> of the ring.
>
> I can fairly easily reproduce it locally with ccm[1], 3 nodes, version
> 3.0.13.
>
> $ ccm status
> Cluster: 'v3013'
> ----------------
> node1: UP
> node3: UP
> node2: UP
>
> $ ccm node1 cqlsh
> cqlsh> create keyspace test_rf WITH replication = {'class':
> 'NetworkTopologyStrategy', 'datacenter1': 1};
> cqlsh> create table test_rf.t1(id int, data text, primary key(id));
> cqlsh> insert into test_rf.t1(id, data) values(1, 'one');
> cqlsh> select * from test_rf.t1;
>
>  id | data
> ----+------
>   1 |  one
>
> (1 rows)
>
> At this point selecting from t1 works correctly on any of the nodes with
> the default CL=ONE.
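>
> (To double-check the same thing from another node one could, for example,
> do:
>
> $ ccm node2 cqlsh
> cqlsh> CONSISTENCY ONE;
> cqlsh> select * from test_rf.t1;
>
> and expect the same single row back.)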
>
> If we now increase the RF and try reading again, something surprising
> happens:
>
> cqlsh> alter keyspace test_rf WITH replication = {'class':
> 'NetworkTopologyStrategy', 'datacenter1': 2};
> cqlsh> select * from test_rf.t1;
>
>  id | data
> ----+------
>
> (0 rows)
>
> And in my test this happens on all nodes at the same time.  The
> explanation is fairly simple: a different node is now responsible for
> data that was previously written to only one other node.
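>
> (The ownership change can also be inspected directly, e.g. with
>
> $ ccm node1 nodetool getendpoints test_rf t1 1
>
> which prints the replica endpoints for partition key 1: after the ALTER
> it should list two nodes, one of which does not have the row yet.)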
>
> A repair in this tiny test is trivial:
> cqlsh> CONSISTENCY ALL;
> cqlsh> select * from test_rf.t1;
>
>  id | data
> ----+------
>   1 |  one
>
> (1 rows)
>
> And now the data can be read from any node again, since we did a "full
> repair".
>
> --
> Alex
>
> [1] https://github.com/pcmanus/ccm
>
>
