That makes sense. Thank you so much for pointing that out, Alex. So, long story short: once I am at the RF I actually want (RF 3 per DC) and am just adding nodes for capacity, joining the ring will work correctly and no inconsistencies will appear. If I just change the RF, the nodes don't have the data yet, so a repair needs to be run.
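(Noting the command for my own reference; assuming I read your example
below correctly, that repair step after bumping the RF would be
something along the lines of

$ nodetool repair -full <keyspace>

run on each node in the ring, with -full forcing a full rather than
incremental repair.)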
Awesome - thanks so much.

greetings Daniel

On Thu, 3 Aug 2017 at 09:56 Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:

> On Thu, Aug 3, 2017 at 9:33 AM, Daniel Hölbling-Inzko <
> daniel.hoelbling-in...@bitmovin.com> wrote:
>
>> No, I set auto_bootstrap to true and the node was UN in nodetool
>> status, but when doing a select on the node with ONE I got incomplete
>> data.
>
> What I think is happening here is not related to the new node being
> added.
>
> When you increase the Replication Factor, that does not automatically
> redistribute the existing data. It just makes other nodes responsible
> for portions of the data they might not actually have yet. So I would
> expect all of your nodes to show some inconsistencies until you run a
> full repair of the ring.
>
> I can fairly easily reproduce this locally with ccm[1]: 3 nodes,
> version 3.0.13.
>
> $ ccm status
> Cluster: 'v3013'
> ----------------
> node1: UP
> node3: UP
> node2: UP
>
> $ ccm node1 cqlsh
> cqlsh> create keyspace test_rf WITH replication = {'class':
> 'NetworkTopologyStrategy', 'datacenter1': 1};
> cqlsh> create table test_rf.t1(id int, data text, primary key(id));
> cqlsh> insert into test_rf.t1(id, data) values(1, 'one');
> cqlsh> select * from test_rf.t1;
>
>  id | data
> ----+------
>   1 |  one
>
> (1 rows)
>
> At this point, selecting from t1 works correctly on any of the nodes
> with the default CL=ONE.
>
> If we now increase the RF and try reading again, something surprising
> happens:
>
> cqlsh> alter keyspace test_rf WITH replication = {'class':
> 'NetworkTopologyStrategy', 'datacenter1': 2};
> cqlsh> select * from test_rf.t1;
>
>  id | data
> ----+------
>
> (0 rows)
>
> In my test this happens on all nodes at the same time. The explanation
> is fairly simple: a different node is now responsible for data that
> was previously written to only one other node.
>
> A repair in this tiny test is trivial:
>
> cqlsh> CONSISTENCY ALL;
> cqlsh> select * from test_rf.t1;
>
>  id | data
> ----+------
>   1 |  one
>
> (1 rows)
>
> And now the data can be read from any node again, since we effectively
> did a "full repair".
>
> --
> Alex
>
> [1] https://github.com/pcmanus/ccm
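PS: If I understand the mechanics correctly, the CONSISTENCY ALL select
above works because Cassandra compares the responses from all replicas
and performs a read repair on mismatch, writing the row back to the
newly responsible node. In the ccm cluster from the example, the
explicit equivalent should be something like:

$ ccm node1 nodetool repair -full test_rf
$ ccm node2 nodetool repair -full test_rf
$ ccm node3 nodetool repair -full test_rf

after which a plain CL=ONE select should return the row on every node.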