Hi Paulo,
Thanks a lot for the detailed explanation. Our use case is that when one node goes
down, a new node in the same AZ comes up shortly afterwards (within 5 to 10
minutes), and it is safe to assume that no nodes in other AZs are down at that
point. So based on your explanation, using
-Dcassandra.consistent.rangemovement=false seems like the way to go for our
use case. I will test it with that option.

Thanks again.

Praveen

From: Paulo Motta <pauloricard...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, March 30, 2016 at 10:55 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: auto_bootstrap when a node is down

When you add a node, it takes over the range of an existing node, and thus
it should stream data from that node to maintain consistency. If the existing
node is unavailable, the new node may fetch the data from a different replica,
which may be missing some of the data held by the node whose range you are
taking over, and that may break consistency.

For example, imagine a ring with nodes A, B and C and RF=3. The row X=1 maps to
node A and is replicated to nodes B and C, so the initial arrangement will be:

A(X=1), B(X=1) and C(X=1)

Node B is down and you write X=2 to A, which replicates the data only to C,
since B is down (and hinted handoff is disabled). The write succeeds at QUORUM.
The new arrangement becomes:

A(X=2), B(X=1), C(X=2) (if any one of the nodes is down, a read at QUORUM will
still fetch the correct value, X=2)

Now imagine you add a new node D between A and B, taking over A's replica of X.
If D streams data from A, the new replica group will become:

A, D(X=2), B(X=1), C(X=2) (if any one of the nodes is down, a read at QUORUM
will still fetch the correct value, X=2)

But if A is down when you bootstrap D and you have 
-Dcassandra.consistent.rangemovement=false, D may stream data from B, so the 
new replica group will be:

A, D(X=1), B(X=1), C(X=2)

Now, if C goes down, reads at QUORUM will succeed but return the stale value
of X=1 (from D and B), so consistency is broken.
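To make the example above concrete, here is a small Python model of it. It is a
simplified sketch, not how Cassandra actually serves reads: it assumes
last-write-wins by timestamp, assumes a QUORUM read may be answered by any 2 of
the 3 replicas that are up, and ignores hints, read repair and digest reads. The
names (quorum_read_results, after_write, streamed_from_a, streamed_from_b) are
hypothetical and only serve the illustration.

```python
from itertools import combinations

# Hypothetical model: each replica stores (value, write_timestamp) for row X.
Replica = tuple[int, int]


def quorum(rf: int) -> int:
    """Number of replicas a QUORUM read must hear from."""
    return rf // 2 + 1


def quorum_read_results(replicas: dict[str, Replica], down: set[str]) -> set[int]:
    """Return every value a QUORUM read of X could return in this model.

    Simplified: the coordinator may pick any `quorum` live replicas and
    returns the value with the newest timestamp (last-write-wins).
    """
    live = [node for node in replicas if node not in down]
    needed = quorum(len(replicas))
    if len(live) < needed:
        raise RuntimeError("not enough live replicas for QUORUM")
    results = set()
    for group in combinations(live, needed):
        value, _ts = max((replicas[n] for n in group), key=lambda r: r[1])
        results.add(value)
    return results


# X=1 was written at t=1 to A, B and C; then, with B down and hints
# disabled, X=2 was written at t=2 to A and C only.
after_write = {"A": (2, 2), "B": (1, 1), "C": (2, 2)}

# D takes over A's replica of X and streams it from A.
streamed_from_a = {"D": after_write["A"], "B": after_write["B"], "C": after_write["C"]}

# A is down and consistent.rangemovement=false, so D streams from B instead.
streamed_from_b = {"D": after_write["B"], "B": after_write["B"], "C": after_write["C"]}

for label, group in [("after write", after_write),
                     ("D from A", streamed_from_a),
                     ("D from B", streamed_from_b)]:
    for node in group:
        print(f"{label}, {node} down -> {quorum_read_results(group, {node})}")
```

In this model every single-node failure still yields {2} for the first two
arrangements, while the last one yields {1} when C is down, which is the stale
QUORUM read described above.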

If you run repair continuously and have hinted handoff and read repair
enabled, the probability of something like this happening decreases, but it
can still happen. If this is not a problem for you, you can use the option
-Dcassandra.consistent.rangemovement=false to bootstrap a node while another
node is down. See CASSANDRA-2434 for more background.

2016-03-30 11:14 GMT-03:00 Peddi, Praveen <pe...@amazon.com>:
Hello all,
We just upgraded to 2.2.4 (from 2.0.9) and noticed an issue when new nodes
are added. When we add a new node while no nodes in the cluster are down,
everything works fine, but when we add a new node while one node is down, I see
the following error. My understanding was that when auto_bootstrap is enabled,
the bootstrapping process uses QUORUM consistency, so it should work when one
node is down. Is that not correct? Is there a way to add a new node with
bootstrapping but without using the replace address option? We use auto
scaling, and a new node gets added automatically when one node goes down. Since
it's all scripted, I can't set the replace address in the cassandra-env.sh file
as a one-time option.

One fallback we could use is to disable auto bootstrap and let read
repairs populate the data over time, but it's not ideal. Is that even a good
alternative, given this failure?

ERROR 20:30:45 Exception encountered during startup
java.lang.RuntimeException: A node required to move the data consistently is 
down (/xx.xx.xx.xx). If you wish to move the data from a potentially 
inconsistent replica, restart the node with 
-Dcassandra.consistent.rangemovement=false

Praveen
