I think you are correct. 

When the new node starts it randomly selects tokens, which result in a random 
set of token ranges being transferred from other nodes. 

For each pending range the existing token ranges in the cluster are searched to 
find one that contains the range we want to transfer. A list of all replicas 
for this (exiting) range is created and sorted by proximity. The first endpoint 
in the list will be used. 
   
The bit I am unsure of is if it's possible for the replicas of row to move from 
A, B. C to D, E, F. 

Anyone else help out ? 

Cheers


-----------------
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/07/2013, at 9:21 AM, sankalp kohli <kohlisank...@gmail.com> wrote:

> Interesting... I guess you have to add one node at a time and run repair on 
> it. 
> 
> 
> On Sat, Jul 20, 2013 at 7:30 AM, E S <tr1skl...@yahoo.com> wrote:
> I am trying to understand the best procedure for adding new nodes.  The one 
> that I see most often online seems to have a hole where there is a low 
> probability of permanently losing data.  I want to understand what I am 
> missing in my understanding.
> 
> Let's say I have a 3 node cluster (node A,B,C) with a RF of 3.  I want to 
> double the cluster size to 6 (node A,B,C,D,E,F) while keeping the replication 
> factor of 3.  Let's assume we use vnodes.
> 
> My understanding is to bootstrap the 3 nodes and then run repair then 
> cleanup.  Here is my failure case:
> 
> Before bootstrapping I have a row that is only replicated onto node A and B.  
> Assume I did a quorum write and there was some hiccup on C, hinted handoff 
> didn't work, and a repair has not yet been run.  Let's also assume that once 
> nodes D,E, F have been bootstrapped, this rows new replicas are D,E, and F.
> 
> My reading through the bootstrapping code shows that for a given range, it 
> streams it only from one node (unlike repair).  There is a 1/9 chance that 
> D,E,F will have streamed the range containing the row from C, which does not 
> have this row.
> 
> Now not even a consistency level read of ALL will return the row.  A repair 
> will not solve it, and when cleanup is run, the row is permanently deleted.
> 
> I don't think this problem would normally happen without vnodes, because when 
> doubling you would alternate the new nodes with the old nodes in the ring, so 
> while quorum might not work until the final repair, "all" would, and a repair 
> would solve the problem.  With vnodes though, some of the ranges will follow 
> the pattern above (range ownership moving from A,B,C to D,E,F).
> 
> Am I missing something here?  If I'm right, I think the only way to avoid 
> this is adding less then a quorum of new nodes (in this case 1) before doing 
> a repair.  That would be painful since repairs take a while.
> 
> Thanks for any help.
> 
> Eddie
> 
> 
> 

Reply via email to