Re: replace dead node? " token -1 "

Yang Tue, 14 Aug 2012 23:04:47 -0700

Jim:

Thanks a lot for the info.


When you say the old nodes sometimes hang around as "unreachable nodes"
when describing the cluster, you mean this is after the new node boots up and
assumes ownership of the same token, and you have not manually run nodetool
removetoken, right? That kind of makes sense, since it seems that the
membership being gossiped around still contains the dead node (which is
represented by a different AWS internal IP), even though the same token is
associated with both the dead and the new node? I'm getting a bit confused
here...


I think that previously, when I booted up a new node with the same token while
the old host was dead, the other nodes on the
ring said something like "this token xxxxxx is already owned by
old_node_ip_here, ...". I don't remember the exact behavior now,
which is why I'm cautious about using T instead of T-1.
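
To make the two options concrete, here is a minimal sketch of the arithmetic
only. It assumes RandomPartitioner with tokens in the range 0 to 2**127 - 1
(my assumption, check your partitioner), and that replace_token is passed as a
JVM system property with -D. The helper names are made up for illustration;
this is not what Priam or Cassandra does internally.

```python
# Assumption: RandomPartitioner, with tokens taken here to lie in [0, 2**127 - 1].
RING_SIZE = 2 ** 127


def token_minus_one(dead_token: int) -> int:
    """Old-style replacement token: T-1, wrapping around the ring at 0."""
    return (dead_token - 1) % RING_SIZE


def replace_token_option(dead_token: int) -> str:
    """Option string for the replace_token approach, which reuses T itself.

    Hypothetical helper; the -D prefix is an assumption about how the
    property is passed to the JVM.
    """
    return f"-Dcassandra.replace_token={dead_token}"
```

With T-1 the new node gets its own distinct token, while with replace_token
both the dead and the new node are associated with T until the old entry is
cleaned up.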


I'm doing more tests to confirm this behavior.
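
For these tests, here is a rough sketch of how I would scan a log for the
message Jim quoted. The regex is built only from the wording in his mail and
assumes IPv4 addresses and the whole message on a single line. The function
name is made up, and the exact log format may differ between versions.

```python
import re

# Pattern based on the wording quoted in Jim's mail; assumes IPv4 addresses
# and that the whole message sits on one log line.
IP = r"\d+(?:\.\d+){3}"
DUP_TOKEN = re.compile(
    rf"Nodes /(?P<first>{IP}) and /(?P<second>{IP}) have the same token "
    rf"(?P<token>\d+)\.\s+Ignoring /(?P<ignored>{IP})"
)


def find_duplicate_tokens(lines):
    """Return (token, ignored_ip, other_ip) for each matching log line."""
    hits = []
    for line in lines:
        m = DUP_TOKEN.search(line)
        if m is None:
            continue
        ignored = m.group("ignored")
        # The "other" node is whichever of the two was not ignored.
        if ignored == m.group("first"):
            other = m.group("second")
        else:
            other = m.group("first")
        hits.append((int(m.group("token")), ignored, other))
    return hits
```

It takes any iterable of lines, so an open log file can be passed in directly.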

Thanks
Yang

On Tue, Aug 14, 2012 at 10:17 PM, Jim Cistaro <[email protected]> wrote:

>  We use priam to replace nodes using replace_token.  We do see some
> issues (currently on 1.0.9, as well as earlier versions) with replace_token.
>
>  Apparently there are some known issues with replace_token.  We have
> experienced the old nodes sometimes hanging around as "unreachable nodes"
> when describing cluster.  Also, we have experienced problems where moving
> the new node causes the old "replaced" node to resurrect for the token that
> was outgoing during the move.
>
>  You can notice these old nodes hanging around in logs.  You will see
> messages like:
> StorageService.java (line 1020) Nodes /<old_ip> and /<new_ip> have the
> same token NNNNNNNNNN.  Ignoring /<old_ip>.
>
>  We have then had to "nt removetoken" to clean things up after moves.  We
> are also investigating using method unsafeAssassinateEndpoint (via jmx) to
> clean up some of the unreachables.
>
>  Like I said, we still use replace_token, but be aware of these possible
> inconveniences.
>
>  Jim Cistaro
> Netflix Cassandra Operations
>
>
>   From: Yang <[email protected]>
> Reply-To: <[email protected]>
> Date: Tue, 14 Aug 2012 21:58:30 -0700
> To: <[email protected]>
> Subject: Re: replace dead node? " token -1 "
>
>  Thanks Aaron. It has been a while since I last checked the code; I'll
> read it to understand it better.
>  On Aug 14, 2012 8:48 PM, "aaron morton" <[email protected]> wrote:
>
>>  Using this method, when choosing the new <Token>, should we still use
>> the T-1 ?
>>
>> (AFAIK) No.
>> replace_token is used when you want to replace a node that is dead. In
>> this case the dead node will be identified by its token.
>>
>>  if so, would the duplicate token (same token but different ip) cause
>> problems?
>>
>>  If the nodes are bootstrapping an error is raised.
>> Otherwise the token ownership is passed to the new node.
>>
>>  Cheers
>>
>>    -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>>  On 15/08/2012, at 11:07 AM, Yang <[email protected]> wrote:
>>
>>  Previously, when a node died, I remember the documentation saying that
>> it's better to assign T-1 to the new node,
>> where T was the token of the dead node.
>>
>>
>>  the new doc for 1.x here
>>
>>  http://wiki.apache.org/cassandra/Operations#Replacing_a_Dead_Node
>>
>>
>>  shows a new way: pass in cassandra.replace_token=<Token>
>> for the new node.
>> Using this method, when choosing the new <Token>, should we still use the
>> T-1 ?
>>
>>
>>  Also in Priam code:
>>
>> https://github.com/Netflix/Priam/blob/master/priam/src/main/java/com/netflix/priam/identity/InstanceIdentity.java
>>
>>  At line 148, it does not seem that Priam does the "-1" thing; it assigns
>> the original token T to the new node.
>> If so, would the duplicate token (same token but different IP) cause
>> problems?
>>
>>
>>  Thanks
>> Yang
>>
>>
>>
