OK, I see: the cassandra.replace_token setting essentially executes the
manual removeToken step, so the dead node should be removed.

Is this the "old node hanging around" issue that you described?
https://issues.apache.org/jira/browse/CASSANDRA-3259
It looks like this JIRA is already fixed in 1.0.x, so is yours a different issue?
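
To check whether an old node is still hanging around, here is a minimal sketch that scans log lines for the "have the same token" message Jim quoted below. The regex is based only on the wording quoted in this thread (1.0.x era) and may not match other versions; the function name find_token_conflicts is hypothetical.

```python
import re

# Pattern based only on the log line quoted in this thread; the wording is an
# assumption for any Cassandra version other than the one Jim was running.
SAME_TOKEN = re.compile(
    r"Nodes /(?P<a>\S+) and /(?P<b>\S+) have the same token (?P<token>\d+)\.\s+"
    r"Ignoring /(?P<ignored>\S+?)\.?$"
)


def find_token_conflicts(lines):
    """Return a (token, ignored_ip, other_ip) tuple for each matching log line."""
    conflicts = []
    for line in lines:
        m = SAME_TOKEN.search(line)
        if m is None:
            continue
        ignored = m.group("ignored")
        # The "other" endpoint is whichever of the two was not ignored.
        other = m.group("b") if ignored == m.group("a") else m.group("a")
        conflicts.append((m.group("token"), ignored, other))
    return conflicts
```

It can be fed an open log file directly, e.g. find_token_conflicts(open(path)) with whatever path your system.log lives at; any endpoint that shows up as "ignored" is a candidate for removetoken cleanup.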

Thanks
Yang

On Tue, Aug 14, 2012 at 11:03 PM, Yang <teddyyyy...@gmail.com> wrote:

> Jim:
>
> Thanks a lot for the info.
>
> When you say "old nodes sometimes hanging around as "unreachable nodes"
> when describing cluster", you mean that after the new node boots up and
> assumes ownership of the same token, you have not manually run nodetool
> removeToken, right? That kind of makes sense, since it seems that the
> membership being gossiped around still contains the dead node (which is
> represented by a different AWS internal IP), even though the same token is
> associated with both the dead and the new node. I'm getting a bit confused
> here....
>
>
> I think that previously, when I booted up a new node with the same token
> while the old host was dead, the other nodes on the ring said something
> like "this token xxxxxx is already owned by old_node_ip_here,...... ". I
> don't remember the exact behavior now, which is why I'm cautious about
> using T instead of T-1.
>
>
> I'm doing more tests to confirm this behavior.
>
> Thanks
> Yang
>
>
> On Tue, Aug 14, 2012 at 10:17 PM, Jim Cistaro <jcist...@netflix.com> wrote:
>
>>  We use priam to replace nodes using replace_token.  We do see some
>> issues (currently on 1.0.9, as well as earlier versions) with replace_token.
>>
>>  Apparently there are some known issues with replace_token.  We have
>> experienced the old nodes sometimes hanging around as "unreachable nodes"
>> when describing cluster.  Also, we have experienced problems where moving
>> the new node causes the old "replaced" node to resurrect for the token that
>> was outgoing during the move.
>>
>>  You can notice these old nodes hanging around in logs.  You will see
>> messages like:
>> StorageService.java (line 1020) Nodes /<old_ip> and /<new_ip> have the
>> same token NNNNNNNNNN.  Ignoring /<old_ip>.
>>
>>  We have then had to "nt removetoken" to clean things up after moves.
>>  We are also investigating using method unsafeAssassinateEndpoint (via jmx)
>> to clean up some of the unreachables.
>>
>>  Like I said, we still use replace_token, but be aware of these possible
>> inconveniences.
>>
>>  Jim Cistaro
>> Netflix Cassandra Operations
>>
>>
>>   From: Yang <teddyyyy...@gmail.com>
>> Reply-To: <user@cassandra.apache.org>
>> Date: Tue, 14 Aug 2012 21:58:30 -0700
>> To: <user@cassandra.apache.org>
>> Subject: Re: replace dead node? " token -1 "
>>
>>  Thanks Aaron. It has been a while since I last checked the code; I'll
>> read it to understand it better.
>>  On Aug 14, 2012 8:48 PM, "aaron morton" <aa...@thelastpickle.com> wrote:
>>
>>>  Using this method, when choosing the new <Token>, should we still use
>>> the T-1 ?
>>>
>>> (AFAIK) No.
>>> replace_token is used when you want to replace a node that is dead. In
>>> this case the dead node will be identified by its token.
>>>
>>>  if so, would the duplicate token (same token but different ip) cause
>>> problems?
>>>
>>>  If the nodes are bootstrapping an error is raised.
>>> Otherwise the token ownership is passed to the new node.
>>>
>>>  Cheers
>>>
>>>    -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>>  On 15/08/2012, at 11:07 AM, Yang <teddyyyy...@gmail.com> wrote:
>>>
>>>  Previously, when a node died, I remember the documentation saying that
>>> it's better to assign T-1 to the new node,
>>> where T was the token of the dead node.
>>>
>>>
>>>  the new doc for 1.x here
>>>
>>>  http://wiki.apache.org/cassandra/Operations#Replacing_a_Dead_Node
>>>
>>>
>>>  shows a new way: passing in cassandra.replace_token=<Token>
>>> for the new node.
>>> Using this method, when choosing the new <Token>, should we still use
>>> the T-1 ?
>>>
>>>
>>>  Also in Priam code:
>>>
>>> https://github.com/Netflix/Priam/blob/master/priam/src/main/java/com/netflix/priam/identity/InstanceIdentity.java
>>>
>>>  line 148, it does not seem that Priam does the "-1" thing, but assigns
>>> the original token T to the new node.
>>> if so, would the duplicate token (same token but different ip) cause
>>> problems?
>>>
>>>
>>>  Thanks
>>> Yang
>>>
>>>
>>>
>
