Considering this minor "old node hanging around" issue, would the
old T-1 approach be more attractive?
That way you don't necessarily have to remove the dead token immediately,
but could come back the next day, or even a week later. T-1 would behave
essentially the same in terms of partitioning the data range.
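
To illustrate the partitioning point, here is a minimal sketch. It assumes
RandomPartitioner-style integer tokens in [0, 2**127) and that each node owns
the range (previous token, its own token]. The function name, the variable
names, and the four-node ring are hypothetical and not taken from Cassandra or
Priam code.

```python
# Sketch only: compares the range owned by a replacement node placed at the
# dead node's token T versus at T-1.
# Assumptions: integer tokens in [0, 2**127); a node with token t owns the
# range (previous_token, t] on the ring.

RING_SIZE = 2 ** 127


def owned_width(tokens, token):
    """Return how many token values `token` owns, given all tokens on the ring."""
    ring = sorted(tokens)
    i = ring.index(token)
    prev = ring[i - 1]  # index -1 wraps around to the largest token
    # The "- 1 ... + 1" makes a single-node ring own the whole ring, not 0.
    return (token - prev - 1) % RING_SIZE + 1


# Hypothetical evenly spaced four-node ring; the node at T has died.
T = 2 ** 126
live = [0, 2 ** 125, 3 * 2 ** 125]

same = owned_width(live + [T], T)               # replacement takes T
minus_one = owned_width(live + [T - 1], T - 1)  # replacement takes T-1
successor = owned_width(live + [T - 1], 3 * 2 ** 125)

print(same - minus_one)   # token values given up by choosing T-1
print(successor - same)   # token values picked up by the next node
```

Under these assumptions the replacement at T-1 owns exactly one token value
fewer than it would at T, and that single value moves to the next node on the
ring, so the data ranges are practically identical.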


Thanks
Yang

On Wed, Aug 15, 2012 at 1:39 AM, Yang <teddyyyy...@gmail.com> wrote:

> OK, I see: the cassandra.replace_token setting essentially executes the
> manual removeToken step, so the dead node should be removed.
>
> Is this the "old node hanging around" issue that you described?
> https://issues.apache.org/jira/browse/CASSANDRA-3259
> It looks like this JIRA is already fixed in 1.0.x, so is it another issue?
>
> Thanks
> Yang
>
> On Tue, Aug 14, 2012 at 11:03 PM, Yang <teddyyyy...@gmail.com> wrote:
>
>> Jim:
>>
>> Thanks a lot for the info.
>>
>> When you say "old nodes sometimes hanging around as "unreachable nodes"
>> when describing cluster", you mean that after the new node boots up and
>> assumes ownership of the same token, you have not manually run nodetool
>> removeToken, right? This kind of makes sense, since it seems that the
>> membership being gossiped around still contains the dead node (which is
>> represented by a different AWS internal IP), though the same token is
>> associated with both the dead and the new node? I'm getting a bit confused
>> here.
>>
>>
>> I think that previously, when I booted up a new node with the same token
>> while the old host was dead, the other nodes on the
>> ring said something like "this token xxxxxx is already owned by
>> old_node_ip_here,...... ". I don't remember the exact behavior now, which
>> is why I'm cautious about using T instead of T-1.
>>
>>
>> I'm doing more tests to confirm this behavior.
>>
>> Thanks
>> Yang
>>
>>
>> On Tue, Aug 14, 2012 at 10:17 PM, Jim Cistaro <jcist...@netflix.com> wrote:
>>
>>>  We use Priam to replace nodes using replace_token.  We do see some
>>> issues (currently on 1.0.9, as well as earlier versions) with replace_token.
>>>
>>>  Apparently there are some known issues with replace_token.  We have
>>> experienced the old nodes sometimes hanging around as "unreachable nodes"
>>> when describing cluster.  Also, we have experienced problems where moving
>>> the new node causes the old "replaced" node to resurrect for the token that
>>> was outgoing during the move.
>>>
>>>  You can notice these old nodes hanging around in logs.  You will see
>>> messages like:
>>> StorageService.java (line 1020) Nodes /<old_ip> and /<new_ip> have the
>>> same token NNNNNNNNNN.  Ignoring /<old_ip>.
>>>
>>>  We have then had to "nt removetoken" to clean things up after moves.
>>>  We are also investigating using the unsafeAssassinateEndpoint method
>>> (via JMX) to clean up some of the unreachables.
>>>
>>>  Like I said, we still use replace_token, but be aware of these
>>> possible inconveniences.
>>>
>>>  Jim Cistaro
>>> Netflix Cassandra Operations
>>>
>>>
>>>   From: Yang <teddyyyy...@gmail.com>
>>> Reply-To: <user@cassandra.apache.org>
>>> Date: Tue, 14 Aug 2012 21:58:30 -0700
>>> To: <user@cassandra.apache.org>
>>> Subject: Re: replace dead node? " token -1 "
>>>
>>>  Thanks Aaron. It has been a while since I last checked the code; I'll
>>> read it to understand it better.
>>>  On Aug 14, 2012 8:48 PM, "aaron morton" <aa...@thelastpickle.com>
>>> wrote:
>>>
>>>>  Using this method, when choosing the new <Token>, should we still use
>>>> the T-1 ?
>>>>
>>>> (AFAIK) No.
>>>> replace_token is used when you want to replace a node that is dead. In
>>>> this case the dead node will be identified by its token.
>>>>
>>>>  if so, would the duplicate token (same token but different ip) cause
>>>> problems?
>>>>
>>>>  If the nodes are bootstrapping an error is raised.
>>>> Otherwise the token ownership is passed to the new node.
>>>>
>>>>  Cheers
>>>>
>>>>    -----------------
>>>> Aaron Morton
>>>> Freelance Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>>  On 15/08/2012, at 11:07 AM, Yang <teddyyyy...@gmail.com> wrote:
>>>>
>>>>  Previously, when a node died, I remember the documentation said that
>>>> it was better to assign T-1 to the new node,
>>>> where T was the token of the dead node.
>>>>
>>>>
>>>>  The new doc for 1.x here
>>>>
>>>>  http://wiki.apache.org/cassandra/Operations#Replacing_a_Dead_Node
>>>>
>>>>
>>>>  shows a new way: passing in cassandra.replace_token=<Token>
>>>> for the new node.
>>>> Using this method, when choosing the new <Token>, should we still use
>>>> T-1?
>>>>
>>>>
>>>>  Also in Priam code:
>>>>
>>>> https://github.com/Netflix/Priam/blob/master/priam/src/main/java/com/netflix/priam/identity/InstanceIdentity.java
>>>>
>>>>  at line 148, Priam does not seem to do the "-1" thing, but
>>>> assigns the original token T to the new node.
>>>> If so, would the duplicate token (same token but different IP) cause
>>>> problems?
>>>>
>>>>
>>>>  Thanks
>>>> Yang
>>>>
>>>>
>>>>
>>
>
