I have not viewed the code, but it would seem that replace_token does not "remove token", because that would spread the data and then "unspread" it when the new node joins. But like I said, I have not read the code.
From our standpoint, we want the tokens to stay the same when possible due to the way our backups are tagged.

As for "old nodes staying around": you are correct, we never remove the token (because we replace the node for that same token), and gossip keeps knowledge of that old node.

Sorry if this explanation is not that clear. This issue is a little unclear, and we are dealing with it from an ops POV rather than from a dev understanding of the code.

As for the attractiveness of the T-1 approach: if you don't need token consistency, then it might be more attractive for you. We don't use it, so I cannot say whether that approach has any issues.

Jim

From: Yang <teddyyyy...@gmail.com>
Reply-To: <user@cassandra.apache.org>
Date: Wed, 15 Aug 2012 02:00:55 -0700
To: <user@cassandra.apache.org>
Subject: Re: replace dead node? " token -1 "

Considering there is this minor "old node hanging around" issue, would the old T-1 approach sound more attractive? That way you don't necessarily have to remove the dead token immediately, but could come back the next day, or even a week later. T-1 would behave essentially the same in terms of partitioning the data range.

Thanks
Yang

On Wed, Aug 15, 2012 at 1:39 AM, Yang <teddyyyy...@gmail.com> wrote:

OK, I see: the cassandra.replace_token setting essentially executes the manual removeToken step, so the dead node should be removed. Is this the "old node hanging around" issue that you described?
https://issues.apache.org/jira/browse/CASSANDRA-3259
It looks like this JIRA is already fixed in 1.0.x, so is it another issue?

Thanks
Yang

On Tue, Aug 14, 2012 at 11:03 PM, Yang <teddyyyy...@gmail.com> wrote:

Jim: thanks a lot for the info.
When you say "old nodes sometimes hanging around as 'unreachable nodes' when describing cluster", you mean that after the new node boots up and assumes ownership of the same token, you have not manually run nodetool removeToken, right? This kind of makes sense, since it seems that the membership being gossiped around still contains the dead node (which is represented by a different AWS internal IP), though the same token is associated with both the dead and the new node?

I'm getting a bit confused here. I think previously, when I booted up a new node with the same token while the old host was dead, the other nodes on the ring said something like "this token xxxxxx is already owned by old_node_ip_here, ...". I don't remember the exact behavior now, which is why I'm cautious about using T instead of T-1. I'm doing more tests to confirm this behavior.

Thanks
Yang

On Tue, Aug 14, 2012 at 10:17 PM, Jim Cistaro <jcist...@netflix.com> wrote:

We use Priam to replace nodes using replace_token. We do see some issues with replace_token (currently on 1.0.9, as well as on earlier versions). Apparently there are some known issues with replace_token.

We have experienced the old nodes sometimes hanging around as "unreachable nodes" when describing cluster. We have also experienced problems where moving the new node causes the old "replaced" node to resurrect for the token that was outgoing during the move.

You can notice these old nodes hanging around in the logs. You will see messages like:

StorageService.java (line 1020) Nodes /<old_ip> and /<new_ip> have the same token NNNNNNNNNN. Ignoring /<old_ip>.

We have then had to "nt removetoken" to clean things up after moves. We are also investigating using the unsafeAssassinateEndpoint method (via JMX) to clean up some of the unreachables.

Like I said, we still use replace_token, but be aware of these possible inconveniences.
Jim Cistaro
Netflix Cassandra Operations

From: Yang <teddyyyy...@gmail.com>
Reply-To: <user@cassandra.apache.org>
Date: Tue, 14 Aug 2012 21:58:30 -0700
To: <user@cassandra.apache.org>
Subject: Re: replace dead node? " token -1 "

Thanks Aaron. It has been a while since I last checked the code; I'll read it to understand it more.

On Aug 14, 2012 8:48 PM, "aaron morton" <aa...@thelastpickle.com> wrote:

> Using this method, when choosing the new <Token>, should we still use the T-1 ?

(AFAIK) No. replace_token is used when you want to replace a node that is dead. In this case the dead node will be identified by its token.

> if so, would the duplicate token (same token but different ip) cause problems?

If the nodes are bootstrapping, an error is raised. Otherwise the token ownership is passed to the new node.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/08/2012, at 11:07 AM, Yang <teddyyyy...@gmail.com> wrote:

Previously, when a node died, I remember the documentation saying that it's better to assign T-1 to the new node, where T was the token of the dead node.

The new doc for 1.x here
http://wiki.apache.org/cassandra/Operations#Replacing_a_Dead_Node
shows a new way: pass cassandra.replace_token=<Token> for the new node. Using this method, when choosing the new <Token>, should we still use T-1?

Also, in the Priam code:
https://github.com/Netflix/Priam/blob/master/priam/src/main/java/com/netflix/priam/identity/InstanceIdentity.java
line 148, it does not seem that Priam does the "-1" thing; it assigns the original token T to the new node. If so, would the duplicate token (same token but different IP) cause problems?

Thanks
Yang
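
For readers comparing the two choices discussed in this thread, here is a minimal Python sketch, not taken from Cassandra or Priam. It assumes RandomPartitioner with integer tokens in the range [0, 2**127); that range and the helper names token_minus_one and replace_token_option are assumptions made for illustration. The only name taken from the thread is the cassandra.replace_token property, shown here as a JVM -D system property.

```python
# Assumption: RandomPartitioner, with integer tokens in [0, 2**127).
RING_SIZE = 2 ** 127


def token_minus_one(dead_token: int) -> int:
    """Old approach: give the new node T-1, wrapping around the ring at zero."""
    if not 0 <= dead_token < RING_SIZE:
        raise ValueError("token is outside the assumed RandomPartitioner range")
    return (dead_token - 1) % RING_SIZE


def replace_token_option(dead_token: int) -> str:
    """replace_token approach: reuse the dead node's exact token T.

    Returns the system property in JVM -D form (hypothetical helper).
    """
    if not 0 <= dead_token < RING_SIZE:
        raise ValueError("token is outside the assumed RandomPartitioner range")
    return f"-Dcassandra.replace_token={dead_token}"


if __name__ == "__main__":
    dead = 85070591730234615865843651857942052864  # example token, 2**126
    print("T-1 token:     ", token_minus_one(dead))
    print("replace option:", replace_token_option(dead))
```

With T-1 the dead token T still has to be removed afterwards, as Yang notes; with replace_token the new node takes over T itself, which is what Jim's team relies on for backup tagging.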