OK, I see: the cassandra.replace_token setting essentially performs the manual removeToken step, so the dead node should be removed.
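One way to check whether a replaced node is still lingering is to scan the Cassandra logs for the "have the same token" message quoted further down in this thread. The following is a minimal sketch, not something from the thread itself: the function name, the tuple layout, and the assumption that the message appears on a single log line are all illustrative.

```python
import re

# Pattern built from the log message quoted in this thread:
#   Nodes /<old_ip> and /<new_ip> have the same token NNNNNNNNNN. Ignoring /<old_ip>.
# Assumption: the whole message sits on one log line.
SAME_TOKEN = re.compile(
    r"Nodes /(?P<first>\S+) and /(?P<second>\S+) have the same token "
    r"(?P<token>-?\d+)\.\s+Ignoring /(?P<ignored>\S+?)\.?\s*$"
)


def find_duplicate_token_events(lines):
    """Return (first_ip, second_ip, token, ignored_ip) for each matching line."""
    events = []
    for line in lines:
        match = SAME_TOKEN.search(line)
        if match:
            events.append(
                (
                    match.group("first"),
                    match.group("second"),
                    match.group("token"),
                    match.group("ignored"),
                )
            )
    return events
```

Feeding it the lines of a node's system log (for example via `open(path)`) would list each ignored endpoint, which are the candidates for a later removetoken cleanup.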
Is this the "old node hanging around" issue that you described?

https://issues.apache.org/jira/browse/CASSANDRA-3259

It looks like this JIRA is already fixed in 1.0.x, so is it another issue?

Thanks
Yang

On Tue, Aug 14, 2012 at 11:03 PM, Yang <teddyyyy...@gmail.com> wrote:

> Jim:
>
> thanks a lot for the info.
>
> when you say "old nodes sometimes hanging around as "unreachable nodes"
> when describing cluster", you mean after the new node boots up and assumes
> ownership of the same token, you have not manually run nodetool
> removeToken, right? this kind of makes sense --- since it seems that the
> membership being gossiped around still contains the dead node (which is
> represented by a different AWS internal ip), though the same token is being
> associated to both dead and new nodes ??? I'm getting a bit confused
> here....
>
>
> I think previously when I boot up a new node with the same token, while
> the old host is dead, the other nodes on the
> ring says something like "this token xxxxxx is already owned by
> old_node_ip_here,...... ". I don't remember exactly the behavior now,
> that's why I'm cautious of using T instead of T-1.
>
>
> I'm doing more tests to confirm this behavior
>
> Thanks
> Yang
>
>
> On Tue, Aug 14, 2012 at 10:17 PM, Jim Cistaro <jcist...@netflix.com> wrote:
>
>> We use priam to replace nodes using replace_token. We do see some
>> issues (currently on 1.0.9, as well as earlier versions) with replace_token.
>>
>> Apparently there are some known issues with replace_token. We have
>> experienced the old nodes sometimes hanging around as "unreachable nodes"
>> when describing cluster. Also, we have experienced problems where moving
>> the new node causes the old "replaced" node to resurrect for the token that
>> was outgoing during the move.
>>
>> You can notice these old nodes hanging around in logs. You will see
>> messages like:
>> StorageService.java (line 1020) Nodes /<old_ip> and /<new_ip> have the
>> same token NNNNNNNNNN. Ignoring /<old_ip>.
>>
>> We have then had to "nt removetoken" to clean things up after moves.
>> We are also investigating using method unsafeAssassinateEndpoint (via jmx)
>> to clean up some of the unreachables.
>>
>> Like I said, we still use replace_token, but be aware of these possible
>> inconveniences.
>>
>> Jim Cistaro
>> Netflix Cassandra Operations
>>
>>
>> From: Yang <teddyyyy...@gmail.com>
>> Reply-To: <user@cassandra.apache.org>
>> Date: Tue, 14 Aug 2012 21:58:30 -0700
>> To: <user@cassandra.apache.org>
>> Subject: Re: replace dead node? " token -1 "
>>
>> thanks Aaron, it has been a while since i last checked the code, I'll
>> read it to understand it more
>>
>> On Aug 14, 2012 8:48 PM, "aaron morton" <aa...@thelastpickle.com> wrote:
>>
>>> Using this method, when choosing the new <Token>, should we still use
>>> the T-1 ?
>>>
>>> (AFAIK) No.
>>> replace_token is used when you want to replace a node that is dead. In
>>> this case the dead node will be identified by its token.
>>>
>>> if so, would the duplicate token (same token but different ip) cause
>>> problems?
>>>
>>> If the nodes are bootstrapping an error is raised.
>>> Otherwise the token ownership is passed to the new node.
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 15/08/2012, at 11:07 AM, Yang <teddyyyy...@gmail.com> wrote:
>>>
>>> previously when a node dies, I remember the documents describes that
>>> it's better to assign T-1 to the new node,
>>> where T was the token of the dead node.
>>>
>>>
>>> the new doc for 1.x here
>>>
>>> http://wiki.apache.org/cassandra/Operations#Replacing_a_Dead_Node
>>>
>>>
>>> shows a new way to pass in cassandra.replace_token=<Token>
>>> for the new node.
>>> Using this method, when choosing the new <Token>, should we still use
>>> the T-1 ?
>>>
>>>
>>> Also in Priam code:
>>>
>>> https://github.com/Netflix/Priam/blob/master/priam/src/main/java/com/netflix/priam/identity/InstanceIdentity.java
>>>
>>> line 148, it does not seem that Priam does the "-1" thing, but assigns
>>> the original token T to the new node.
>>> if so, would the duplicate token (same token but different ip) cause
>>> problems?
>>>
>>>
>>> Thanks
>>> Yang