Thanks for the help, this seems to have worked, except that while adding the
new node we assigned the same token to a different IP (operational script
goof-up) and brought that node up, so the other nodes logged a message
that a new IP had taken over the token.
- So we brought it down, fixed it, and it all came up fine.
- Ran removetoken; it did not finish.
- So I ran removetoken force, which seemed to work.
- Cleaned up the nodes.
- Everything from the ring perspective appeared OK on all nodes,
- except for this error message (which, based on some threads, seemed
like it would go away), reported in this thread =>
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/0-7-4-Replication-assertion-error-after-removetoken-removetoken-force-and-a-restart-td6311082.html
- So I restarted this one node that was complaining (this was not the
node that was replaced)
- But once this node was restarted, the ring command on it showed the old
single token IP (the one we removed).
- So I am running removetoken again; it has been running for about 2-3
hours now.
The ring shows:
Address         Status State   Load            Owns    Token
                                                       113427455640312821154458202477256070485
10.xxx.0.184    Up     Normal  829.73 GB       33.33%  0
10.xxx.0.185    Up     Normal  576.09 GB       33.33%  56713727820156410577229101238628035241
10.xxx.0.189    Down   Leaving 139.73 KB       0.00%   56713727820156410577229101238628035242
10.xxx.0.188    Up     Normal  697.41 GB       33.33%  113427455640312821154458202477256070485
What are my choices here? How do I clean up the ring? The other 2 nodes show
the ring fine (they are not even aware of 189).
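
For reference, here is a rough Python sketch of how the tokens and ownership
percentages in the ring output relate. It assumes RandomPartitioner with a
token space of 0 to 2**127, and it reads each token as belonging to the
address printed just before it. The helper names (balanced_tokens, ownership)
are made up for illustration and are not part of any Cassandra tool.

```python
# Assumption: RandomPartitioner, whose tokens fall in the range 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Evenly spaced tokens for node_count nodes (hypothetical helper)."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def ownership(tokens):
    """Fraction of the ring owned by each token.

    Each token owns the range (previous token, token], wrapping around
    from the highest token back to the lowest.
    """
    ordered = sorted(tokens)
    return {
        token: ((token - ordered[i - 1]) % RING_SIZE) / RING_SIZE
        for i, token in enumerate(ordered)
    }


# Address/token pairs as read from the ring output in this message.
ring = {
    "10.xxx.0.184": 0,
    "10.xxx.0.185": 56713727820156410577229101238628035241,
    "10.xxx.0.189": 56713727820156410577229101238628035242,  # Down, Leaving
    "10.xxx.0.188": 113427455640312821154458202477256070485,
}

owned = ownership(ring.values())
for address, token in ring.items():
    print(address, token, "%.2f%%" % (owned[token] * 100))

# Evenly spaced tokens for a 3-node ring, for comparison with the list above.
print(balanced_tokens(3))
```

Under those assumptions, the entry for 189 sits exactly one token above 185,
which is why it shows as owning 0.00% while the three live nodes each show
33.33%.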
Thanks
Anand
On Fri, Aug 19, 2011 at 11:53 AM, Anand Somani <[email protected]> wrote:
> ok I will go with the IP change strategy and keep you posted. Not going to
> manually copy any data, just bring up the node and let it bootstrap.
>
> Thanks
>
>
> On Fri, Aug 19, 2011 at 11:46 AM, Peter Schuller <
> [email protected]> wrote:
>
>> > (Yes, this should definitely be easier. Maybe the most generally
>> > useful fix would be for Cassandra to support a node joining the ring
>> > in "write-only" mode. This would be useful in other cases, such as
>> > when you're trying to temporarily off-load a node by disabling
>> > gossip).
>>
>> I knew I had read discussions before:
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-2568
>>
>> --
>> / Peter Schuller (@scode on twitter)
>>
>
>