Reusing the bootstrapping node could have caused this, but it is hard to tell. 
Since you have only 7 nodes, have you tried doing a few rolling restarts of all 
nodes to let gossip settle? Also, the node is pingable from other nodes even 
though it shows as UNREACHABLE below, correct?

Based on nodetool status, it appears the node has streamed all the data it 
needs, but it doesn’t think it has joined the ring yet. Does cqlsh work on that 
node?
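
For the reachability questions above, a TCP probe run from one of the other nodes is a rough complement to ping, since ICMP can succeed while the Cassandra ports are blocked or not listening. This is only a sketch: the ports are the stock defaults (7000 for inter-node traffic, 9042 for native CQL clients, 9160 for Thrift RPC) and are assumptions about this cluster, so check cassandra.yaml for the real values. The helper names port_open and check_node are made up for illustration.

```python
import socket

# Assumed stock defaults; the real values live in cassandra.yaml.
PORTS = {"storage (inter-node)": 7000, "native CQL": 9042, "Thrift RPC": 9160}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_node(host, ports=PORTS):
    """Probe each named port on host and return {name: reachable}."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling check_node("10.0.7.4") from cassandra-13 or cassandra-09 would show whether the replacement node accepts connections on each port. An open storage port only shows that the process is listening; it does not show that gossip considers the node up.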

From: Mir Tanvir Hossain [mailto:mir.tanvir.hoss...@gmail.com]
Sent: Thursday, April 21, 2016 11:51 AM
To: user@cassandra.apache.org
Subject: Re: Problem Replacing a Dead Node

Here is a bit more detail of the whole situation. I am hoping someone can help 
me out here.

We have a seven-node cluster. One of the nodes started to have issues, but it 
was still running. We decided to add a new node, and remove the problematic 
node after the new node joined. However, the new node did not join the cluster 
even after three days. Hence, we decided to go with the replacement option. We 
shut down the problematic node. After that, we stopped Cassandra on the 
bootstrapping node, deleted all the data, and restarted that node as the 
replacement node for the problematic node.

Since we reused the bootstrapping node as the replacement node, I am wondering 
whether that is causing any issue. Any insights are appreciated.

This is the output of nodetool describecluster from the replacement node, and 
two other nodes.

mhossain@cassandra-24:~$ nodetool describecluster
Cluster Information:
            Name: App
            Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
            Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
            Schema versions:
                        80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.4, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]


mhossain@cassandra-13:~$ nodetool describecluster
Cluster Information:
            Name: App
            Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
            Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
            Schema versions:
                        80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]

                        UNREACHABLE: [10.0.7.91, 10.0.7.4]


mhossain@cassandra-09:~$ nodetool describecluster
Cluster Information:
            Name: App
            Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
            Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
            Schema versions:
                        80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]

                        UNREACHABLE: [10.0.7.91, 10.0.7.4]


cassandra-24 (10.0.7.4) is the replacement node. 10.0.7.91 is the IP address of 
the dead node.
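
To compare the views side by side, the describecluster output can be parsed with a few lines of Python. This is a rough sketch rather than a supported tool: the function name parse_describecluster is made up for illustration, and the regular expression assumes the output format shown above (a schema UUID or UNREACHABLE followed by a bracketed IP list, possibly wrapped across lines).

```python
import re

# Matches "<schema-uuid>: [ip, ip, ...]" or "UNREACHABLE: [ip, ip, ...]".
# [^\]]* also spans newlines, so lists wrapped by a mail client still match.
ENTRY = re.compile(
    r"([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}|UNREACHABLE):\s*\[([^\]]*)\]"
)


def parse_describecluster(text):
    """Return (schema_versions, unreachable) parsed from describecluster text.

    schema_versions maps each schema UUID to the IPs reporting it;
    unreachable is the list of IPs listed under UNREACHABLE.
    """
    schema_versions, unreachable = {}, []
    for label, body in ENTRY.findall(text):
        ips = [ip.strip() for ip in body.split(",") if ip.strip()]
        if label == "UNREACHABLE":
            unreachable = ips
        else:
            schema_versions[label] = ips
    return schema_versions, unreachable


# Output as seen from cassandra-13, copied from above.
VIEW_FROM_13 = """
Schema versions:
    80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80,
10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]

    UNREACHABLE: [10.0.7.91, 10.0.7.4]
"""

versions, unreachable = parse_describecluster(VIEW_FROM_13)
print(unreachable)  # ['10.0.7.91', '10.0.7.4']
```

Running the same function on the output from cassandra-24 gives an empty unreachable list and includes 10.0.7.4 under the schema version, which makes the disagreement between the replacement node and the rest of the cluster easy to see.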

-Mir

On Thu, Apr 21, 2016 at 10:02 AM, Mir Tanvir Hossain 
<mir.tanvir.hoss...@gmail.com> wrote:
Hi, I am trying to replace a dead node by following 
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html.
It's been 3 full days since the replacement node started, and the node is 
still not showing up as part of the cluster on OpsCenter. I was wondering 
whether the delay is due to the fact that I have a test keyspace with a 
replication factor of one. If I delete that keyspace, would the new node 
successfully replace the dead node? Any general insight will be hugely 
appreciated.

Thanks,
Mir


