Reusing the bootstrapping node could have caused this, but it is hard to tell. Since you have only 7 nodes, have you tried doing a few rolling restarts of all nodes to let gossip settle? Also, the node is pingable from the other nodes even though it is listed as UNREACHABLE below, correct?
Based on nodetool status, it appears the node has streamed all the data it needs, but it doesn't think it has joined the ring yet. Does cqlsh work on that node?

From: Mir Tanvir Hossain [mailto:mir.tanvir.hoss...@gmail.com]
Sent: Thursday, April 21, 2016 11:51 AM
To: user@cassandra.apache.org
Subject: Re: Problem Replacing a Dead Node

Here is a bit more detail on the whole situation. I am hoping someone can help me out here.

We have a seven-node cluster. One of the nodes started to have issues, but it was still running. We decided to add a new node and remove the problematic node after the new node joined. However, the new node did not join the cluster even after three days, so we decided to go with the replacement option. We shut down the problematic node. After that, we stopped Cassandra on the bootstrapping node, deleted all of its data, and restarted it as the replacement for the problematic node.

Since we reused the bootstrapping node as the replacement node, I am wondering whether that is causing any issue. Any insights are appreciated.

This is the output of nodetool describecluster from the replacement node and two other nodes.

mhossain@cassandra-24:~$ nodetool describecluster
Cluster Information:
    Name: App
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.4, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]

mhossain@cassandra-13:~$ nodetool describecluster
Cluster Information:
    Name: App
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]
        UNREACHABLE: [10.0.7.91, 10.0.7.4]

mhossain@cassandra-09:~$ nodetool describecluster
Cluster Information:
    Name: App
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]
        UNREACHABLE: [10.0.7.91, 10.0.7.4]

cassandra-24 (10.0.7.4) is the replacement node. 10.0.7.91 is the IP address of the dead node.

-Mir

On Thu, Apr 21, 2016 at 10:02 AM, Mir Tanvir Hossain <mir.tanvir.hoss...@gmail.com> wrote:

Hi,

I am trying to replace a dead node by following https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html. It's been 3 full days since the replacement node started, and the node is still not showing up as part of the cluster in OpsCenter. I was wondering whether the delay is due to the fact that I have a test keyspace with a replication factor of one?
If I delete that keyspace, would the new node successfully replace the dead node? Any general insight will be hugely appreciated.

Thanks,
Mir
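
For anyone comparing describecluster output across several nodes, here is a minimal Python sketch that pulls the schema-version and UNREACHABLE lists out of the text quoted in this thread. The function name parse_describecluster and the SAMPLE string are hypothetical, introduced only for illustration. The parser assumes the "label: [ip, ip, ...]" layout shown above and is not part of nodetool or Cassandra.

```python
import re

# Sample text copied from the cassandra-13 output quoted in this thread.
SAMPLE = """Cluster Information:
    Name: App
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]
        UNREACHABLE: [10.0.7.91, 10.0.7.4]
"""

# A label is either a 36-character schema UUID or the literal UNREACHABLE,
# followed by a bracketed, comma-separated list of addresses.
ENTRY = re.compile(r"([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[([^\]]*)\]")


def parse_describecluster(text):
    """Return (schema_versions, unreachable) parsed from describecluster text.

    schema_versions maps each schema UUID to the list of hosts reporting it;
    unreachable is the list of hosts under UNREACHABLE (empty if absent).
    """
    schema_versions = {}
    unreachable = []
    for label, hosts in ENTRY.findall(text):
        ips = [h.strip() for h in hosts.split(",") if h.strip()]
        if label == "UNREACHABLE":
            unreachable.extend(ips)
        else:
            schema_versions.setdefault(label, []).extend(ips)
    return schema_versions, unreachable


schema, unreachable = parse_describecluster(SAMPLE)
print("schema versions:", len(schema))
print("unreachable:", unreachable)
```

Running the same function on each node's output makes the disagreement easy to see: the replacement node (10.0.7.4) lists itself under the shared schema version, while cassandra-13 and cassandra-09 both list it as unreachable alongside the dead node 10.0.7.91.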