Some further info: I'm not using vnodes, so I'm using the replace-node trick from the 1.1 docs: setting initial_token in cassandra.yaml to the dead node's token minus 1, with auto_bootstrap set to true. However, according to the Apache wiki (https://wiki.apache.org/cassandra/Operations#For_versions_1.2.0_and_above), on 1.2 you should actually remove the dead node from the ring before adding a replacement node.
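To make the trick concrete, here is a rough sketch of how the replacement node's two settings would be derived. The helper names and the example token are made up for illustration (they are not from my cluster), and the wrap-around assumes RandomPartitioner's token range of 0 to 2**127 - 1; a Murmur3Partitioner ring would need different bounds.

```python
# Hypothetical helpers: derive the cassandra.yaml settings for a replacement
# node from the dead node's token (the "token minus 1" trick).

# Assumption: RandomPartitioner, whose tokens span 0 .. 2**127 - 1.
RANDOM_PARTITIONER_MAX = 2**127 - 1


def replacement_settings(dead_token: int) -> dict:
    """Return the two cassandra.yaml settings for the replacement node."""
    if not 0 <= dead_token <= RANDOM_PARTITIONER_MAX:
        raise ValueError("token outside the assumed RandomPartitioner range")
    # Token minus one; wrap around the ring if the dead node owned token 0.
    new_token = dead_token - 1 if dead_token > 0 else RANDOM_PARTITIONER_MAX
    return {"initial_token": new_token, "auto_bootstrap": True}


def to_yaml_lines(settings: dict) -> str:
    """Render the settings as the lines to put in cassandra.yaml."""
    return "\n".join(
        f"{key}: {str(value).lower() if isinstance(value, bool) else value}"
        for key, value in settings.items()
    )


if __name__ == "__main__":
    # Example token is invented for illustration only.
    print(to_yaml_lines(replacement_settings(85070591730234615865843651857942052864)))
```

This only covers the configuration side of the trick; it says nothing about whether 1.2 still accepts it, which is the question below.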
Does that mean the trick of setting the initial token to the dead node's token minus 1 (described in http://www.datastax.com/docs/1.1/cluster_management#replacing-a-dead-node) is no longer valid in 1.2 without vnodes?

On Wed, Mar 12, 2014 at 5:57 PM, Paulo Ricardo Motta Gomes <paulo.mo...@chaordicsystems.com> wrote:

> Hello,
>
> I'm trying to replace a dead node using the procedure in [1], but the
> replacement node initially sees the dead node as UP, and after a few
> minutes the node is marked as DOWN again, failing the streaming/bootstrap
> procedure of the replacement node. This dead node is always seen as DOWN by
> the rest of the cluster.
>
> Could this be a bug? I can easily reproduce it in our production
> environment, but don't know if it's reproducible in a clean environment.
>
> Version: 1.2.13
>
> Here is the log from the replacement node (192.168.1.10 is the dead node):
>
> INFO [GossipStage:1] 2014-03-12 20:25:41,089 Gossiper.java (line 843) Node /192.168.1.10 is now part of the cluster
> INFO [GossipStage:1] 2014-03-12 20:25:41,090 Gossiper.java (line 809) InetAddress /192.168.1.10 is now UP
> INFO [GossipTasks:1] 2014-03-12 20:34:54,238 Gossiper.java (line 823) InetAddress /192.168.1.10 is now DOWN
> ERROR [GossipTasks:1] 2014-03-12 20:34:54,240 AbstractStreamSession.java (line 110) Stream failed because /192.168.1.10 died or was restarted/removed (streams may still be active in background, but further streams won't be started)
> WARN [GossipTasks:1] 2014-03-12 20:34:54,240 RangeStreamer.java (line 246) Streaming from /192.168.1.10 failed
> ERROR [GossipTasks:1] 2014-03-12 20:34:54,240 AbstractStreamSession.java (line 110) Stream failed because /192.168.1.10 died or was restarted/removed (streams may still be active in background, but further streams won't be started)
> WARN [GossipTasks:1] 2014-03-12 20:34:54,241 RangeStreamer.java (line 246) Streaming from /192.168.1.10 failed
>
> [1] http://www.datastax.com/docs/1.1/cluster_management#replacing-a-dead-node
>
> Cheers,
>
> Paulo

--
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br <http://www.chaordic.com.br/>*
+55 48 3232.3200
+55 83 9690-1314