Hi Fd,

I tried this on a 3-node cluster. I killed node2, and both node1 and node3
reported node2 as DN. Then I killed node1 and node3 as well, restarted them,
and node2 was reported like this:

[root@spark-master-1 /]# nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
DN  172.19.0.8  ?           256     64.0%             bd75a5e2-2890-44c5-8f7a-fca1b4ce94ab  r1
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.19.0.5  382.75 KiB  256     64.4%             2a062140-2428-4092-b48b-7495d083d7f9  rack1
UN  172.19.0.9  171.41 KiB  256     71.6%             9590b791-ad53-4b5a-b4c7-b00408ed02dd  rack3

Prior to killing node1 and node3, node2 was indeed marked as DN, but it was
listed under "Datacenter: dc1" together with node1 and node3.

But after killing both node1 and node3 (so the cluster was completely down)
and restarting them, node2 was reported as shown above.

I do not know what makes the difference here. Is gossip data stored
somewhere on disk? I would say so, otherwise there is no way node1/node3
could report node2 as down at all, but at the same time I do not understand
why node2 ends up outside the datacenter/rack listing that node1 and node3
are in.


On Fri, 15 Mar 2019 at 02:42, Fd Habash <fmhab...@gmail.com> wrote:

> I can conclusively say, none of these commands were run. However, I think
> this is the likely scenario …
>
> If you have a cluster of three nodes 1,2,3 …
>
>    - If 3 shows as DN
>    - Restart C* on 1 & 2
>    - Nodetool status should NOT show node 3 IP at all.
>
> Restarting the cluster while a node is down resets gossip state.
>
> There is a good chance this is what happened.
>
> Plausible?
>
> ----------------
> Thank you
>
> *From: *Jeff Jirsa <jji...@gmail.com>
> *Sent: *Thursday, March 14, 2019 11:06 AM
> *To: *cassandra <user@cassandra.apache.org>
> *Subject: *Re: Cannot replace_address /10.xx.xx.xx because it doesn't
> exist in gossip
>
> Two things that wouldn't be a bug:
>
> You could have run removenode
>
> You could have run assassinate
>
> Also could be some new bug, but that's much less likely.
>
> On Thu, Mar 14, 2019 at 2:50 PM Fd Habash <fmhab...@gmail.com> wrote:
>
> I have a node which I know for certain was a cluster member last week. It
> showed in nodetool status as DN. When I attempted to replace it today, I
> got this message
>
> ERROR [main] 2019-03-14 14:40:49,208 CassandraDaemon.java:654 - Exception encountered during startup
> java.lang.RuntimeException: Cannot replace_address /10.xx.xx.xxx.xx because it doesn't exist in gossip
>         at org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:449) ~[apache-cassandra-2.2.8.jar:2.2.8]
>
> DN  10.xx.xx.xx  388.43 KB  256          6.9%  bdbd632a-bf5d-44d4-b220-f17f258c4701  1e
>
> Under what conditions does this happen?
>
> ----------------
> Thank you
>

Stefan Miklosovic
