Do you have a cassandra-topology.properties file in place? If so, GPFS will 
instantiate a PropertyFileSnitch from it for compatibility mode. Then, when 
gossip state doesn’t contain any endpoint info about the down node (because you 
bounced the whole cluster), it will fall back to the PFS instead of reading the 
rack & DC from system.peers. DC1:r1 is the default in the 
cassandra-topology.properties shipped in the distro.
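
To make the lookup order concrete, here is a rough Python model of the fallback described above. It is only a sketch of the behaviour, not Cassandra's actual code: the function name, the dict-based inputs and the "UNKNOWN_DC" placeholder are all assumptions made for illustration.

```python
from typing import Dict, Optional


def resolve_dc(endpoint: str,
               gossip_dc: Dict[str, str],
               system_peers_dc: Dict[str, str],
               pfs: Optional[Dict[str, str]] = None) -> str:
    """Return the DC a node would report for `endpoint` (simplified model)."""
    # 1. Live gossip state wins whenever it knows the endpoint.
    if endpoint in gossip_dc:
        return gossip_dc[endpoint]
    # 2. Compatibility mode: a topology file was found, so the PFS answers
    #    and system.peers is never consulted.
    if pfs is not None:
        return pfs.get(endpoint, pfs.get("default", "UNKNOWN_DC"))
    # 3. No topology file: use what was persisted in system.peers.
    return system_peers_dc.get(endpoint, "UNKNOWN_DC")


# The down node is persisted as dc1, but gossip is empty after a full bounce.
peers = {"172.19.0.8": "dc1"}
with_pfs = resolve_dc("172.19.0.8", {}, peers, pfs={"default": "DC1"})
without_pfs = resolve_dc("172.19.0.8", {}, peers)
```

With the stock topology file present, the down node resolves to the file's default DC rather than the value saved in system.peers; without the file, the persisted value is used.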

> On 15 Mar 2019, at 12:04, Jeff Jirsa <jji...@gmail.com> wrote:
> 
> Is this using GPFS?  If so, can you open a JIRA? It feels like potentially 
> GPFS is not persisting the rack/DC info into system.peers and loses the DC on 
> restart. This is somewhat understandable, but definitely deserves a JIRA. 
> 
> On Thu, Mar 14, 2019 at 11:44 PM Stefan Miklosovic 
> <stefan.mikloso...@instaclustr.com 
> <mailto:stefan.mikloso...@instaclustr.com>> wrote:
> Hi Fd,
> 
> I tried this on a 3-node cluster. I killed node2; both node1 and node3 
> reported node2 as DN. Then I killed node1 and node3, restarted them, 
> and node2 was reported like this:
> 
> [root@spark-master-1 /]# nodetool status
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
> DN  172.19.0.8  ?          256          64.0%             bd75a5e2-2890-44c5-8f7a-fca1b4ce94ab  r1
> Datacenter: dc1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load        Tokens       Owns (effective)  Host ID                               Rack
> UN  172.19.0.5  382.75 KiB  256          64.4%             2a062140-2428-4092-b48b-7495d083d7f9  rack1
> UN  172.19.0.9  171.41 KiB  256          71.6%             9590b791-ad53-4b5a-b4c7-b00408ed02dd  rack3
> 
> Prior to killing node1 and node3, node2 was indeed marked as DN, but it was 
> listed under "Datacenter: dc1" together with node1 and node3.
> 
> But after killing both node1 and node3 (so the cluster was totally down) and 
> restarting them, node2 was reported as shown above.
> 
> I do not know what the difference is here. Is gossip data stored somewhere 
> on disk? I would say so, otherwise node1 and node3 could not report that 
> node2 is down. But at the same time I don't get why it is "out of the 
> list" where node1 and node3 are.
> 
> 
> On Fri, 15 Mar 2019 at 02:42, Fd Habash <fmhab...@gmail.com 
> <mailto:fmhab...@gmail.com>> wrote:
> I can conclusively say none of these commands were run. However, I think 
> this is the likely scenario …
> 
> If you have a cluster of three nodes 1, 2, 3 …
> 
> If 3 shows as DN
> Restart C* on 1 & 2
> nodetool status should NOT show node 3's IP at all.
> 
> Restarting the cluster while a node is down resets gossip state.
> 
> There is a good chance this is what happened.
> 
> Plausible?
> 
>  
> 
> ----------------
> Thank you
> 
>  
> 
> From: Jeff Jirsa <mailto:jji...@gmail.com>
> Sent: Thursday, March 14, 2019 11:06 AM
> To: cassandra <mailto:user@cassandra.apache.org>
> Subject: Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist 
> in gossip
> 
>  
> 
> Two things that wouldn't be a bug:
> 
> You could have run removenode
> You could have run assassinate
> 
> Also could be some new bug, but that's much less likely. 
> 
>  
> 
>  
> 
> On Thu, Mar 14, 2019 at 2:50 PM Fd Habash <fmhab...@gmail.com 
> <mailto:fmhab...@gmail.com>> wrote:
> 
> I have a node which I know for certain was a cluster member last week. It 
> showed in nodetool status as DN. When I attempted to replace it today, I got 
> this message:
> 
>  
> ERROR [main] 2019-03-14 14:40:49,208 CassandraDaemon.java:654 - Exception encountered during startup
> java.lang.RuntimeException: Cannot replace_address /10.xx.xx.xxx.xx because it doesn't exist in gossip
>         at org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:449) ~[apache-cassandra-2.2.8.jar:2.2.8]
> 
>  
>  
> DN  10.xx.xx.xx  388.43 KB  256          6.9%              bdbd632a-bf5d-44d4-b220-f17f258c4701  1e
> 
>  
> Under what conditions does this happen?
> 
>  
>  
> ----------------
> Thank you
> 
>  
>  
> 
> 
> Stefan Miklosovic
> 
