RE: Assassinate fails

Nick Hatfield Thu, 04 Apr 2019 08:09:49 -0700

This will sound a little silly, but have you tried a rolling restart of the cluster?

$> nodetool flush; nodetool drain; service cassandra stop
$> ps aux | grep 'cassandra'


# Make sure the process actually dies. Before resorting to kill -9 <pid>,
# check whether nodetool can still connect (for example, nodetool gossipinfo).
# If the connection is live and the port is still listening, try running
# service cassandra stop again. Use kill -9 only as a last resort.

$> service cassandra start
$> nodetool netstats | grep 'NORMAL'  # wait for this to return before moving on to the next node

Restart them all using this method, then run nodetool status again and see if 
the dead node is still listed.
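
The wait step can be scripted rather than checked by hand. This is only a minimal sketch, assuming nodetool netstats prints a line of the form "Mode: NORMAL" once the node is up; the function name wait_for_normal and its retry and delay arguments are illustrative, not part of Cassandra.

```shell
# Poll `nodetool netstats` until the local node reports Mode: NORMAL.
# wait_for_normal, its arguments and its defaults are illustrative names,
# not part of Cassandra or nodetool.
wait_for_normal() {
    max_tries="${1:-60}"   # how many times to poll
    delay="${2:-5}"        # seconds between polls
    i=0
    while [ "$i" -lt "$max_tries" ]; do
        # netstats can fail while the node is still starting; treat that as "not yet"
        if nodetool netstats 2>/dev/null | grep -q '^Mode: NORMAL'; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "node did not reach NORMAL after $max_tries checks" >&2
    return 1
}

# Usage on each node in turn:
#   service cassandra start && wait_for_normal 60 5
```

Only move on to the next node when the function returns 0.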

One other thing: I recall you said something about having to terminate a node 
and then replace it. Make sure that whichever node you started with the replace 
flag (-Dcassandra.replace_address) does not still have it set when you start 
Cassandra on it again!
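
A quick way to check for a leftover flag is to grep the startup configuration. This is a sketch only, assuming the flag was added as -Dcassandra.replace_address=... (or the replace_address_first_boot variant) in a file such as cassandra-env.sh. The helper name find_replace_flag and the example path are illustrative, and your install may keep JVM options elsewhere.

```shell
# Print any non-comment lines that still set a replace_address option.
# Returns 0 if a leftover flag was found, non-zero otherwise.
# find_replace_flag is an illustrative helper name.
find_replace_flag() {
    # Require at least one file so grep never falls back to reading stdin
    [ "$#" -gt 0 ] || return 2
    # -H prints the file name, -n the line number; the second grep drops
    # lines whose content starts with a comment marker
    grep -Hn 'cassandra\.replace_address' "$@" 2>/dev/null \
        | grep -v '^[^:]*:[0-9]*:[[:space:]]*#'
}

# Usage (the path is an example; adjust to your install):
#   find_replace_flag /etc/cassandra/cassandra-env.sh && echo "remove the flag before restarting"
```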

From: Alex [mailto:m...@aca-o.com]
Sent: Thursday, April 04, 2019 4:58 AM
To: user@cassandra.apache.org
Subject: Re: Assassinate fails


Hi Anthony,

Thanks for your help.

I tried running it multiple times in quick succession, but it fails with:

-- StackTrace --
java.lang.RuntimeException: Endpoint still alive: /192.168.1.18 generation changed while trying to assassinate it
        at org.apache.cassandra.gms.Gossiper.assassinateEndpoint(Gossiper.java:592)

I can see that the generation number for this node increases by 1 every time I 
call nodetool assassinate, and the command itself waits for 30 seconds before 
assassinating the node. When run multiple times in quick succession, the command 
fails because the generation number has been changed by the previous invocation.
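
To watch the generation number between attempts, it can be pulled out of the gossipinfo output. This is a minimal sketch, assuming gossipinfo prints each endpoint on an unindented line ending in /<ip>, followed by indented lines such as generation:<n>. The helper name generation_of is made up for illustration.

```shell
# Print the gossip generation recorded for one endpoint.
# Reads `nodetool gossipinfo` output on stdin; generation_of is an
# illustrative helper name.
generation_of() {
    ip="$1"
    awk -v ip="$ip" '
        /^[^ \t]/ {                       # unindented line: start of an endpoint section
            n = split($0, parts, "/")
            current = parts[n]            # text after the last "/", i.e. the address
        }
        current == ip && /^[ \t]+generation:/ {
            sub(/^[ \t]+generation:/, "")
            print
            exit
        }
    '
}

# Usage:
#   nodetool gossipinfo | generation_of 192.168.1.18
```

Running this before and after an assassinate attempt shows whether the generation moved during the 30-second wait.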



In 'nodetool gossipinfo', the node is marked as "LEFT" on every node.

However, in 'nodetool describecluster', this node is marked as "unreachable" 
on 3 nodes out of 5.
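
To compare what each node reports, the unreachable list can be extracted from describecluster. This is a sketch, assuming the output contains a line of the form "UNREACHABLE: [ip, ip]" under the schema versions. The helper name unreachable_endpoints is illustrative.

```shell
# Print the endpoints listed as UNREACHABLE, one per line.
# Reads `nodetool describecluster` output on stdin; prints nothing if
# there is no UNREACHABLE line. unreachable_endpoints is an illustrative name.
unreachable_endpoints() {
    # Keep only the UNREACHABLE line, drop the label and brackets,
    # then split the comma-separated list and trim whitespace.
    sed -n 's/^[[:space:]]*UNREACHABLE:[[:space:]]*\[\(.*\)\][[:space:]]*$/\1/p' \
        | tr ',' '\n' \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

# Usage, run on each node:
#   nodetool describecluster | unreachable_endpoints
```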



Alex



On 04.04.2019 00:56, Anthony Grasso wrote:
Hi Alex,

We wrote a blog post on this topic late last year: 
http://thelastpickle.com/blog/2018/09/18/assassinate.html.

In short, you will need to run the assassinate command on each node 
simultaneously a number of times in quick succession. This will generate a 
number of messages requesting all nodes completely forget there used to be an 
entry within the gossip state for the given IP address.
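
Running the command on every node at about the same time could be sketched as below. This is only an illustration, assuming passwordless ssh to each node; the function name assassinate_everywhere, the host list and the number of rounds are all placeholders. The sketch waits for each round to finish before starting the next, which is a choice made here, not something the procedure above prescribes.

```shell
# Run `nodetool assassinate <dead_ip>` on every listed node at roughly the
# same time, and repeat the round a few times.
# assassinate_everywhere and its arguments are illustrative names.
assassinate_everywhere() {
    dead_ip="$1"
    rounds="$2"
    shift 2                      # remaining arguments are the live nodes
    r=0
    while [ "$r" -lt "$rounds" ]; do
        for host in "$@"; do
            # -n keeps ssh from reading stdin; & starts all nodes together
            ssh -n "$host" nodetool assassinate "$dead_ip" &
        done
        wait                     # let the whole round finish before the next one
        r=$((r + 1))
    done
}

# Usage:
#   assassinate_everywhere 192.168.1.18 3 192.168.1.9 192.168.1.12 192.168.1.14
```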

Regards,
Anthony

On Thu, 4 Apr 2019 at 03:32, Alex <m...@aca-o.com> wrote:
Same result it seems:
Welcome to JMX terminal. Type "help" for available commands.
$>open localhost:7199
#Connection to localhost:7199 is opened
$>bean org.apache.cassandra.net:type=Gossiper
#bean is set to org.apache.cassandra.net:type=Gossiper
$>run unsafeAssassinateEndpoint 192.168.1.18
#calling operation unsafeAssassinateEndpoint of mbean org.apache.cassandra.net:type=Gossiper
#RuntimeMBeanException: java.lang.NullPointerException


There is not much more to see in the log files:
WARN  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:13,626 Gossiper.java:575 - Assassinating /192.168.1.18 via gossip
INFO  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:13,627 Gossiper.java:585 - Sleeping for 30000ms to ensure /192.168.1.18 does not change
INFO  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:43,628 Gossiper.java:1029 - InetAddress /192.168.1.18 is now DOWN
INFO  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:43,631 StorageService.java:2324 - Removing tokens [..] for /192.168.1.18




On 03.04.2019 17:10, Nick Hatfield wrote:
> Run assassinate the old way. It works very well...
>
> wget -q -O jmxterm.jar \
>   http://downloads.sourceforge.net/cyclops-group/jmxterm-1.0-alpha-4-uber.jar
>
> java -jar ./jmxterm.jar
>
> $>open localhost:7199
>
> $>bean org.apache.cassandra.net:type=Gossiper
>
> $>run unsafeAssassinateEndpoint 192.168.1.18
>
> $>quit
>
>
> Happy deleting
>
> -----Original Message-----
> From: Alex [mailto:m...@aca-o.com]
> Sent: Wednesday, April 03, 2019 10:42 AM
> To: user@cassandra.apache.org
> Subject: Assassinate fails
>
> Hello,
>
> Short story:
> - I had to replace a dead node in my cluster
> - 1 week after, dead node is still seen as DN by 3 out of 5 nodes
> - dead node has null host_id
> - assassinate on dead node fails with error
>
> How can I get rid of this dead node ?
>
>
> Long story:
> I had a 3-node cluster (Cassandra 3.9); one node went dead. I built
> a new node from scratch and "replaced" the dead node using the
> information from this page
> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsReplaceNode.html.
> It looked like the replacement went ok.
>
> I added two more nodes to strengthen the cluster.
>
> A few days have passed and the dead node is still visible and marked
> as "down" on 3 of 5 nodes in nodetool status:
>
> --  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  192.168.1.9   16 GiB     256     35.0%             76223d4c-9d9f-417f-be27-cebb791cddcc  rack1
> UN  192.168.1.12  16.09 GiB  256     34.0%             719601e2-54a6-440e-a379-c9cf2dc20564  rack1
> UN  192.168.1.14  14.16 GiB  256     32.6%             d8017a03-7e4e-47b7-89b9-cd9ec472d74f  rack1
> UN  192.168.1.17  15.4 GiB   256     34.1%             fa238b21-1db1-47dc-bfb7-beedc6c9967a  rack1
> DN  192.168.1.18  24.3 GiB   256     33.7%             null                                  rack1
> UN  192.168.1.22  19.06 GiB  256     30.7%             09d24557-4e98-44c3-8c9d-53c4c31066e1  rack1
>
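
Nodes with a null host ID can be picked out of the status listing mechanically. This is a small sketch, assuming nodetool status prints one node per line with the Host ID as the second-to-last column. The helper name null_host_id_nodes is illustrative.

```shell
# Print the address of every node whose Host ID column is "null".
# Reads `nodetool status` output on stdin; null_host_id_nodes is an
# illustrative helper name.
null_host_id_nodes() {
    # Node rows start with a two-letter state such as UN or DN; the Host ID
    # is the second-to-last column and the address is the second column.
    awk '$1 ~ /^[UD][NLJM]$/ && $(NF - 1) == "null" { print $2 }'
}

# Usage:
#   nodetool status | null_host_id_nodes
```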
> Its host ID is null, so I cannot use nodetool removenode. Moreover
> nodetool assassinate 192.168.1.18 fails with :
>
> error: null
> -- StackTrace --
> java.lang.NullPointerException
>
> And in system.log:
>
> INFO  [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:39:38,595 Gossiper.java:585 - Sleeping for 30000ms to ensure /192.168.1.18 does not change
> INFO  [CompactionExecutor:547] 2019-03-27 17:39:38,669 AutoSavingCache.java:393 - Saved KeyCache (27316 items) in 163 ms
> INFO  [IndexSummaryManager:1] 2019-03-27 17:40:03,620 IndexSummaryRedistribution.java:75 - Redistributing index summaries
> INFO  [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:40:08,597 Gossiper.java:1029 - InetAddress /192.168.1.18 is now DOWN
> INFO  [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:40:08,599 StorageService.java:2324 - Removing tokens [-1061369577393671924,...]
> ERROR [GossipStage:1] 2019-03-27 17:40:08,600 CassandraDaemon.java:226 - Exception in thread Thread[GossipStage:1,5,main]
> java.lang.NullPointerException: null
>
>
> In system.peers, the dead node still appears and has the same host ID as
> the replacing node:
>
> cqlsh> select peer, host_id from system.peers;
>
>   peer         | host_id
> --------------+--------------------------------------
>   192.168.1.18 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
>   192.168.1.22 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
>    192.168.1.9 | 76223d4c-9d9f-417f-be27-cebb791cddcc
>   192.168.1.14 | d8017a03-7e4e-47b7-89b9-cd9ec472d74f
>   192.168.1.12 | 719601e2-54a6-440e-a379-c9cf2dc20564
>
> Dead node and replacing node have different tokens in system.peers.
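
Spotting two peers that share one host ID can be automated. This is a sketch, assuming cqlsh prints the query result as "peer | host_id" rows like the listing above. The helper name duplicate_host_ids is illustrative.

```shell
# Print each host_id that appears for more than one peer, followed by the
# peers that share it. Reads cqlsh table output on stdin.
# duplicate_host_ids is an illustrative helper name.
duplicate_host_ids() {
    awk -F'|' '
        NF == 2 && $1 ~ /[0-9]/ {           # data rows only: "peer | host_id"
            gsub(/[ \t]/, "", $1)
            gsub(/[ \t]/, "", $2)
            count[$2]++
            peers[$2] = (peers[$2] == "" ? $1 : peers[$2] "," $1)
        }
        END {
            for (id in count)
                if (count[id] > 1)
                    print id, peers[id]
        }
    '
}

# Usage:
#   cqlsh -e "select peer, host_id from system.peers;" | duplicate_host_ids
```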
>
> I should add that I also tried to decommission a node that still has
> 192.168.1.18 in its peers; it is still marked as "leaving" 5 days
> later. Nothing shows in nodetool netstats or nodetool compactionstats.
>
>
> Thank you for taking the time to read this. Hope you can help.
>
> Alex
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org