Re: Assassinate fails

Alex Wed, 03 Apr 2019 09:32:16 -0700

Same result it seems:
Welcome to JMX terminal. Type "help" for available commands.
$>open localhost:7199
#Connection to localhost:7199 is opened
$>bean org.apache.cassandra.net:type=Gossiper
#bean is set to org.apache.cassandra.net:type=Gossiper
$>run unsafeAssassinateEndpoint 192.168.1.18

#calling operation unsafeAssassinateEndpoint of mbean org.apache.cassandra.net:type=Gossiper

#RuntimeMBeanException: java.lang.NullPointerException



There is not much more to see in the log files:

WARN [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:13,626 Gossiper.java:575 - Assassinating /192.168.1.18 via gossip
INFO [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:13,627 Gossiper.java:585 - Sleeping for 30000ms to ensure /192.168.1.18 does not change
INFO [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:43,628 Gossiper.java:1029 - InetAddress /192.168.1.18 is now DOWN
INFO [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:43,631 StorageService.java:2324 - Removing tokens [..] for /192.168.1.18





On 03.04.2019 17:10, Nick Hatfield wrote:

Run assassinate the old way. It works very well...

wget -q -O jmxterm.jar http://downloads.sourceforge.net/cyclops-group/jmxterm-1.0-alpha-4-uber.jar

java -jar ./jmxterm.jar

$>open localhost:7199

$>bean org.apache.cassandra.net:type=Gossiper

$>run unsafeAssassinateEndpoint 192.168.1.18

$>quit


Happy deleting

-----Original Message-----
From: Alex [mailto:m...@aca-o.com]
Sent: Wednesday, April 03, 2019 10:42 AM
To: user@cassandra.apache.org
Subject: Assassinate fails

Hello,

Short story:
- I had to replace a dead node in my cluster
- 1 week after, dead node is still seen as DN by 3 out of 5 nodes
- dead node has null host_id
- assassinate on dead node fails with error

How can I get rid of this dead node ?


Long story:
I had a 3-node cluster (Cassandra 3.9); one node went dead. I built
a new node from scratch and "replaced" the dead node using the
information from this page
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsReplaceNode.html.
It looked like the replacement went ok.

I added two more nodes to strengthen the cluster.

A few days have passed and the dead node is still visible and marked
as "down" on 3 of 5 nodes in nodetool status:

--  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.1.9   16 GiB     256     35.0%             76223d4c-9d9f-417f-be27-cebb791cddcc  rack1
UN  192.168.1.12  16.09 GiB  256     34.0%             719601e2-54a6-440e-a379-c9cf2dc20564  rack1
UN  192.168.1.14  14.16 GiB  256     32.6%             d8017a03-7e4e-47b7-89b9-cd9ec472d74f  rack1
UN  192.168.1.17  15.4 GiB   256     34.1%             fa238b21-1db1-47dc-bfb7-beedc6c9967a  rack1
DN  192.168.1.18  24.3 GiB   256     33.7%             null                                  rack1
UN  192.168.1.22  19.06 GiB  256     30.7%             09d24557-4e98-44c3-8c9d-53c4c31066e1  rack1
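
[Editor's illustration, not part of the original message.] The row with a null Host ID can be picked out of the nodetool status output mechanically. What follows is only a minimal sketch in Python, assuming each node occupies a single line with eight whitespace-separated fields, as in the listing above; the helper name `nodes_without_host_id` is hypothetical and the sample rows are copied from this thread.

```python
# Node rows copied from the nodetool status output in this thread,
# one node per line (state, address, load, unit, tokens, owns, host id, rack).
STATUS_LINES = """\
UN  192.168.1.9   16 GiB     256  35.0%  76223d4c-9d9f-417f-be27-cebb791cddcc  rack1
UN  192.168.1.12  16.09 GiB  256  34.0%  719601e2-54a6-440e-a379-c9cf2dc20564  rack1
UN  192.168.1.14  14.16 GiB  256  32.6%  d8017a03-7e4e-47b7-89b9-cd9ec472d74f  rack1
UN  192.168.1.17  15.4 GiB   256  34.1%  fa238b21-1db1-47dc-bfb7-beedc6c9967a  rack1
DN  192.168.1.18  24.3 GiB   256  33.7%  null                                  rack1
UN  192.168.1.22  19.06 GiB  256  30.7%  09d24557-4e98-44c3-8c9d-53c4c31066e1  rack1
"""


def nodes_without_host_id(text):
    """Return (state, address) for every row whose Host ID column is 'null'."""
    result = []
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that does not look like a node row (assumed 8 fields).
        if len(fields) != 8:
            continue
        state, address, host_id = fields[0], fields[1], fields[6]
        if host_id == "null":
            result.append((state, address))
    return result


if __name__ == "__main__":
    for state, address in nodes_without_host_id(STATUS_LINES):
        print(state, address, "has no host ID")
```

Such a node cannot be passed to nodetool removenode, which takes a host ID as its argument.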

Its host ID is null, so I cannot use nodetool removenode. Moreover,
nodetool assassinate 192.168.1.18 fails with:

error: null
-- StackTrace --
java.lang.NullPointerException

And in system.log:

INFO  [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:39:38,595 Gossiper.java:585 - Sleeping for 30000ms to ensure /192.168.1.18 does not change
INFO  [CompactionExecutor:547] 2019-03-27 17:39:38,669 AutoSavingCache.java:393 - Saved KeyCache (27316 items) in 163 ms
INFO  [IndexSummaryManager:1] 2019-03-27 17:40:03,620 IndexSummaryRedistribution.java:75 - Redistributing index summaries
INFO  [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:40:08,597 Gossiper.java:1029 - InetAddress /192.168.1.18 is now DOWN
INFO  [RMI TCP Connection(16)-127.0.0.1] 2019-03-27 17:40:08,599 StorageService.java:2324 - Removing tokens [-1061369577393671924,...]
ERROR [GossipStage:1] 2019-03-27 17:40:08,600 CassandraDaemon.java:226 - Exception in thread Thread[GossipStage:1,5,main]
java.lang.NullPointerException: null

In system.peers, the dead node shows up and has the same host ID as the replacing node:


cqlsh> select peer, host_id from system.peers;

  peer         | host_id
--------------+--------------------------------------
  192.168.1.18 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
  192.168.1.22 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
   192.168.1.9 | 76223d4c-9d9f-417f-be27-cebb791cddcc
  192.168.1.14 | d8017a03-7e4e-47b7-89b9-cd9ec472d74f
  192.168.1.12 | 719601e2-54a6-440e-a379-c9cf2dc20564

Dead node and replacing node have different tokens in system.peers.
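
[Editor's illustration, not part of the original message.] The duplicated host_id in the system.peers output above is easy to miss by eye. A minimal sketch in Python of how it could be detected, assuming the (peer, host_id) rows have been copied out of cqlsh by hand; the helper name `find_shared_host_ids` is hypothetical, and the script only reports the duplicate, it changes nothing in the cluster.

```python
from collections import defaultdict

# (peer, host_id) rows copied by hand from the system.peers query above.
PEERS = [
    ("192.168.1.18", "09d24557-4e98-44c3-8c9d-53c4c31066e1"),
    ("192.168.1.22", "09d24557-4e98-44c3-8c9d-53c4c31066e1"),
    ("192.168.1.9", "76223d4c-9d9f-417f-be27-cebb791cddcc"),
    ("192.168.1.14", "d8017a03-7e4e-47b7-89b9-cd9ec472d74f"),
    ("192.168.1.12", "719601e2-54a6-440e-a379-c9cf2dc20564"),
]


def find_shared_host_ids(rows):
    """Map each host_id that appears on more than one peer to those peers."""
    peers_by_id = defaultdict(list)
    for peer, host_id in rows:
        peers_by_id[host_id].append(peer)
    return {hid: peers for hid, peers in peers_by_id.items() if len(peers) > 1}


if __name__ == "__main__":
    for host_id, peers in find_shared_host_ids(PEERS).items():
        print(host_id, "is claimed by", ", ".join(peers))
```

With the rows above, the only shared host_id is the one claimed by both 192.168.1.18 (the dead node) and 192.168.1.22 (the replacing node).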

I should add that I also tried decommission on a node that still has
192.168.1.18 in its peers; it is still marked as "leaving" 5 days
later. Nothing in nodetool netstats or nodetool compactionstats.


Thank you for taking the time to read this. Hope you can help.

Alex

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



