Hello everyone,
I have a cluster with two datacenters, using
GossipingPropertyFileSnitch as the endpoint snitch, on Cassandra
4.1.8. One datacenter is entirely Ubuntu 24.04 with OpenJDK 11, and
the other is Ubuntu 20.04 with OpenJDK 8. A seed node died in the
second DC (the Ubuntu 20.04 hosts). I ordered a new dedicated server,
updated the seed lists so they no longer reference the dead seed node,
and followed the steps to replace a dead node, adding this to the env
file:
JVM_OPTS="$JVM_OPTS $JVM_EXTRA_OPTS
-Dcassandra.replace_address_first_boot=<dead_node_ip>"
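
For context, the seed_provider in cassandra.yaml on the surviving
nodes now looks roughly like this (addresses are placeholders, not the
real ones):

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # dead seed removed; one live seed kept per DC
          - seeds: "10.0.1.10,10.0.2.20"
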
The configs on the old and new nodes are identical apart from the IP
addresses and the line above in the env file. When I started the new
node, it began replacing the dead one and showed up in the `UJ` state.
Not long into the process, the new node stops processing data, the
cluster forgets the new node, and the ring goes back to showing the
old node as `DN` (that machine is powered off, no power at all).
There are no errors in the logs.
I've retried the replacement several times hoping to get past it, with
the same result each time. I raised the ROOT logging level to DEBUG
and also set org.apache.cassandra.gms.Gossiper to TRACE. Still no
errors.
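
For reference, the levels are set in conf/logback.xml before the node
starts, roughly like this (abridged; the appender name is from the
stock file, everything else is left at the defaults):

<logger name="org.apache.cassandra.gms.Gossiper" level="TRACE"/>
<root level="DEBUG">
    <appender-ref ref="SYSTEMLOG"/>
</root>
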
With TRACE on the Gossiper, I can see gossip stop and data stop
streaming at about the same time. I cannot run any nodetool commands
on the new node. The process doesn't die; it keeps its connections
open to the nodes that were streaming data to it, but I don't see any
data actually moving.
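
This is roughly how I'm watching the connections and streaming
(assumes the default internode port 7000; commands are examples, not
exact transcripts):

# connections the new node's JVM keeps open on the internode port
sudo ss -tnp | grep ':7000'
# streaming progress as seen from a peer that still answers nodetool
nodetool netstats
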
I've thought through a lot of possibilities. Disk space isn't an
issue, and ulimits are set high in /etc/security/limits.conf; checking
/proc/<pid>/limits confirms the running process has the raised values
(check below). I've replaced nodes like this before without issue, but
this one is causing me grief. Is there anything more I can do?
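
For completeness, this is the limits check I'm using (the pgrep
pattern assumes the standard CassandraDaemon main class):

# configured limits for the cassandra user
grep cassandra /etc/security/limits.conf
# effective limits of the running JVM
cat /proc/$(pgrep -f CassandraDaemon)/limits
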
Courtney