Hello everyone,

I have a cluster with 2 datacenters. I am using GossipingPropertyFileSnitch as my endpoint snitch, Cassandra version 4.1.8. One datacenter is entirely Ubuntu 24.04 on OpenJDK 11 and the other is Ubuntu 20.04 on OpenJDK 8. A seed node died in the second DC (the Ubuntu 20.04 one). I ordered a new dedicated server, updated the seeds list on the remaining nodes so they forget the dead seed, and then followed the steps to replace a dead node:

JVM_OPTS="$JVM_OPTS $JVM_EXTRA_OPTS -Dcassandra.replace_address_first_boot=<dead_node_ip>"
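For completeness, updating the seeds just meant editing the seed_provider list in cassandra.yaml on the remaining nodes, roughly like this (the IPs here are placeholders, not my real ones):

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # dead seed's IP removed from this list
          - seeds: "10.0.1.10,10.0.2.10"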

Configs between the old and new node are identical apart from the IP addresses and the line above in the env file to replace the dead node. I started the new node, it began replacing the old one, and it showed up in the `UJ` state. Not long into the process, the new node stops processing data, the cluster forgets it, and the old node reappears in its `DN` state (that machine is powered off, so DN is expected). There are no errors in the logs. I've retried several times hoping the issue would clear up. I raised the ROOT logging level to DEBUG and also set org.apache.cassandra.gms.Gossiper to TRACE. Still no errors.
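For reference, this is roughly how I set the levels (the same thing can also be done in conf/logback.xml before startup):

nodetool setlogginglevel ROOT DEBUG
nodetool setlogginglevel org.apache.cassandra.gms.Gossiper TRACE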

With TRACE set for the Gossiper, I can see that gossiping stops and data stops streaming at about the same time. I cannot run any nodetool commands on the new node. The process doesn't die; it keeps open connections to the nodes that were streaming data to it, but I don't see any data actually moving.
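In case it helps, this is roughly how I'm observing that while the node is wedged (pid is a placeholder):

# open TCP connections still held by the stuck Cassandra process on the new node
ss -tnp | grep <pid>
# run on the nodes that were streaming to it, to see whether they still report an active stream
nodetool netstats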

I've thought through a lot of possibilities. Disk space isn't an issue, and ulimits are set high in /etc/security/limits.conf; checking /proc/<pid>/limits confirms the running process picked up the high values. I've replaced nodes this way before without issue, but this one is causing me grief. Is there anything more I can do?
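Those checks were basically the following (pid is a placeholder, and the path is the default data directory):

# confirm the running process picked up the raised limits
cat /proc/<pid>/limits
# confirm the data volume has plenty of free space
df -h /var/lib/cassandra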

Courtney
