Hello everyone,
I have a cluster with two datacenters, using
GossipingPropertyFileSnitch as the endpoint snitch, on Cassandra
4.1.8. One datacenter is entirely Ubuntu 24.04 with OpenJDK 11, and
the other is Ubuntu 20.04 with OpenJDK 8. A seed node died in the
second DC (the Ubuntu 20.04 hosts). I ordered a new dedicated server,
updated the seed lists so they no longer reference the dead seed node,
and followed the steps to replace a dead node, adding this to the env
file:
JVM_OPTS="$JVM_OPTS $JVM_EXTRA_OPTS
-Dcassandra.replace_address_first_boot=<dead_node_ip>"
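
For context, the seed_provider in cassandra.yaml on the surviving
nodes now looks roughly like this (addresses are placeholders, not the
real ones):

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # dead seed removed; one live seed kept per DC
          - seeds: "10.0.1.10,10.0.2.20"
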
The configs on the old and new nodes are identical apart from the IP
addresses and the line above in the env file. When I started the new
node, it began replacing the dead one and showed up in the `UJ` state.
Not long into the process, the new node stops processing data, the
cluster forgets the new node, and the ring goes back to showing the
old node as `DN` (that machine is powered off, no power at all).
There are no errors in the logs.
I've retried the replacement several times hoping to get past it, with
the same result each time. I raised the ROOT logging level to DEBUG
and also set org.apache.cassandra.gms.Gossiper to TRACE. Still no
errors.
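
For reference, the levels are set in conf/logback.xml before the node
starts, roughly like this (abridged; the appender name is from the
stock file, everything else is left at the defaults):

<logger name="org.apache.cassandra.gms.Gossiper" level="TRACE"/>
<root level="DEBUG">
    <appender-ref ref="SYSTEMLOG"/>
</root>
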
With TRACE on the Gossiper, I can see gossip stop and data stop
streaming at about the same time. I cannot run any nodetool commands
on the new node. The process doesn't die; it keeps its connections
open to the nodes that were streaming data to it, but I don't see any
data actually moving.
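
This is roughly how I'm watching the connections and streaming
(assumes the default internode port 7000; commands are examples, not
exact transcripts):

# connections the new node's JVM keeps open on the internode port
sudo ss -tnp | grep ':7000'
# streaming progress as seen from a peer that still answers nodetool
nodetool netstats
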
I've thought through a lot of possibilities. Disk space isn't an
issue, and ulimits are set high in /etc/security/limits.conf; checking
/proc/<pid>/limits confirms the running process has the raised values
(check below). I've replaced nodes like this before without issue, but
this one is causing me grief. Is there anything more I can do?
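
For completeness, this is the limits check I'm using (the pgrep
pattern assumes the standard CassandraDaemon main class):

# configured limits for the cassandra user
grep cassandra /etc/security/limits.conf
# effective limits of the running JVM
cat /proc/$(pgrep -f CassandraDaemon)/limits
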
Courtney