Hi Cassandra community,

We have recently encountered a recurring issue, in which a node's old IP reappears in the cluster, while testing decommissions on some of our Kubernetes Cassandra staging clusters. We have not yet found any other references to this issue online. We could really use some additional input/opinions, both on the problem itself and on the fix we are currently considering.
*Issue Description*

In Kubernetes, a Cassandra node can change IP at each pod bounce. We have noticed that this behavior, combined with a decommission operation, can leave the cluster in an erroneous state. Consider the following situation:

A Cassandra node node1, with hostId1, owning 20.5% of the token ring, bounces and switches IP (old_IP → new_IP). After a couple of gossip iterations, every other node's nodetool status output contains a new_IP UN entry owning 20.5% of the token ring, and no old_IP entry.

Shortly after the bounce, node1 is decommissioned. Our cluster does not hold much data, so the decommission completes quickly. Logs on the other nodes start acknowledging that node1 has left, and soon the new_IP UL entry disappears from nodetool status. node1's pod is then deleted.

After a delay of about a minute, the cluster enters the erroneous state: an old_IP DN entry reappears in nodetool status, owning 20.5% of the token ring. No node owns this IP anymore, and according to the logs, old_IP is still associated with hostId1.

*Issue Root Cause*

By digging through Cassandra logs and re-testing this scenario repeatedly, we have reached the following conclusions:
- Other nodes continue exchanging gossip about old_IP, even after it becomes a fatClient.
- The fatClient timeout and subsequent quarantine do not stop old_IP from reappearing in a node's Gossip state once its quarantine is over. We believe this is due to a misalignment across nodes on old_IP's expiration time.
- Once new_IP has left the cluster and a node receives old_IP's next gossip state message, StorageService no longer faces a collision (or faces one, but with an even older IP) for hostId1 and its corresponding tokens. As a result, old_IP regains ownership of 20.5% of the token ring.

*Proposed fix*

Following the above investigation, we are considering the following fix: when a node receives a gossip status change with STATE_LEFT for a leaving endpoint new_IP, before evicting new_IP from the token ring, purge from Gossip (i.e. evictFromMembership) all endpoints that meet the following criteria:
- endpointStateMap contains the endpoint
- The endpoint is not currently a token owner (!tokenMetadata.isMember(endpoint))
- The endpoint's hostId matches the hostId of new_IP
- The endpoint is older than new_IP (Gossiper.instance.compareEndpointStartup)
- The endpoint's token range (from endpointStateMap) intersects with new_IP's

The intention of this modification is to force nodes to realign on old_IP's expiration and expunge it from Gossip, so that it does not reappear after new_IP leaves the ring. A rough, untested sketch of this check is appended at the end of this message.

Additional opinions/ideas regarding the fix's viability and the issue itself would be really helpful.

Thanks in advance,
Ines
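P.S. To make the criteria above more concrete, here is a rough, untested sketch of the check we have in mind, written as if it were a private helper in StorageService, invoked from handleStateLeft before the leaving endpoint is excised. The helper name purgeStaleEndpointsForLeavingNode is ours, evictFromMembership is currently private to Gossiper (so it would either need to be exposed or the loop would have to live in Gossiper instead), and exact method names/signatures may differ between Cassandra versions:

import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.UUID;

import org.apache.cassandra.dht.Token;
import org.apache.cassandra.gms.ApplicationState;
import org.apache.cassandra.gms.EndpointState;
import org.apache.cassandra.gms.Gossiper;
import org.apache.cassandra.gms.VersionedValue;
import org.apache.cassandra.locator.InetAddressAndPort;
import org.apache.cassandra.utils.FBUtilities;

// Hypothetical helper, imagined inside StorageService and called from
// handleStateLeft(newIp, pieces) right before excise(); names are assumptions.
private void purgeStaleEndpointsForLeavingNode(InetAddressAndPort newIp, Collection<Token> leavingTokens)
{
    UUID leavingHostId = Gossiper.instance.getHostId(newIp);
    if (leavingHostId == null)
        return;

    for (Map.Entry<InetAddressAndPort, EndpointState> entry : Gossiper.instance.getEndpointStates())
    {
        InetAddressAndPort candidate = entry.getKey();

        // Never touch the leaving endpoint itself or the local node.
        if (candidate.equals(newIp) || candidate.equals(FBUtilities.getBroadcastAddressAndPort()))
            continue;

        // Criterion: the endpoint is not currently a token owner.
        if (tokenMetadata.isMember(candidate))
            continue;

        // Criterion: the endpoint's hostId matches the leaving endpoint's hostId.
        VersionedValue candidateHostId = entry.getValue().getApplicationState(ApplicationState.HOST_ID);
        if (candidateHostId == null || !leavingHostId.equals(UUID.fromString(candidateHostId.value)))
            continue;

        // Criterion: the endpoint is older than the leaving endpoint
        // (a positive result means newIp carries the more recent generation).
        if (Gossiper.instance.compareEndpointStartup(newIp, candidate) <= 0)
            continue;

        // Criterion: the endpoint's gossiped tokens intersect the leaving endpoint's.
        Collection<Token> candidateTokens;
        try
        {
            candidateTokens = getTokensFor(candidate); // reads TOKENS from the gossip state
        }
        catch (RuntimeException e)
        {
            continue; // no usable token information for this endpoint
        }
        if (Collections.disjoint(leavingTokens, candidateTokens))
            continue;

        logger.info("Purging stale endpoint {} (hostId {}) superseded by leaving endpoint {}",
                    candidate, leavingHostId, newIp);
        Gossiper.instance.evictFromMembership(candidate);
    }
}

If we read the code correctly, handleStateLeft already resolves the leaving endpoint's tokens before calling excise, so those could be passed straight into this helper; we are also open to placing the purge inside Gossiper itself if that turns out to be cleaner.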