[ https://issues.apache.org/jira/browse/CASSANDRA-18319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699011#comment-17699011 ]
Brandon Williams commented on CASSANDRA-18319:
----------------------------------------------
bq. We believe that this is due to a misalignment on all nodes’ old_IP
expiration time.
This should only happen if the nodes' clocks aren't in sync; have you checked
this? Note that there is nothing Kubernetes-specific here: C* has always
supported IP changes.
> Cassandra in Kubernetes: IP switch decommission issue
> -----------------------------------------------------
>
> Key: CASSANDRA-18319
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18319
> Project: Cassandra
> Issue Type: Bug
> Reporter: Ines Potier
> Priority: Normal
>
> We have recently encountered a recurring old IP reappearance issue while
> testing decommissions on some of our Kubernetes Cassandra staging clusters.
> *Issue Description*
> In Kubernetes, a Cassandra node can change IP on every pod bounce. We have
> noticed that this behavior, combined with a decommission operation, can put
> the cluster into an erroneous state.
> Consider the following situation: a Cassandra node {{node1}}, with
> {{hostId1}}, owning 20.5% of the token ring, bounces and switches IP
> ({{old_IP}} → {{new_IP}}). After a couple of gossip rounds, all
> other nodes’ nodetool status output includes a {{new_IP}} UN entry owning
> 20.5% of the token ring and no {{old_IP}} entry.
> Shortly after the bounce, {{node1}} gets decommissioned. Our cluster holds
> little data, so the decommission operation completes quickly.
> Logs on other nodes start acknowledging that {{node1}} has left, and
> soon the {{new_IP}} UL entry disappears from nodetool status. {{node1}}’s pod is
> deleted.
> After a one-minute delay, the cluster enters the erroneous state. An {{old_IP}}
> DN entry reappears in nodetool status, owning 20.5% of the token ring. No
> node owns this IP anymore and, according to logs, {{old_IP}} is still
> associated with {{hostId1}}.
> *Issue Root Cause*
> By digging through Cassandra logs and re-testing this scenario repeatedly,
> we have reached the following conclusions:
> * Other nodes will continue exchanging gossip about {{old_IP}}, even after
> it becomes a fatClient.
> * The fatClient timeout and subsequent quarantine do not stop {{old_IP}}
> from reappearing in a node’s Gossip state once its quarantine is over. We
> believe that this is due to a misalignment on all nodes’ {{old_IP}}
> expiration time.
> * Once {{new_IP}} has left the cluster and a node receives {{old_IP}}’s next
> gossip state message, StorageService will no longer face collisions
> (or will, but with an even older IP) for {{hostId1}} and its corresponding
> tokens. As a result, {{old_IP}} will regain ownership of 20.5% of the token
> ring.
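The collision behavior in the last bullet can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Cassandra's actual StorageService code: the {{CollisionModel}} class, its {{Endpoint}} record, and the integer {{startupGeneration}} (a hypothetical stand-in for what {{Gossiper.instance.compareEndpointStartup}} compares) are all invented for illustration. It shows one node's token map moving to {{new_IP}}, being released when {{new_IP}} leaves, and going back to {{old_IP}} when {{old_IP}}'s stale state is gossiped again.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Toy model (hypothetical, not Cassandra code) of one node's view of the ring.
 * When two endpoints claim the same token, the one with the newer startup
 * generation wins, loosely mirroring a compareEndpointStartup-style check.
 */
public class CollisionModel {
    // Hypothetical gossip state for one endpoint.
    record Endpoint(String ip, String hostId, long startupGeneration, Set<Long> tokens) {}

    private final Map<Long, String> tokenOwners = new HashMap<>(); // token -> owning IP
    private final Map<String, Endpoint> known = new HashMap<>();   // IP -> latest gossip state

    /** Applies a gossip state; a token moves only to an endpoint with a newer generation. */
    void onGossip(Endpoint ep) {
        known.put(ep.ip(), ep);
        for (Long token : ep.tokens()) {
            String currentIp = tokenOwners.get(token);
            Endpoint owner = currentIp == null ? null : known.get(currentIp);
            if (owner == null || ep.startupGeneration() > owner.startupGeneration()) {
                tokenOwners.put(token, ep.ip());
            }
        }
    }

    /** The endpoint has left: release its tokens and forget its state. */
    void onLeft(String ip) {
        known.remove(ip);
        tokenOwners.values().removeIf(ip::equals);
    }

    String ownerOf(long token) {
        return tokenOwners.get(token);
    }

    public static void main(String[] args) {
        CollisionModel node2 = new CollisionModel();
        Set<Long> tokens = Set.of(10L, 20L);
        Endpoint oldIp = new Endpoint("10.0.0.1", "hostId1", 100, tokens);
        Endpoint newIp = new Endpoint("10.0.0.9", "hostId1", 200, tokens);

        node2.onGossip(oldIp);
        node2.onGossip(newIp);                  // collision: newer generation wins
        System.out.println(node2.ownerOf(10));  // 10.0.0.9

        node2.onLeft(newIp.ip());               // new_IP decommissioned
        node2.onGossip(oldIp);                  // stale old_IP state arrives again
        System.out.println(node2.ownerOf(10));  // 10.0.0.1
    }
}
```

The model only resolves collisions between endpoints that are both present, so once {{new_IP}} is gone nothing stops the re-gossiped {{old_IP}} state from reclaiming the tokens.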
> *Proposed fix*
> Following the above investigation, we were thinking about implementing the
> following fix:
> When a node receives a gossip status change with {{STATE_LEFT}} for the leaving
> endpoint {{new_IP}}, and before evicting {{new_IP}} from the token ring,
> purge from Gossip (i.e. {{evictFromMembership}}) all endpoints that meet
> the following criteria:
> * {{endpointStateMap}} contains this endpoint
> * The endpoint is not currently a token owner
> ({{!tokenMetadata.isMember(endpoint)}})
> * The endpoint’s {{hostId}} matches the {{hostId}} of {{new_IP}}
> * The endpoint is older than the leaving endpoint {{new_IP}}
> ({{Gossiper.instance.compareEndpointStartup}})
> * The endpoint’s token range (from {{endpointStateMap}}) intersects with
> {{new_IP}}’s
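The five criteria above can be sketched as a single filtering loop. This is a hypothetical sketch, not a patch: the {{StaleEndpointFilter}} class and {{EndpointView}} record stand in for data that would really come from {{endpointStateMap}} and {{tokenMetadata}}, the integer {{generation}} is an assumed proxy for {{compareEndpointStartup}}, and the token-range check is simplified to "the two token sets share at least one token".

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.UUID;

/**
 * Sketch of the proposed STATE_LEFT pre-eviction filter. EndpointView and the
 * integer generation are hypothetical stand-ins for Gossip/TokenMetadata data.
 */
public class StaleEndpointFilter {
    record EndpointView(String ip, UUID hostId, int generation,
                        boolean tokenOwner, Set<Long> tokens) {}

    /** IPs the proposal would purge from Gossip before evicting {@code leaving}. */
    static List<String> staleAliases(EndpointView leaving,
                                     Collection<EndpointView> endpointStateMap) {
        List<String> stale = new ArrayList<>();
        // Criterion 1 is implicit: only endpoints present in the map are examined.
        for (EndpointView ep : endpointStateMap) {
            if (ep.ip().equals(leaving.ip())) continue;                        // skip the leaving endpoint itself
            if (ep.tokenOwner()) continue;                                     // 2: not a current token owner
            if (!ep.hostId().equals(leaving.hostId())) continue;               // 3: same hostId as new_IP
            if (ep.generation() >= leaving.generation()) continue;             // 4: started before new_IP
            if (Collections.disjoint(ep.tokens(), leaving.tokens())) continue; // 5: tokens overlap
            stale.add(ep.ip());
        }
        return stale;
    }

    public static void main(String[] args) {
        UUID hostId1 = new UUID(0L, 1L);
        Set<Long> tokens = Set.of(10L, 20L);
        EndpointView newIp = new EndpointView("10.0.0.9", hostId1, 200, true, tokens);
        EndpointView oldIp = new EndpointView("10.0.0.1", hostId1, 100, false, tokens);
        EndpointView other = new EndpointView("10.0.0.2", new UUID(0L, 2L), 150, true, Set.of(30L));
        System.out.println(staleAliases(newIp, List.of(newIp, oldIp, other))); // [10.0.0.1]
    }
}
```

Checking the cheap, local conditions (token ownership, {{hostId}}) before the startup comparison and token intersection keeps the loop simple; the order does not change the result because every criterion must hold.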
> The intention of this change is to force all nodes to agree on {{old_IP}}’s
> expiration and to expunge it from Gossip so that it does not reappear after
> {{new_IP}} leaves the ring.
> Another approach we have also been considering is expunging {{old_IP}} at the
> moment StorageService resolves the collision.
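The alternative approach can be sketched the same way: resolve the {{hostId}} collision and expunge the losing IP in the same step, instead of leaving it to the fatClient timeout. All names here are hypothetical. The permanent {{expunged}} set is a simplification that roughly stands in for {{evictFromMembership}} plus quarantine, which in Gossip is time-bounded; a real fix would still need every node to expunge {{old_IP}} before any peer re-gossips it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical sketch: resolve a hostId collision and expunge the loser at once. */
public class CollisionEviction {
    record Endpoint(String ip, UUID hostId, int generation) {}

    // hostId -> endpoint currently accepted for that host (this node's view)
    private final Map<UUID, Endpoint> byHostId = new HashMap<>();
    // IPs expunged by this node; a permanent stand-in for eviction + quarantine
    private final Set<String> expunged = new HashSet<>();

    /** Applies a gossip state; returns the IP expunged by a collision, else null. */
    String onGossip(Endpoint incoming) {
        if (expunged.contains(incoming.ip())) {
            return null; // stale alias: ignore instead of re-adding it
        }
        Endpoint current = byHostId.get(incoming.hostId());
        if (current == null || current.ip().equals(incoming.ip())) {
            byHostId.put(incoming.hostId(), incoming);
            return null; // no collision
        }
        // Collision: keep the endpoint with the newer generation, expunge the other.
        Endpoint loser = incoming.generation() > current.generation() ? current : incoming;
        Endpoint winner = loser == current ? incoming : current;
        byHostId.put(winner.hostId(), winner);
        expunged.add(loser.ip());
        return loser.ip();
    }

    /** The host's current endpoint has left the ring. */
    void onLeft(UUID hostId) {
        byHostId.remove(hostId);
    }

    String currentIp(UUID hostId) {
        Endpoint e = byHostId.get(hostId);
        return e == null ? null : e.ip();
    }

    public static void main(String[] args) {
        CollisionEviction node2 = new CollisionEviction();
        UUID hostId1 = new UUID(0L, 1L);
        Endpoint oldIp = new Endpoint("10.0.0.1", hostId1, 100);
        Endpoint newIp = new Endpoint("10.0.0.9", hostId1, 200);
        node2.onGossip(oldIp);
        System.out.println(node2.onGossip(newIp)); // 10.0.0.1 expunged at collision time
        node2.onLeft(hostId1);                     // new_IP decommissioned
        node2.onGossip(oldIp);                     // re-gossiped old_IP is ignored
        System.out.println(node2.currentIp(hostId1)); // null
    }
}
```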