[ https://issues.apache.org/jira/browse/SOLR-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337096#comment-17337096 ]
David Smiley commented on SOLR-15386:
-------------------------------------

If a replica's liveness was a function of _both_ live_nodes *and* the state.json (or the applicable per-replica state), then this would be a non-issue? In other words, a replica's _actual_ status would be overruled by the absence of its node from live_nodes. I believe [~ilan] has suggested this already in the past (not in response to this recent bug). That might be some substantial internal SolrCloud work, though.

I can imagine another solution: when the DOWNNODE message is processed in NodeMutator.downNode, it should double-check that the node is still down. Assuming it is, proceed. It retains the ZK version of that node's entry under live_nodes, as observed during that check. It calls computeCollectionUpdate as normal, which may produce a list of updates to do. It puts these into a ZK "multi" to make them transactional/atomic, with an additional "check" on live_nodes so that the change is conditional on live_nodes not having changed. See org.apache.zookeeper.Transaction. Then it executes the change in a loop, retrying if the operation fails. If it failed because the node is live again, we can stop.

This is all quite a bit of work, because the code does not appear to be structured today in a way that accommodates what I describe. Worse, for this to work, the way live nodes are maintained would need to change to comply with Op.check, which looks at the version of a node, meaning the version of its data. Neither /live_nodes nor any node underneath it contains data, so the version is always zero. If each node there held "up" or "down" to reflect the state, as opposed to the mere presence of the node, that could work. But these are ephemeral nodes today, as I recall. Or maybe live_nodes could hold a simple list as its contents. ZK doesn't seem to offer an ability to create nodes with a specific version, which would solve that.
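The conditional update described above can be sketched without a ZooKeeper dependency. Everything in this block is hypothetical: `VersionedStore`, `DownNodeSketch`, and the paths are illustration names, not Solr or ZooKeeper APIs. The sketch also assumes the variant in which each live-node entry carries "up" or "down" data, so that its version changes when the node's state changes. That is not how live_nodes works today.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a versioned store; not a ZooKeeper or Solr API. */
final class VersionedStore {
    /** Data plus a version that increments on every write to the same path. */
    static final class Entry {
        final String data;
        final int version;

        Entry(String data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    synchronized Entry get(String path) {
        return entries.get(path);
    }

    synchronized void set(String path, String data) {
        Entry old = entries.get(path);
        entries.put(path, new Entry(data, old == null ? 0 : old.version + 1));
    }

    /** Applies every write, or none, depending on whether checkPath is still at expectedVersion. */
    synchronized boolean multi(String checkPath, int expectedVersion, Map<String, String> writes) {
        Entry guard = entries.get(checkPath);
        if (guard == null || guard.version != expectedVersion) {
            return false; // the guarded entry changed, so nothing is written
        }
        writes.forEach(this::set);
        return true;
    }
}

/** Hypothetical sketch of the check, compute, conditional-write loop. */
final class DownNodeSketch {
    enum Outcome { MARKED_DOWN, NODE_LIVE_AGAIN }

    static Outcome downNode(VersionedStore store, String nodePath, Map<String, String> replicaUpdates) {
        while (true) {
            // Step 1: double-check the node is still down and remember the version we saw.
            VersionedStore.Entry node = store.get(nodePath);
            if (node == null) {
                throw new IllegalArgumentException("no entry for " + nodePath);
            }
            if ("up".equals(node.data)) {
                return Outcome.NODE_LIVE_AGAIN; // stale DOWNNODE message: stop
            }
            // Step 2: write the replica updates only if the node entry is unchanged.
            if (store.multi(nodePath, node.version, replicaUpdates)) {
                return Outcome.MARKED_DOWN;
            }
            // The entry changed between the read and the write: re-read and decide again.
        }
    }
}
```

With a real ZooKeeper client, the `multi` step would presumably map onto a version check plus the state writes inside one org.apache.zookeeper.Transaction. The loop only terminates if the node entry eventually stops changing, so a production version would likely want a retry bound.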
I can imagine another variation that wouldn't mean changes to live nodes... I'll write about that tomorrow.

> Internal DOWNNODE request will mark replicas down even if their host node is now live
> -------------------------------------------------------------------------------------
>
>                 Key: SOLR-15386
>                 URL: https://issues.apache.org/jira/browse/SOLR-15386
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: SolrCloud
>    Affects Versions: 8.6
>            Reporter: Megan Carey
>            Priority: Major
>
> When a node is shutting down, it calls into:
> # [CoreContainer.shutdown()|https://github.com/apache/lucene-solr/blob/branch_8_8/solr/core/src/java/org/apache/solr/core/CoreContainer.java#L1026]
> # [ZkController.preClose()|https://github.com/apache/lucene-solr/blob/branch_8_8/solr/core/src/java/org/apache/solr/cloud/ZkController.java#L612]
> # [ZkController.publishNodeAsDown|https://github.com/apache/lucene-solr/blob/branch_8_8/solr/core/src/java/org/apache/solr/cloud/ZkController.java#L2753]
> This sends a request to Overseer to mark all of the replicas DOWN for the soon-to-be down node.
> # [Overseer.processMessage()|https://github.com/apache/lucene-solr/blob/branch_8_8/solr/core/src/java/org/apache/solr/cloud/Overseer.java#L459]
> # [NodeMutator.downNode()|https://github.com/apache/lucene-solr/blob/branch_8_8/solr/core/src/java/org/apache/solr/cloud/overseer/NodeMutator.java#L48]
> The issue we encountered was as follows:
> # Solr node shuts down
> # DOWNNODE message is enqueued for Overseer
> # Solr node comes back up (running on K8s, so a new node is auto-started as soon as the old node was detected as down)
> # DOWNNODE was dequeued for processing, and marked all replicas DOWN for the node that is now live.
> The only place where these replicas would later be marked ACTIVE again is after ShardLeaderElection, but we did not reach that case. An easy fix is to add a check for node liveness prior to marking replicas down, but a lot of tests fail with this change. Was this the intended functionality?

--
This message was sent by Atlassian Jira (v8.3.4#803005)
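The ordering in the quoted report, and the effect of the "easy fix" it mentions, can be replayed in a toy model. This is a sketch under stated assumptions: `DownNodeRaceSketch` and all of its fields are hypothetical and are not Solr classes. It also assumes the restarted node's replica is ACTIVE again by the time the stale message is dequeued, which the report does not state.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Hypothetical toy model of the race; none of these names are Solr classes. */
final class DownNodeRaceSketch {
    final Set<String> liveNodes = new HashSet<>();
    final Map<String, String> replicaNode = new HashMap<>();  // replica -> hosting node
    final Map<String, String> replicaState = new HashMap<>(); // replica -> ACTIVE or DOWN
    final Queue<String> downNodeQueue = new ArrayDeque<>();   // pending DOWNNODE messages

    /** Drains the queue, optionally skipping messages for nodes that are live again. */
    void processQueue(boolean checkLiveness) {
        String node;
        while ((node = downNodeQueue.poll()) != null) {
            if (checkLiveness && liveNodes.contains(node)) {
                continue; // stale message: the node came back before we got here
            }
            for (Map.Entry<String, String> e : replicaNode.entrySet()) {
                if (e.getValue().equals(node)) {
                    replicaState.put(e.getKey(), "DOWN");
                }
            }
        }
    }

    /** Replays the four steps from the report and returns the replica's final state. */
    static String replay(boolean checkLiveness) {
        DownNodeRaceSketch cluster = new DownNodeRaceSketch();
        cluster.liveNodes.add("n1");
        cluster.replicaNode.put("r1", "n1");
        cluster.replicaState.put("r1", "ACTIVE");

        cluster.liveNodes.remove("n1");           // 1. node shuts down
        cluster.downNodeQueue.add("n1");          // 2. DOWNNODE is enqueued
        cluster.liveNodes.add("n1");              // 3. node comes back up
        cluster.replicaState.put("r1", "ACTIVE"); //    (assumption: replica is ACTIVE again)
        cluster.processQueue(checkLiveness);      // 4. DOWNNODE is dequeued and processed

        return cluster.replicaState.get("r1");
    }
}
```

Calling `DownNodeRaceSketch.replay(false)` leaves the replica DOWN on a live node, while `replay(true)` leaves it ACTIVE. The liveness check and the write are not atomic in this model, and the report notes that many tests fail with such a check, so this only illustrates the ordering and is not a recommended fix.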