Megan Carey created SOLR-15386:
----------------------------------

             Summary: Internal DOWNNODE request will mark replicas down even if 
their host node is now live
                 Key: SOLR-15386
                 URL: https://issues.apache.org/jira/browse/SOLR-15386
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
    Affects Versions: 8.6
            Reporter: Megan Carey


When a node is shutting down, it calls into:
# [CoreContainer.shutdown()|https://github.com/apache/lucene-solr/blob/branch_8_8/solr/core/src/java/org/apache/solr/core/CoreContainer.java#L1026]
# [ZkController.preClose()|https://github.com/apache/lucene-solr/blob/branch_8_8/solr/core/src/java/org/apache/solr/cloud/ZkController.java#L612]
# [ZkController.publishNodeAsDown|https://github.com/apache/lucene-solr/blob/branch_8_8/solr/core/src/java/org/apache/solr/cloud/ZkController.java#L2753]

This sends a request to the Overseer to mark all replicas on the 
soon-to-be-down node as DOWN. The Overseer handles it in:
# [Overseer.processMessage()|https://github.com/apache/lucene-solr/blob/branch_8_8/solr/core/src/java/org/apache/solr/cloud/Overseer.java#L459]
# [NodeMutator.downNode()|https://github.com/apache/lucene-solr/blob/branch_8_8/solr/core/src/java/org/apache/solr/cloud/overseer/NodeMutator.java#L48]

The issue we encountered was as follows:
# The Solr node shuts down.
# A DOWNNODE message is enqueued for the Overseer.
# The Solr node comes back up (we run on K8s, so a new node is auto-started as soon as the old node is detected as down).
# The DOWNNODE message is dequeued and processed, marking all replicas DOWN for the node that is now live.
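
The sequence above can be replayed with a small toy model. This is a minimal sketch, not Solr code: `StaleDownNodeDemo`, its fields, the node name `host1:8983_solr`, and the replica name `core_node1` are all hypothetical stand-ins, and it assumes the restarted node registers under the same node name as the old one.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Toy model of the reported race. None of these names are Solr classes. */
public class StaleDownNodeDemo {
    final Set<String> liveNodes = new HashSet<>();                  // stand-in for the live node set
    final Map<String, String> replicaNode = new LinkedHashMap<>();  // replica -> hosting node
    final Map<String, String> replicaState = new LinkedHashMap<>(); // replica -> ACTIVE or DOWN
    final Queue<String> overseerQueue = new ArrayDeque<>();         // queued DOWNNODE targets

    /** Marks every replica on the node DOWN without checking liveness. */
    void processDownNode(String node) {
        replicaNode.forEach((replica, host) -> {
            if (host.equals(node)) {
                replicaState.put(replica, "DOWN");
            }
        });
    }

    /** Replays the four steps and reports the final state. */
    public static String replay() {
        StaleDownNodeDemo cluster = new StaleDownNodeDemo();
        String node = "host1:8983_solr"; // hypothetical node name
        cluster.liveNodes.add(node);
        cluster.replicaNode.put("core_node1", node);
        cluster.replicaState.put("core_node1", "ACTIVE");

        cluster.liveNodes.remove(node);                          // 1. node shuts down
        cluster.overseerQueue.add(node);                         // 2. DOWNNODE is enqueued
        cluster.liveNodes.add(node);                             // 3. node comes back up
        cluster.processDownNode(cluster.overseerQueue.remove()); // 4. stale message is processed

        return "node live: " + cluster.liveNodes.contains(node)
                + ", core_node1: " + cluster.replicaState.get("core_node1");
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints "node live: true, core_node1: DOWN"
    }
}
```

The final state shows the mismatch described in step 4: the node is live, yet its replica is marked DOWN.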

The only place where these replicas would later be marked ACTIVE again is after 
ShardLeaderElection, but we did not reach that case, so the replicas stayed 
DOWN. An easy fix is to check node liveness before marking replicas down, but 
many tests fail with this change. Is the current behavior intended?
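
The liveness check mentioned above could look roughly like the following. This is only a sketch of the idea under stated assumptions: `LivenessGuardSketch` and the `downNode` signature here are hypothetical and do not match the real `NodeMutator.downNode()`, and cluster state is reduced to plain maps.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a liveness guard; not the real NodeMutator API. */
public class LivenessGuardSketch {

    /**
     * Returns a copy of replicaState in which replicas hosted on the given node
     * are marked DOWN, unless the node is currently in liveNodes.
     */
    public static Map<String, String> downNode(String node,
                                               Set<String> liveNodes,
                                               Map<String, String> replicaNode,
                                               Map<String, String> replicaState) {
        Map<String, String> updated = new HashMap<>(replicaState);
        if (liveNodes.contains(node)) {
            // The node is live again, so the queued DOWNNODE is stale: change nothing.
            return updated;
        }
        replicaNode.forEach((replica, host) -> {
            if (host.equals(node)) {
                updated.put(replica, "DOWN");
            }
        });
        return updated;
    }

    public static void main(String[] args) {
        Map<String, String> replicaNode = Map.of("core_node1", "host1:8983_solr");
        Map<String, String> replicaState = Map.of("core_node1", "ACTIVE");

        // Node is live again: the stale request is ignored.
        System.out.println(downNode("host1:8983_solr",
                Set.of("host1:8983_solr"), replicaNode, replicaState)); // prints {core_node1=ACTIVE}

        // Node is not live: replicas are marked DOWN as before.
        System.out.println(downNode("host1:8983_solr",
                Set.<String>of(), replicaNode, replicaState)); // prints {core_node1=DOWN}
    }
}
```

Since many existing tests fail with a check like this, the sketch illustrates the proposal only and may conflict with behavior that other code relies on.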



