[ 
https://issues.apache.org/jira/browse/SOLR-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337096#comment-17337096
 ] 

David Smiley commented on SOLR-15386:
-------------------------------------

If a replica's liveness was a function of _both_ live_nodes *and* the 
state.json (or applicable per-replica state), then this would be a non-issue?  
In other words, a replica's _actual_ status is overruled by the absence of the 
associated node in live_nodes.  I believe [~ilan] has suggested this already in 
the past (not in response to this recent bug).  That might be some substantial 
internal SolrCloud work, though.
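The idea can be sketched in a few lines; this is a minimal sketch with hypothetical names (not Solr's actual API), just to pin down the rule "recorded state is overruled by absence from live_nodes":

```java
import java.util.Set;

public class EffectiveState {
    enum Replica { ACTIVE, DOWN, RECOVERING }

    /** A replica's recorded state is overruled by the absence of its host
     *  node from live_nodes. Names here are illustrative, not Solr's API. */
    static Replica effective(Replica recorded, String nodeName, Set<String> liveNodes) {
        return liveNodes.contains(nodeName) ? recorded : Replica.DOWN;
    }
}
```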

I can imagine another solution:  When the DOWNNODE message is 
processed/received in NodeMutator.downNode, it should double-check that the 
node is still down.  Assuming it is, proceed, retaining the ZK version of the 
particular node entry under live_nodes from that check.  It calls 
computeCollectionUpdate as normal, which may produce a list of updates to do.  
It puts these into a ZK "multi" to make them transactional/atomic, adding a 
"check" op on the live_nodes entry so the whole change is conditional on 
live_nodes not having changed.  See org.apache.zookeeper.Transaction.  Then it 
executes the change in a loop, retrying if the operation fails; if it failed 
because the node is live again, we can stop.  This is all quite a bit of work 
because the code is not presently structured in a way that accommodates what I 
describe.
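The retry loop at the end can be sketched server-free.  This is a minimal sketch assuming a hypothetical MultiResult outcome for each attempted "multi"; in real code those outcomes would come from org.apache.zookeeper.Transaction.commit() either succeeding or throwing (KeeperException.BadVersionException when the check op fails, ConnectionLossException on a transient failure):

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

public class DownNodeRetry {
    // Hypothetical outcome of one attempted ZK "multi"; stands in for
    // Transaction.commit() returning normally or throwing.
    enum MultiResult { COMMITTED, BAD_VERSION, CONNECTION_LOSS }

    /**
     * Drive the conditional DOWNNODE update. attempt runs the guarded multi;
     * isStillDown re-checks live_nodes. Returns true if the replica-state
     * updates were applied, false if we stopped because the node is live
     * again (or retries were exhausted).
     */
    static boolean run(Supplier<MultiResult> attempt,
                       BooleanSupplier isStillDown,
                       int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            switch (attempt.get()) {
                case COMMITTED:
                    return true;  // all updates applied atomically
                case BAD_VERSION:
                    // The live_nodes "check" failed: something changed.
                    // If the node is back up, abandon the DOWNNODE work.
                    if (!isStillDown.getAsBoolean()) return false;
                    break;        // still down: re-read the version and retry
                case CONNECTION_LOSS:
                    break;        // transient failure: retry
            }
        }
        return false;             // gave up after maxAttempts
    }
}
```

The BAD_VERSION branch carries the point of the design: it is what distinguishes "the node is live again, stop" from "stale read, retry".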

Worse, for what I described to work, the way live_nodes is maintained would 
need to change in some way to comply with Op.check, which looks at the version 
of a znode, i.e. the version of its data.  Neither /live_nodes nor the nodes 
underneath it contain any data, so the version is always zero.  If each node 
there held "up" or "down" data to reflect the state, as opposed to mere 
presence of the node, that could work.  But these are ephemeral nodes today, 
as I recall.  Or maybe live_nodes itself could hold a simple list as its 
contents.  ZK doesn't seem to offer the ability to create nodes at a specific 
version, which would otherwise solve that.
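For concreteness, the shape of the guarded multi might look like the sketch below.  The Txn interface is a hypothetical stand-in for the slice of org.apache.zookeeper.Transaction used here (whose check/setData calls chain the same way, and whose commit() returns a List of OpResult or throws KeeperException); it assumes live_nodes entries whose data version actually changes on up/down flips, which, as noted above, is not the case today:

```java
import java.util.List;

public class GuardedDownNode {
    /** Hypothetical stand-in for the slice of org.apache.zookeeper.Transaction
     *  used here; the real commit() returns List<OpResult> and throws
     *  KeeperException (BadVersion when a check op fails). */
    interface Txn {
        Txn check(String path, int version);
        Txn setData(String path, byte[] data, int version);
        void commit();
    }

    /** One state write produced by computeCollectionUpdate; hypothetical shape. */
    record Update(String path, byte[] data) {}

    /** Build and run the guarded multi: the check on the node's live_nodes
     *  entry makes all the state updates conditional on live_nodes being
     *  unchanged since we confirmed the node was down. */
    static void guardedDown(Txn txn, String nodeName, int observedVersion,
                            List<Update> updates) {
        txn.check("/live_nodes/" + nodeName, observedVersion);
        for (Update u : updates) {
            txn.setData(u.path(), u.data(), -1); // -1 = any version
        }
        txn.commit(); // atomic: either everything applies or nothing does
    }
}
```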

I can imagine another variation that wouldn't mean changes to live nodes... 
I'll write about that tomorrow.

> Internal DOWNNODE request will mark replicas down even if their host node is 
> now live
> -------------------------------------------------------------------------------------
>
>                 Key: SOLR-15386
>                 URL: https://issues.apache.org/jira/browse/SOLR-15386
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 8.6
>            Reporter: Megan Carey
>            Priority: Major
>
> When a node is shutting down, it calls into:
> # [CoreContainer.shutdown()|https://github.com/apache/lucene-solr/blob/branch_8_8/solr/core/src/java/org/apache/solr/core/CoreContainer.java#L1026]
> # [ZkController.preClose()|https://github.com/apache/lucene-solr/blob/branch_8_8/solr/core/src/java/org/apache/solr/cloud/ZkController.java#L612]
> # [ZkController.publishNodeAsDown|https://github.com/apache/lucene-solr/blob/branch_8_8/solr/core/src/java/org/apache/solr/cloud/ZkController.java#L2753]
> This sends a request to Overseer to mark all of the replicas DOWN for the 
> soon-to-be down node.
> # [Overseer.processMessage()|https://github.com/apache/lucene-solr/blob/branch_8_8/solr/core/src/java/org/apache/solr/cloud/Overseer.java#L459]
> # [NodeMutator.downNode()|https://github.com/apache/lucene-solr/blob/branch_8_8/solr/core/src/java/org/apache/solr/cloud/overseer/NodeMutator.java#L48]
> The issue we encountered was as follows:
> # Solr node shuts down
> # DOWNNODE message is enqueued for Overseer
> # Solr node comes back up (running on K8s, so a new node is auto-started as 
> soon as the old node was detected as down)
> # DOWNNODE was dequeued for processing, and marked all replicas DOWN for the 
> node that is now live.
> The only place where these replicas would later be marked ACTIVE again is 
> after ShardLeaderElection, but we did not reach that case. An easy fix is to 
> add a check for node liveness prior to marking replicas down, but a lot of 
> tests fail with this change. Was this the intended functionality? 


