[ https://issues.apache.org/jira/browse/SOLR-16722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706145#comment-17706145 ]

Jan Høydahl commented on SOLR-16722:
------------------------------------

One simple option is to create a new top-level znode {{/disabled_nodes}}:
{code}
/live_nodes
  +-- foo:8983_node
  +-- bar:8983_node
/disabled_nodes
  +-- bar:8983_node
{code}
The znode will normally be empty (or absent). If it exists and has children, 
each child names a node that is flagged as disabled for traffic. A node might 
be flagged because the solr-operator is planning to shut it down, or to 
temporarily repel traffic from it during troubleshooting. SolrJ would be 
updated to consider {{disabled_nodes}} in addition to replica state and 
{{live_nodes}}. The CLUSTERSTATUS response should also include this information.
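A minimal Java sketch of the client-side check, assuming hypothetical names ({{DisabledNodesRouting}}, {{Replica}}, {{candidates}}) that are not part of SolrJ. Because the flag is only a signal, this version prefers replicas on non-disabled nodes and falls back to disabled ones rather than failing outright; whether SolrJ should fall back or exclude them entirely is a design choice this proposal does not settle.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not SolrJ API: choose replicas for a request while
// treating /disabled_nodes as an advisory "avoid if possible" list.
final class DisabledNodesRouting {

  // Minimal stand-in for a replica: its name, hosting node, and whether it is active.
  record Replica(String name, String nodeName, boolean active) {}

  // Returns active replicas on live nodes whose node is not disabled.
  // If every such replica sits on a disabled node, returns those instead.
  static List<Replica> candidates(
      List<Replica> replicas, Set<String> liveNodes, Set<String> disabledNodes) {
    List<Replica> preferred = new ArrayList<>();
    List<Replica> fallback = new ArrayList<>();
    for (Replica r : replicas) {
      // Existing checks: replica state and live_nodes membership.
      if (!r.active() || !liveNodes.contains(r.nodeName())) {
        continue;
      }
      // New check: is the hosting node listed under /disabled_nodes?
      if (disabledNodes.contains(r.nodeName())) {
        fallback.add(r);
      } else {
        preferred.add(r);
      }
    }
    return preferred.isEmpty() ? fallback : preferred;
  }
}
{code}

With the tree shown earlier, a replica on {{bar:8983_node}} would only be chosen if no active replica exists on {{foo:8983_node}}.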

The node would still be "live" and would still accept requests; the znode is 
only an advisory signal to SolrJ and other load balancers, not an enforcement 
mechanism.

I think the children of {{disabled_nodes}} should be ephemeral, so that an 
entry is removed automatically when its node shuts down. As a consequence, the 
flag cannot be used to repel traffic from a node across restarts.
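To illustrate the ephemeral semantics, here is a toy in-memory Java model; it is not the ZooKeeper client API, and the class and method names are hypothetical. It only mimics the ZooKeeper rule that an ephemeral znode is deleted when the session that created it ends. One consequence worth noting: for an entry to disappear when a Solr node shuts down, it would presumably have to be created through that node's own ZooKeeper session rather than by an external caller such as the operator.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of ephemeral children under /disabled_nodes (hypothetical, not ZooKeeper API).
final class EphemeralDisabledNodes {
  // child name (node name) -> id of the session that created the entry
  private final Map<String, Long> owners = new HashMap<>();

  // Flag a node as disabled; the entry is owned by the given session.
  void disable(String nodeName, long sessionId) {
    owners.put(nodeName, sessionId);
  }

  // Clear the flag explicitly, e.g. through a hypothetical cluster API.
  void enable(String nodeName) {
    owners.remove(nodeName);
  }

  // Session end (shutdown, crash, expiry): drop every entry that session owned.
  void sessionClosed(long sessionId) {
    owners.values().removeIf(owner -> owner == sessionId);
  }

  // Current children of /disabled_nodes, sorted for stable output.
  Set<String> children() {
    return new TreeSet<>(owners.keySet());
  }
}
{code}

Closing the owning session models a node shutdown, so a restarted node comes back unflagged, which is the restart limitation described in the paragraph above.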

There could also be a new cluster API to set and clear the flag. For querying, 
the existing CLUSTERSTATUS API could report it.

> API to flag a solr node NOT READY for requests
> ----------------------------------------------
>
>                 Key: SOLR-16722
>                 URL: https://issues.apache.org/jira/browse/SOLR-16722
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Jan Høydahl
>            Priority: Major
>
> Spinoff from solr operator PR 
> [https://github.com/apache/solr-operator/issues/529]
> When solr-operator performs a rolling restart or rolling upgrade, it will 
> stop one node at a time, but SolrJ (both external and internal) will continue 
> sending traffic to the node until requests start failing, because by the time 
> SolrJ picks up the "live_nodes" change, it is too late.
> While the operator PR mentioned above will prevent external requests through 
> the k8s service to the draining node, it will not prevent internal traffic.
> This issue thus aims to introduce some API or mechanism to flag a Solr node 
> as NOT READY for traffic.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
