[ https://issues.apache.org/jira/browse/SOLR-16722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706145#comment-17706145 ]
Jan Høydahl commented on SOLR-16722:
------------------------------------

One simple option is to create a new top-level znode {{/disabled_nodes}}:
{code:java}
/live_nodes
 +-- foo:8983_node
 +-- bar:8983_node
/disabled_nodes
 +-- bar:8983_node{code}
The znode will normally be empty (or nonexistent), but if it exists with one or more children, those nodes are flagged as disabled for traffic. The reason could be that the solr-operator is planning to shut down the node, or it could be a way to temporarily repel traffic from a node during troubleshooting.

SolrJ would be updated to consider {{disabled_nodes}} in addition to replica state and {{live_nodes}}. The CLUSTERSTATUS response should also include this information. A disabled node would still be "live" and can still receive traffic, i.e. the znode is only a signal to SolrJ or other load balancers.

I think the children of {{disabled_nodes}} should be ephemeral so that entries are removed when a node is shut down. As a consequence, the flag cannot be used to repel traffic from a node across node restarts.

There could also be a new cluster API to set and clear the znode. We already have an API (CLUSTERSTATUS) to query it.

> API to flag a solr node NOT READY for requests
> ----------------------------------------------
>
>                 Key: SOLR-16722
>                 URL: https://issues.apache.org/jira/browse/SOLR-16722
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Jan Høydahl
>            Priority: Major
>
> Spinoff from solr operator PR
> [https://github.com/apache/solr-operator/issues/529]
> When solr-operator performs a rolling restart or rolling upgrade, it will
> stop one node at a time, but SolrJ (both external and internal) will continue
> sending traffic to the node until requests start failing, since at the time
> SolrJ picks up the "live_nodes" change, it is too late.
> While the operator PR mentioned above will prevent external requests through
> the k8s service to the draining node, it will not prevent internal traffic.
> This issue thus aims to introduce some API or mechanism to flag a Solr node
> as NOT READY for traffic.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
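
To make the proposed client-side check concrete, here is a minimal sketch of how a load balancer might combine the children of {{/live_nodes}} and {{/disabled_nodes}} when choosing targets. It is only an illustration under stated assumptions: the class {{DisabledNodesFilter}} and the method {{routableNodes}} are hypothetical names, not part of SolrJ, and the two sets stand in for znode children that a real client would read from ZooKeeper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration only; not part of SolrJ. */
class DisabledNodesFilter {

    /**
     * Returns the live nodes that are not flagged as disabled, in sorted order.
     * Both arguments model the child names of /live_nodes and /disabled_nodes.
     */
    static List<String> routableNodes(Set<String> liveNodes, Set<String> disabledNodes) {
        // Sorted copy so the inputs are left untouched and the result order is stable.
        Set<String> routable = new TreeSet<>(liveNodes);
        // A node listed under /disabled_nodes is skipped even though it is still live.
        routable.removeAll(disabledNodes);
        return new ArrayList<>(routable);
    }

    public static void main(String[] args) {
        Set<String> live = Set.of("foo:8983_node", "bar:8983_node");
        Set<String> disabled = Set.of("bar:8983_node");
        System.out.println(routableNodes(live, disabled)); // prints [foo:8983_node]
    }
}
```

An empty or missing {{/disabled_nodes}} corresponds to an empty set here, in which case every live node stays routable. What a client should do when all live nodes are disabled is not specified in the proposal and is left open in this sketch.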