Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]

via GitHub Wed, 22 Jan 2025 11:25:37 -0800


dsmiley commented on code in PR #2935:
URL: https://github.com/apache/solr/pull/2935#discussion_r1925844451



##########
solr/CHANGES.txt:
##########
@@ -176,6 +176,10 @@ Bug Fixes
 * SOLR-17629: If SQLHandler failed to open the underlying stream (e.g. Solr 
returns an error; could be user/syntax problem),
   it needs to close the stream to cleanup resources but wasn't. (David Smiley)
 
+* SOLR-17519: SolrJ fix to CloudSolrClient and HTTP ClusterState where it can 
fail to request cluster state
+  if its current live nodes list is stale. HTTP ClusterState now retains the 
initial configured list of passed URLs as backup

Review Comment:
   I don't think Solr users know about "HTTP ClusterState"; it's not likely in 
their vocabulary.  On the other hand, wording like "SolrJ CloudSolrClient 
configured with Solr URLs" is likely to be understood.



##########
solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseHttpClusterStateProvider.java:
##########
@@ -216,38 +236,42 @@ private SimpleOrderedMap<?> submitClusterStateRequest(
 
   @Override
   public Set<String> getLiveNodes() {
-    if (liveNodes == null) {
-      throw new RuntimeException(
-          "We don't know of any live_nodes to fetch the"
-              + " latest live_nodes information from. "
-              + "If you think your Solr cluster is up and is accessible,"
-              + " you could try re-creating a new CloudSolrClient using 
working"
-              + " solrUrl(s) or zkHost(s).");
-    }
     if (TimeUnit.SECONDS.convert((System.nanoTime() - liveNodesTimestamp), 
TimeUnit.NANOSECONDS)
         > getCacheTimeout()) {
-      for (String nodeName : liveNodes) {
-        String baseUrl = Utils.getBaseUrlForNodeName(nodeName, urlScheme);
-        try (SolrClient client = getSolrClient(baseUrl)) {
-          Set<String> liveNodes = fetchLiveNodes(client);
-          this.liveNodes = (liveNodes);
-          liveNodesTimestamp = System.nanoTime();
-          return liveNodes;
-        } catch (Exception e) {
-          log.warn("Attempt to fetch cluster state from {} failed.", baseUrl, 
e);
-        }
-      }
+
+      if (liveNodes.stream()
+          .anyMatch((node) -> 
updateLiveNodes(URLUtil.getBaseUrlForNodeName(node, urlScheme))))
+        return this.liveNodes;
+
+      log.warn(
+          "Attempt to fetch cluster state from all known live nodes {} failed. 
Trying backup nodes",
+          liveNodes);
+
+      if (configuredNodes.stream().anyMatch((node) -> 
updateLiveNodes(node.toString())))

Review Comment:
   Shouldn't we check configuredNodes *first* so as to give the user the 
possibility of some control, basically addressing the conversation points in 
JIRA?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Re: [PR] SOLR-17519: CloudSolrClient with HTTP ClusterState can forget live nodes and then fail [solr]

Reply via email to