[ 
https://issues.apache.org/jira/browse/SOLR-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063934#comment-15063934
 ] 

Mark Miller commented on SOLR-8416:
-----------------------------------

Thanks Michael,

* Looks like a bunch of imports were moved above the license header?
* We probably want to use real solr.xml config for this, or make these params 
on the collection create call with reasonable defaults. We generally only use 
system properties for internal fail-safe options that we don't expect to 
really be used. I'd be fine with reasonable defaults that can be overridden 
per collection create call, but we could also allow the defaults to be 
configurable via solr.xml.
{code}
+    Integer numRetries = Integer.getInteger("createCollectionWaitTimeTillActive", 10);
+    Boolean checkLeaderOnly = Boolean.getBoolean("createCollectionCheckLeaderActive");
{code}
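A rough sketch of the "defaults overridable per create call" idea, not taken from the patch. The class `CreateWaitOptions` and the parameter names `waitTimeTillActive` and `checkLeaderOnly` are hypothetical and do not exist in Solr; the defaults object stands in for whatever solr.xml would supply.

```java
import java.util.Map;

/** Hypothetical holder for the wait settings; not part of the Solr API. */
final class CreateWaitOptions {
    final int waitSeconds;
    final boolean leaderOnly;

    CreateWaitOptions(int waitSeconds, boolean leaderOnly) {
        this.waitSeconds = waitSeconds;
        this.leaderOnly = leaderOnly;
    }

    /**
     * Per-request params win; anything missing falls back to the supplied
     * defaults (which could be loaded from solr.xml). Param names are assumed.
     */
    static CreateWaitOptions fromParams(Map<String, String> params,
                                        CreateWaitOptions defaults) {
        String wait = params.get("waitTimeTillActive");
        String leader = params.get("checkLeaderOnly");
        return new CreateWaitOptions(
            wait != null ? Integer.parseInt(wait) : defaults.waitSeconds,
            leader != null ? Boolean.parseBoolean(leader) : defaults.leaderOnly);
    }
}
```

With this shape no system property is consulted at all; the create call only sees request params and configured defaults.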
* We should handle the checked exceptions this might throw like we do in other 
spots rather than use a catch-all Exception. There should be plenty of code to 
reference where we handle keeper and interrupted exception and do the right 
thing for each.
{code}
+      try {
+        zkStateReader.updateClusterState();
+        clusterState = zkStateReader.getClusterState();
+      }  catch (Exception e) {
+        throw new SolrException(ErrorCode.SERVER_ERROR, "Can't connect to zk server", e);
+      }
{code}
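A minimal sketch of handling the two checked exceptions separately instead of a catch-all. To keep it compilable on its own, `KeeperException` and `ServerException` below are stand-ins for `org.apache.zookeeper.KeeperException` and `SolrException`, and `StateSource` is a hypothetical interface standing in for the state reader call.

```java
final class ClusterStateRefresh {
    /** Stand-in for org.apache.zookeeper.KeeperException (assumption for this sketch). */
    static class KeeperException extends Exception {
        KeeperException(String message) { super(message); }
    }

    /** Stand-in for SolrException with ErrorCode.SERVER_ERROR (assumption for this sketch). */
    static class ServerException extends RuntimeException {
        ServerException(String message, Throwable cause) { super(message, cause); }
    }

    /** Hypothetical wrapper around the call that refreshes cluster state. */
    interface StateSource {
        void update() throws KeeperException, InterruptedException;
    }

    static void refresh(StateSource source) {
        try {
            source.update();
        } catch (KeeperException e) {
            // ZooKeeper-level failure: report it as a server error.
            throw new ServerException("ZooKeeper error while updating cluster state", e);
        } catch (InterruptedException e) {
            // Restore the interrupt flag before converting to an unchecked exception.
            Thread.currentThread().interrupt();
            throw new ServerException("Interrupted while updating cluster state", e);
        }
    }
}
```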
* I'd probably combine the following into one IF statement:
{code}
+          if (!clusterState.liveNodesContain(replica.getNodeName())) {
+            replicaNotAlive = replica.getCoreUrl();
+            nodeNotLive = replica.getNodeName();
+            break;
+          }
+          if (!state.equals(Replica.State.ACTIVE.toString())) {
+            replicaNotAlive = replica.getCoreUrl();
+            replicaState = state;
+            break;
+          }
{code}
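One way the two checks could collapse into a single condition, as a sketch only. `ReplicaInfo` is a hypothetical stand-in for the replica fields the loop reads, and the `"active"` literal is an assumption about what `Replica.State.ACTIVE.toString()` returns.

```java
import java.util.List;
import java.util.Set;

final class ReplicaCheck {
    /** Assumed string form of Replica.State.ACTIVE. */
    static final String ACTIVE = "active";

    /** Hypothetical stand-in for the replica fields used by the check. */
    static final class ReplicaInfo {
        final String nodeName;
        final String coreUrl;
        final String state;

        ReplicaInfo(String nodeName, String coreUrl, String state) {
            this.nodeName = nodeName;
            this.coreUrl = coreUrl;
            this.state = state;
        }
    }

    /** Returns the first replica that is not live or not ACTIVE, or null if all are ready. */
    static ReplicaInfo firstNotReady(List<ReplicaInfo> replicas, Set<String> liveNodes) {
        for (ReplicaInfo replica : replicas) {
            boolean live = liveNodes.contains(replica.nodeName);
            // Single condition covering both "node not live" and "state not ACTIVE".
            if (!live || !ACTIVE.equals(replica.state)) {
                return replica;
            }
        }
        return null;
    }
}
```

The caller can still inspect the returned replica to decide which of the two reasons applied.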
* Should probably restore interrupt status and throw a SolrException.
{code}
+      try {
+        Thread.sleep(1000);
+      } catch (InterruptedException e) {
+        Thread.currentThread().interrupt();
+      }
{code}
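A sketch of the sleep step that both restores the interrupt status and stops waiting. `IllegalStateException` is used here only so the example compiles without Solr on the classpath; in the patch it would presumably be a `SolrException`. The `WaitStep` class and `pause` method are hypothetical names.

```java
final class WaitStep {
    /** Sleeps between polls; on interrupt, restores the flag and aborts the wait. */
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Restore interrupt status so callers further up can observe it.
            Thread.currentThread().interrupt();
            // Stand-in for SolrException (assumption for this sketch).
            throw new IllegalStateException(
                "Interrupted while waiting for replicas to become active", e);
        }
    }
}
```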
* I'm not sure the return message is quite right. If a node's state is not 
ACTIVE, that does not mean it's not live. It can be DOWN and live, or 
RECOVERING and live, etc. A replica is either live or not, and its state is 
only meaningful if it is live.
* Needs some tests.
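
To illustrate the point about the return message, here is a sketch that keeps "not live" and "live but not ACTIVE" as separate cases. The class, method, and message wording are hypothetical, not taken from the patch.

```java
final class ReplicaStatusMessage {
    /** Builds a message that does not conflate liveness with replica state. */
    static String describe(String coreUrl, String nodeName, boolean nodeLive, String state) {
        if (!nodeLive) {
            // The published state is not trustworthy when the node is not live.
            return "Replica " + coreUrl + " is on node " + nodeName + ", which is not live";
        }
        return "Replica " + coreUrl + " is live but in state " + state;
    }
}
```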

> Solr collection creation API should return after all cores are alive 
> ---------------------------------------------------------------------
>
>                 Key: SOLR-8416
>                 URL: https://issues.apache.org/jira/browse/SOLR-8416
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Michael Sun
>         Attachments: SOLR-8416.patch, SOLR-8416.patch, SOLR-8416.patch
>
>
> Currently the collection creation API returns once all cores are created. In 
> a large cluster the cores may not be alive for some period of time after they 
> are created. Anything requested during that period can fail, so Solr appears 
> unstable. Therefore it's better if the collection creation API waits for all 
> cores to become alive and returns only after that.


