[ https://issues.apache.org/jira/browse/SOLR-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690673#comment-17690673 ]
Ishan Chattopadhyaya commented on SOLR-6405:
--------------------------------------------

Through my testing with solr-bench, I've seen many cases (say 1 in 25-30) where nodes come up and recovery happens for a few replicas but never completes for all of them, leaving the restarted node with some replicas in the DOWN state. I tracked these down to Solr not reconnecting to ZooKeeper after a session loss.

I should add that this is repeatable for me, but to reproduce it I have to let the test run for several hours (or even days). The situation was so annoying while developing the test suite (because of the infinite hang while waiting for all replicas to come up) that I bailed out of those cases with a timeout, failed the test, and moved on. But it is definitely on my radar to revisit and fix.

FYI [~noblepaul].

> ZooKeeper calls can easily not be retried enough on ConnectionLoss.
> -------------------------------------------------------------------
>
>                 Key: SOLR-6405
>                 URL: https://issues.apache.org/jira/browse/SOLR-6405
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Critical
>             Fix For: 4.10, 6.0
>
>         Attachments: SOLR-6405.patch
>
>
> The current design requires that we are sure we retry on connection loss
> until session expiration.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
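
For illustration only, here is a minimal sketch of what "retry on connection loss until session expiration" could look like. It is not Solr's actual code and does not use the ZooKeeper client: the class ConnectionLossRetry, the exception ConnectionLossException, and the parameters sessionTimeoutMs and retryDelayMs are all hypothetical names chosen for this example. The idea is to bound the retries by elapsed time (the session timeout) rather than by a fixed attempt count, so the caller keeps trying for as long as the session could still be alive.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper for this sketch; not part of Solr or ZooKeeper. */
final class ConnectionLossRetry {

    /** Stand-in for a ZooKeeper connection-loss error (an assumption of this sketch). */
    static final class ConnectionLossException extends Exception {}

    /**
     * Runs op, retrying on ConnectionLossException until sessionTimeoutMs
     * has elapsed since the first attempt, then rethrows the last failure.
     */
    static <T> T retryUntilSessionTimeout(Callable<T> op,
                                          long sessionTimeoutMs,
                                          long retryDelayMs) throws Exception {
        long deadline = System.nanoTime() + sessionTimeoutMs * 1_000_000L;
        while (true) {
            try {
                return op.call();
            } catch (ConnectionLossException e) {
                // Once the deadline has passed, the session would have expired,
                // so further retries on this session cannot succeed.
                if (System.nanoTime() - deadline >= 0) {
                    throw e;
                }
                Thread.sleep(retryDelayMs);
            }
        }
    }
}
```

A caller would wrap each ZooKeeper operation in the Callable and pass the client's negotiated session timeout. Any other exception propagates immediately, since only connection loss is treated as retryable here.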