What version of Solr are you on? If you are on 8.7.0+, the timeout used in SolrDispatchFilter is configurable and should work correctly since https://issues.apache.org/jira/browse/SOLR-14503. The timeout was nominally configurable in earlier releases, but the wrong SolrZkClient constructor was being used, so it was effectively hardcoded to 30 seconds in those releases:
https://github.com/apache/solr/blame/da1267a56d5dd46f22288995858cb32b588b3928/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java#L379

There are settings for it.

In solr.in.sh:

    # By default Solr will try to connect to Zookeeper with 30 seconds in timeout; override the timeout if needed
    #SOLR_WAIT_FOR_ZK="30"

In solr.in.cmd (since 8.7.0):

    REM By default Solr will try to connect to Zookeeper with 30 seconds in timeout; override the timeout if needed
    REM set SOLR_WAIT_FOR_ZK=30

or as a system property:

    -DwaitForZk=X

I'm not familiar with ZkContainer, but it looks to me like SolrDispatchFilter's loadNodeConfig(...) will already have been called by the time ZkContainer's initZooKeeper(...) is called. Unless ZK goes down between the two calls, the timeout in ZkContainer should be immaterial because a successful connection was already made, so setting SOLR_WAIT_FOR_ZK should be sufficient?

On Tue, 13 Jul 2021 at 08:25, Bram Van Dam <bram.van...@intix.eu> wrote:

> On 12/07/2021 18:10, Yuval Paz wrote:
> >> Wouldn't it make more sense to set this value to $zkClientTimeout x
> >> $numServers?
> >
> > I don't think it would make much sense to calculate it in such a method.
>
> Maybe not, but in the worst case the last ZK server won't be tried
> unless the timeout is that long. But sure, having it configurable in
> general would be just as good.
>
> > In there they wrote:
> >> Note that unlike other configuration stored in ZK, this probably needs to
> >> be bootstrapped via a System property.
> >
> > I'm not completely sure why they wrote it, if anyone knows please let me
> > know :), apart from this detail it should be pretty straight forward to
> > implement this
>
> It seems like the other ZK timeout value (zkClientTimeout) is injected
> into CloudConfig using a system property
>
> private int zkClientTimeout = Integer.getInteger("zkClientTimeout",
> DEFAULT_ZK_CLIENT_TIMEOUT);
>
> This, in turn, is injected from within the solr script:
>
> CLOUD_MODE_OPTS=("-DzkClientTimeout=$ZK_CLIENT_TIMEOUT")
>
> Which in turn comes from solr.in.(sh|bat):
>
> #ZK_CLIENT_TIMEOUT="15000"
> REM set ZK_CLIENT_TIMEOUT=15000
>
> So I'm guessing it makes sense to expose ZK_CLIENT_CONNECT_TIMEOUT in
> the same way?
>
> Seems pretty easy to implement. I'll discuss this with $employer, with a
> bit of luck you can expect a patch soon.
>
> - Bram
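
For what it's worth, the system-property pattern Bram quotes can be sketched roughly as below. This is only an illustration and not Solr's actual code: the class ZkTimeoutSketch and the method waitForZkSeconds() are hypothetical names, and the 30-second default is simply taken from the solr.in.sh comment. It reads the waitForZk property with Integer.getInteger, the same JDK call used for zkClientTimeout in the quoted snippet.

```java
/**
 * Illustrative sketch only, not Solr source code.
 * ZkTimeoutSketch and waitForZkSeconds() are hypothetical names.
 */
public class ZkTimeoutSketch {

    // Assumed default, taken from the solr.in.sh comment ("30 seconds").
    public static final int DEFAULT_WAIT_FOR_ZK_SECONDS = 30;

    /**
     * Returns the value of -DwaitForZk=X when it is set and parses as an
     * integer; otherwise falls back to the default.
     */
    public static int waitForZkSeconds() {
        return Integer.getInteger("waitForZk", DEFAULT_WAIT_FOR_ZK_SECONDS);
    }

    public static void main(String[] args) {
        // Print the resolved timeout so an override can be checked by hand.
        System.out.println("waitForZk = " + waitForZkSeconds() + "s");
    }
}
```

Running it with -DwaitForZk=60 on the java command line should print the overridden value. Note that Integer.getInteger silently falls back to the default when the property is not a valid integer, so a typo in the setting would go unnoticed with this approach.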