If you are on 8.7.0+, the timeout used by the SolrDispatchFilter is configurable and should work correctly since
SOLR_WAIT_FOR_ZK is actually only used when loading solr.xml from ZooKeeper (code taken from main):

    if (!StringUtils.isEmpty(zkHost)) {
      int startUpZkTimeOut = Integer.getInteger("waitForZk", 30);
      startUpZkTimeOut *= 1000;
      try (SolrZkClient zkClient = new SolrZkClient(zkHost, startUpZkTimeOut, startUpZkTimeOut)) {
        if (zkClient.exists("/solr.xml", true)) {
          log.info("solr.xml found in ZooKeeper. Loading...");
          byte[] data = zkClient.getData("/solr.xml", null, null, true);
          return SolrXmlConfig.fromInputStream(solrHome, new ByteArrayInputStream(data), nodeProperties, true);
        }
      } catch (Exception e) {
        throw new SolrException(ErrorCode.SERVER_ERROR, "Error occurred while loading solr.xml from zookeeper", e);
      }
      log.info("Loading solr.xml from SolrHome (not found in ZooKeeper)");
    }

After that, the waitForZk property is never used again. Once the CoreContainer has been created, all access to ZooKeeper goes through the ZkContainer held by the CoreContainer. In fact, SolrDispatchFilter itself loads the CoreContainer:

    protected CoreContainer createCoreContainer(Path solrHome, Properties nodeProps) {
      NodeConfig nodeConfig = loadNodeConfig(solrHome, nodeProps); // <-- load config using "waitForZk"
      final CoreContainer coreContainer = new CoreContainer(nodeConfig, true);
      coreContainer.load(); // <-- loading the "ZkController" using the "ZkContainer" (with hard coded value of 30 seconds connection timeout)
      return coreContainer;
    }

(comments added by me)

Maybe a good solution is to use the waitForZk property in ZkContainer#initZooKeeper as well.

On Tue, Jul 13, 2021 at 3:49 PM Bram Van Dam <bram.van...@intix.eu> wrote:

> On 13/07/2021 12:26, Colvin Cowie wrote:
> > What version of Solr are you on?
>
> We observed this on 7.7. I assumed that SOLR_WAIT_FOR_ZK only applied to
> embedded ZK instances and not external ensembles. But going by your
> explanation, I'll have to revisit that. Thanks for pointing that out.
>
> Worth a shot. I'll see if I can backport this to 7.7.
>
> > I'm not familiar with ZkContainer; it looks to me like the
> > SolrDispatchFilter loadNodeConfig(...) will already have been called at the
> > point ZkContainer initZooKeeper(...) is called, so unless ZK goes down
> > between the two calls, the timeout in ZkContainer should be immaterial
> > because a successful connection was already made, so setting
> > SOLR_WAIT_FOR_ZK should be sufficient?
>
> Hmm, according to the stack trace we're seeing, Solr is creating a ZK
> client from within ZkContainer::initZooKeeper.
> SolrDispatchFilter::loadNodeConfig does not occur in the stack trace.
> But maybe the order of operations is different between 7.7 and 8.x.
>
> Here's the full stack trace:
>
> org.apache.solr.common.SolrException.log(SolrException.java:159)|null:org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper PASWP01F.dns20.socgen:44011,PASWP01M.dns20.socgen:44011,PASWP04M.dns20.socgen:44011 within 30000 ms
>     at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:201)
>     at org.apache.solr.cloud.ZkController.<init>(ZkController.java:334)
>     at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:114)
>     at org.apache.solr.core.CoreContainer.load(CoreContainer.java:570)
>     at org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:253)
>     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:173)
>     at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:136)
>     at org.eclipse.jetty.servlet.ServletHandler.lambda$initialize$0(ServletHandler.java:750)
>     at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
>     at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734)
>     at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734)
>     at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
>     at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:744)
>     at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:368)
>     at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1497)
>     at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1459)
>     at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:852)
>     at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:278)
>     at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:545)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>     at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:46)
>     at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:192)
>     at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:505)
>     at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:151)
>     at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:180)
>     at org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:453)
>     at org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:64)
>     at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:610)
>     at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:529)
>     at org.eclipse.jetty.util.Scanner.scan(Scanner.java:392)
>     at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:313)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>     at org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:150)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>     at org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:579)
>     at org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:240)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>     at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:138)
>     at org.eclipse.jetty.server.Server.start(Server.java:415)
>     at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:117)
>     at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:113)
>     at org.eclipse.jetty.server.Server.doStart(Server.java:382)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
>     at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1572)
>     at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1512)
>     at java.base/java.security.AccessController.doPrivileged(Native Method)
>     at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1511)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.eclipse.jetty.start.Main.invokeMain(Main.java:220)
>     at org.eclipse.jetty.start.Main.start(Main.java:490)
>     at org.eclipse.jetty.start.Main.main(Main.java:77)
> Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper PASWP01F.dns20.socgen:44011,PASWP01M.dns20.socgen:44011,PASWP04M.dns20.socgen:44011 within 30000 ms
>     at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:250)
>     at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:193)
>     ... 53 more
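To make the suggestion of reusing waitForZk in ZkContainer#initZooKeeper a bit more concrete, here is a minimal standalone Java sketch. It does not touch any Solr classes: `zkConnectTimeoutMs()` and `waitForConnected(...)` are hypothetical helpers. The first resolves the timeout the same way the solr.xml loading code does (`Integer.getInteger("waitForZk", 30)` seconds, converted to ms); the second is a generic CountDownLatch-based stand-in for the "wait until connected or time out" step that fails in the stack trace. It is an assumption, not a claim, that this resembles how ConnectionManager.waitForConnected works internally; the point is only that both connection attempts would share one configurable timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class WaitForZkSketch {

  // Hypothetical helper (not a Solr API): resolve the ZooKeeper connect timeout
  // in milliseconds the same way the solr.xml loading code does, i.e. from the
  // "waitForZk" system property in seconds, defaulting to 30.
  public static int zkConnectTimeoutMs() {
    return Integer.getInteger("waitForZk", 30) * 1000;
  }

  // Hypothetical stand-in for a "wait until connected" step: block until the
  // latch has been counted down (connection established) or the timeout expires.
  public static void waitForConnected(CountDownLatch connected, long timeoutMs)
      throws InterruptedException, TimeoutException {
    if (!connected.await(timeoutMs, TimeUnit.MILLISECONDS)) {
      throw new TimeoutException(
          "Could not connect to ZooKeeper within " + timeoutMs + " ms");
    }
  }

  public static void main(String[] args) throws Exception {
    // Simulate starting with SOLR_WAIT_FOR_ZK=120, assumed to reach the JVM
    // as the system property -DwaitForZk=120.
    System.setProperty("waitForZk", "120");
    int timeoutMs = zkConnectTimeoutMs();
    System.out.println(timeoutMs); // prints 120000

    // The idea: the solr.xml lookup and the ZkController client would both
    // wait for the same timeoutMs instead of the latter using a fixed 30 s.
    CountDownLatch connected = new CountDownLatch(1);
    connected.countDown(); // pretend the ZooKeeper session was established
    waitForConnected(connected, timeoutMs);
    System.out.println("connected");
  }
}
```

In this sketch, leaving waitForZk unset yields the same 30 second default on both paths, so only setups that explicitly raise SOLR_WAIT_FOR_ZK would see different behaviour.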