Hi,

Following up on this. I'd still say that the issue here seems to be that your
ZooKeeper config lists 0.0.0.0 as the IP address for client connections.

>>> The problem is related to the fact that we run Solr and the ZooKeeper
>>> ensemble dockerized. As we cannot bind ZooKeeper from Docker to its host's
>>> external IP address, we have to use "0.0.0.0" as the server address

I don't know how you run these ZooKeepers dockerized, but I'd look for a
workaround where you can configure the correct address in ZooKeeper's
configuration. Then Solr will be happy.
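One possibility might be to keep the real addresses in the server entries and
let ZooKeeper bind to all interfaces via quorumListenOnAllIPs, instead of
putting 0.0.0.0 into the config. A rough, untested sketch (ZooKeeper 3.5+
syntax; the IPs are just the ones from your output below, so adjust to your
Docker setup):

  # zoo.cfg (same server list on every node)
  # bind quorum/election ports on all interfaces, but advertise the real IPs
  quorumListenOnAllIPs=true
  server.1=192.168.0.109:2888:3888;2181
  server.2=192.168.0.126:2888:3888;2181
  server.3=192.168.0.2:2888:3888;2181

That way the addresses published in /zookeeper/config should hopefully be the
routable ones rather than 0.0.0.0.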
Are you saying that in 8.11, the test with zkcli.sh against 0.0.0.0:2181
returns immediately instead of after 30s?

Jan

> On 15 Dec 2022, at 07:10, michael dürr <due...@gmail.com> wrote:
>
> Hi Jan,
>
> Thanks for answering!
>
> I'm pretty sure the reason is related to the problem that Solr tries to
> connect to "0.0.0.0", because it reads that IP from the /zookeeper/config
> znode of the ZooKeeper ensemble.
> The connection I'm talking about is when
> ZookeeperStatusHandler.getZkRawResponse(String zkHostPort, String
> fourLetterWordCommand) tries to open a Socket to "0.0.0.0:2181".
> After a while the connect fails, but as said, this takes a long time. I did
> not debug any deeper, as at that point it is already JDK code.
>
> The timings for the valid ZooKeeper addresses (i.e. those from the static
> configuration string) are listed further down. What causes problems is the
> attempt to connect to 0.0.0.0:2181:
>
> /opt/solr-9.1.0$ export ZK_HOST=0.0.0.0:2181
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST -cmd get /zookeeper/config
> WARN  - 2022-12-15 06:57:44.828; org.apache.solr.common.cloud.SolrZkClient; Using default ZkCredentialsInjector. ZkCredentialsInjector is not secure, it creates an empty list of credentials which leads to 'OPEN_ACL_UNSAFE' ACLs to Zookeeper nodes
> INFO  - 2022-12-15 06:57:44.852; org.apache.solr.common.cloud.ConnectionManager; Waiting up to 30000ms for client to connect to ZooKeeper
> Exception in thread "main" org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 0.0.0.0:2181 within 30000 ms
>         at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:225)
>         at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:137)
>         at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:120)
>         at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:260)
> Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 0.0.0.0:2181 within 30000 ms
>         at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:297)
>         at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:216)
>         ... 3 more
>
> real    0m31.728s
> user    0m3.284s
> sys     0m0.226s
>
> Of course this connection will fail, but this was not a problem before
> (Solr 8.11.1): the call also failed, but it returned fast.
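>
> To see whether it is really just the raw TCP connect that takes this long, a
> small standalone probe can be used (a rough sketch only; plain java.net.Socket
> rather than the exact code path Solr uses, and the 30s timeout is just an
> arbitrary upper bound):
>
>     import java.net.InetSocketAddress;
>     import java.net.Socket;
>
>     public class ZkConnectProbe {
>         public static void main(String[] args) throws Exception {
>             String host = args.length > 0 ? args[0] : "0.0.0.0";
>             int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
>             long start = System.nanoTime();
>             try (Socket socket = new Socket()) {
>                 // Attempt the connect with an explicit upper bound so the
>                 // probe itself cannot hang indefinitely.
>                 socket.connect(new InetSocketAddress(host, port), 30_000);
>                 System.out.println("connected to " + host + ":" + port);
>             } catch (Exception e) {
>                 System.out.println("connect failed: " + e);
>             }
>             System.out.printf("elapsed: %.1f s%n", (System.nanoTime() - start) / 1e9);
>         }
>     }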
> Here are the timings you are interested in, for each of my 3 ZooKeeper nodes
> (adjusted to my setup). The interesting part is the result of fetching
> /zookeeper/config, as it shows the server configurations that include the
> "0.0.0.0" addresses:
>
> /opt/solr-9.1.0$ export ZK_HOST=192.168.0.109:2181
>
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST -cmd get /zookeeper/config
> server.1=0.0.0.0:2888:3888:participant;0.0.0.0:2181
> server.2=192.168.0.126:2888:3888:participant;0.0.0.0:2181
> server.3=192.168.0.2:2888:3888:participant;0.0.0.0:2181
> version=0
>
> real    0m0.810s
> user    0m3.142s
> sys     0m0.148s
>
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST -cmd ls /solr/live_nodes
> /solr/live_nodes (2)
> /solr/live_nodes/192.168.0.222:8983_solr (0)
> /solr/live_nodes/192.168.0.223:8983_solr (0)
>
> real    0m0.838s
> user    0m3.166s
> sys     0m0.210s
>
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST -cmd get /solr/configs/cms_20221214_142242/stopwords.txt
> # Licensed to the Apache Software Foundation (ASF) under one or more
> # ...
>
> real    0m0.836s
> user    0m3.121s
> sys     0m0.173s
>
> /opt/solr-9.1.0$ export ZK_HOST=192.168.0.126:2181
>
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST -cmd get /zookeeper/config
> server.1=192.168.0.109:2888:3888:participant;0.0.0.0:2181
> server.2=0.0.0.0:2888:3888:participant;0.0.0.0:2181
> server.3=192.168.0.2:2888:3888:participant;0.0.0.0:2181
> version=0
>
> real    0m0.843s
> user    0m3.300s
> sys     0m0.183s
>
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST -cmd ls /solr/live_nodes
> /solr/live_nodes (2)
> /solr/live_nodes/192.168.0.222:8983_solr (0)
> /solr/live_nodes/192.168.0.223:8983_solr (0)
>
> real    0m0.807s
> user    0m3.035s
> sys     0m0.164s
>
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST -cmd get /solr/configs/cms_20221214_142242/stopwords.txt
> # Licensed to the Apache Software Foundation (ASF) under one or more
> # ...
>
> real    0m0.859s
> user    0m3.354s
> sys     0m0.177s
>
> /opt/solr-9.1.0$ export ZK_HOST=192.168.0.2:2181
>
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST -cmd get /zookeeper/config
> server.1=192.168.0.109:2888:3888:participant;0.0.0.0:2181
> server.2=192.168.0.126:2888:3888:participant;0.0.0.0:2181
> server.3=0.0.0.0:2888:3888:participant;0.0.0.0:2181
> version=0
>
> real    0m0.790s
> user    0m2.838s
> sys     0m0.154s
>
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST -cmd ls /solr/live_nodes
> /solr/live_nodes (2)
> /solr/live_nodes/192.168.0.222:8983_solr (0)
> /solr/live_nodes/192.168.0.223:8983_solr (0)
>
> real    0m0.861s
> user    0m3.201s
> sys     0m0.169s
>
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST -cmd get /solr/configs/cms_20221214_142242/stopwords.txt
> # Licensed to the Apache Software Foundation (ASF) under one or more
> # ...
>
> real    0m0.779s
> user    0m3.081s
> sys     0m0.184s
>
> Thanks,
> Michael
>
> On Wed, Dec 14, 2022 at 10:08 PM Jan Høydahl <jan....@cominvent.com> wrote:
>
>> Hi,
>>
>> We always check how the ZooKeeper ensemble is configured, and this check
>> does not depend on whether dynamic reconfiguration is possible or not;
>> it is simply there to detect the common mistake that a 3-node ensemble is
>> addressed with only one of its hosts in the static config, or with wrong
>> host names.
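>>
>> For example (illustrative host names only), a static config like
>>
>>   ZK_HOST=zk1:2181/solr
>>
>> when the ensemble actually consists of
>>
>>   ZK_HOST=zk1:2181,zk2:2181,zk3:2181/solr
>>
>> is exactly the kind of mismatch this check is meant to surface.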
>>
>> Sounds like your problem is not with how Solr talks to ZK, but in how you
>> have configured your network. You say:
>>
>>> But this will cause the socket connect to block when resolving
>>> "0.0.0.0" which makes everything very slow.
>>
>> Can you elaborate on exactly which connection you are talking about here,
>> and why/where it is blocking? Can you perhaps attempt a few commands from
>> the command line to illustrate your point?
>>
>> Assuming you are on Linux and have the 'time' command available, try this:
>>
>> export ZK_HOST=my-zookeeper:2181
>> time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST -cmd get /zookeeper/config
>> time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST -cmd ls /live_nodes
>> time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST -cmd get /configs/_default/stopwords.txt
>>
>> What kind of timings do you see?
>>
>> Jan
>>
>>> On 14 Dec 2022, at 13:23, michael dürr <due...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> Since we have updated to Solr 9.1, the Admin UI has become pretty slow.
>>>
>>> The problem is related to the fact that we run Solr and the ZooKeeper
>>> ensemble dockerized. As we cannot bind ZooKeeper from Docker to its host's
>>> external IP address, we have to use "0.0.0.0" as the server address, which
>>> causes problems when Solr tries to get the ZooKeeper status (via
>>> /solr/admin/zookeeper/status).
>>>
>>> Some debugging showed that ZookeeperStatusHandler.getZkStatus() always
>>> tries to get the dynamic configuration from ZooKeeper in order to check
>>> whether it contains all hosts of Solr's static ZooKeeper configuration
>>> string. But this will cause the socket connect to block when resolving
>>> "0.0.0.0" which makes everything very slow.
>>>
>>> The approach of checking whether ZooKeeper allows dynamic reconfiguration
>>> based on the existence of the znode /zookeeper/config does not seem to be
>>> a good one, as this znode exists even when the ZooKeeper ensemble does not
>>> allow dynamic reconfiguration (reconfigEnabled=false).
>>>
>>> Can anybody suggest a simple way to avoid that blocking (i.e. the dynamic
>>> configuration check) so that the status request returns fast again?
>>>
>>> It would be nice to have a configuration parameter that disables this check
>>> independent of the ZooKeeper ensemble status, especially as
>>> reconfigEnabled=false is the default setting for ZooKeeper.
>>>
>>> Thanks,
>>> Michael