Hi,

Following up on this. I'd still say the issue here is that your ZooKeeper
config lists 0.0.0.0 as the IP address for client connections.

>>> The problem is related to the fact that we run Solr and the ZooKeeper
>>> ensemble dockerized. As we cannot bind ZooKeeper from Docker to its host's
>>> external IP address, we have to use "0.0.0.0" as the server address

I don't know exactly how you run these ZooKeepers dockerized, but I'd look for
a workaround that lets you configure the correct addresses in ZooKeeper's
configuration. Then Solr will be happy.
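
One possible direction, sketched below: keep the real host addresses in the
server.N lines (so that /zookeeper/config advertises resolvable IPs to Solr),
and let ZooKeeper bind on all interfaces inside the container via
quorumListenOnAllIPs. The addresses are the ones from your own output; I have
not tested this against your Docker setup, so treat it as a sketch only:

# zoo.cfg, identical server list on every node
# Advertise the real host IPs to peers and clients,
# but bind the quorum/election ports on all local interfaces.
quorumListenOnAllIPs=true
server.1=192.168.0.109:2888:3888;2181
server.2=192.168.0.126:2888:3888;2181
server.3=192.168.0.2:2888:3888;2181

Note that quorumListenOnAllIPs only covers the quorum and election ports
(2888/3888); leaving out the client address before ";2181" should keep the
client port listening on all interfaces as before. Alternatively, running the
containers with host networking avoids the need for 0.0.0.0 entirely.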

Are you saying that in 8.11, the test with zkcli.sh to 0.0.0.0:2181 returns 
immediately instead of after 30s?
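
Independent of that, it could help to separate the raw TCP connect from the
ZooKeeper session setup. Something like this from the Solr host (assuming a
netcat variant that supports -z is available) would show whether the plain
socket connect is what hangs:

time nc -vz -w 3 0.0.0.0 2181
time nc -vz -w 3 192.168.0.109 2181

If the first command fails immediately while zkcli.sh still takes ~30s, the
time is most likely spent in the ZooKeeper client waiting for a session rather
than in the socket connect itself.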

Jan

> On 15 Dec 2022, at 07:10, michael dürr <due...@gmail.com> wrote:
> 
> Hi Jan,
> 
> Thanks for answering!
> 
> I'm pretty sure the cause is that Solr tries to connect to "0.0.0.0",
> because it reads that IP from the /zookeeper/config znode of the ZooKeeper
> ensemble.
> The connection I'm talking about is the one where
> ZookeeperStatusHandler.getZkRawResponse(String zkHostPort, String
> fourLetterWordCommand) tries to open a Socket to "0.0.0.0:2181".
> The connect eventually fails, but as I said, this takes a long time. I did
> not debug any deeper, as at that point it is already JDK code.
> 
> The timings for the valid ZooKeeper addresses (i.e. those from the static
> configuration string) are listed further down. What causes problems is the
> attempt to connect to 0.0.0.0:2181:
> 
> /opt/solr-9.1.0$ export ZK_HOST=0.0.0.0:2181
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST
> -cmd get /zookeeper/config
> WARN  - 2022-12-15 06:57:44.828; org.apache.solr.common.cloud.SolrZkClient;
> Using default ZkCredentialsInjector. ZkCredentialsInjector is not secure,
> it creates an empty list of credentials which leads to 'OPEN_ACL_UNSAFE'
> ACLs to Zookeeper nodes
> INFO  - 2022-12-15 06:57:44.852;
> org.apache.solr.common.cloud.ConnectionManager; Waiting up to 30000ms for
> client to connect to ZooKeeper
> Exception in thread "main" org.apache.solr.common.SolrException:
> java.util.concurrent.TimeoutException: Could not connect to ZooKeeper
> 0.0.0.0:2181 within 30000 ms
>        at
> org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:225)
>        at
> org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:137)
>        at
> org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:120)
>        at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:260)
> Caused by: java.util.concurrent.TimeoutException: Could not connect to
> ZooKeeper 0.0.0.0:2181 within 30000 ms
>        at
> org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:297)
>        at
> org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:216)
>        ... 3 more
> 
> real    0m31.728s
> user    0m3.284s
> sys     0m0.226s
> 
> Of course this will fail, but it was not a problem before (Solr 8.11.1):
> the call also failed, but it returned fast.
> 
> Here are the timings you are interested in, for each of my 3 ZooKeeper nodes
> (adjusted to my setup). The interesting part is the result of fetching
> /zookeeper/config, as it shows the server configurations that include
> the "0.0.0.0" addresses:
> 
> /opt/solr-9.1.0$ export ZK_HOST=192.168.0.109:2181
> 
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST
> -cmd get /zookeeper/config
> server.1=0.0.0.0:2888:3888:participant;0.0.0.0:2181
> server.2=192.168.0.126:2888:3888:participant;0.0.0.0:2181
> server.3=192.168.0.2:2888:3888:participant;0.0.0.0:2181
> version=0
> 
> real    0m0.810s
> user    0m3.142s
> sys     0m0.148s
> 
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST
> -cmd ls /solr/live_nodes
> /solr/live_nodes (2)
> /solr/live_nodes/192.168.0.222:8983_solr (0)
> /solr/live_nodes/192.168.0.223:8983_solr (0)
> 
> real    0m0.838s
> user    0m3.166s
> sys     0m0.210s
> 
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST
> -cmd get /solr/configs/cms_20221214_142242/stopwords.txt
> # Licensed to the Apache Software Foundation (ASF) under one or more
> # ...
> 
> real    0m0.836s
> user    0m3.121s
> sys     0m0.173s
> 
> /opt/solr-9.1.0$ export ZK_HOST=192.168.0.126:2181
> 
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST
> -cmd get /zookeeper/config
> server.1=192.168.0.109:2888:3888:participant;0.0.0.0:2181
> server.2=0.0.0.0:2888:3888:participant;0.0.0.0:2181
> server.3=192.168.0.2:2888:3888:participant;0.0.0.0:2181
> version=0
> 
> real    0m0.843s
> user    0m3.300s
> sys     0m0.183s
> 
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST
> -cmd ls /solr/live_nodes
> /solr/live_nodes (2)
> /solr/live_nodes/192.168.0.222:8983_solr (0)
> /solr/live_nodes/192.168.0.223:8983_solr (0)
> 
> real    0m0.807s
> user    0m3.035s
> sys     0m0.164s
> 
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST
> -cmd get /solr/configs/cms_20221214_142242/stopwords.txt
> # Licensed to the Apache Software Foundation (ASF) under one or more
> # ...
> 
> real    0m0.859s
> user    0m3.354s
> sys     0m0.177s
> 
> export ZK_HOST=192.168.0.2:2181
> 
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST
> -cmd get /zookeeper/config
> server.1=192.168.0.109:2888:3888:participant;0.0.0.0:2181
> server.2=192.168.0.126:2888:3888:participant;0.0.0.0:2181
> server.3=0.0.0.0:2888:3888:participant;0.0.0.0:2181
> version=0
> 
> real    0m0.790s
> user    0m2.838s
> sys     0m0.154s
> 
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST
> -cmd ls /solr/live_nodes
> /solr/live_nodes (2)
> /solr/live_nodes/192.168.0.222:8983_solr (0)
> /solr/live_nodes/192.168.0.223:8983_solr (0)
> 
> real    0m0.861s
> user    0m3.201s
> sys     0m0.169s
> 
> /opt/solr-9.1.0$ time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST
> -cmd get /solr/configs/cms_20221214_142242/stopwords.txt
> # Licensed to the Apache Software Foundation (ASF) under one or more
> # ...
> 
> real    0m0.779s
> user    0m3.081s
> sys     0m0.184s
> 
> Thanks,
> Michael
> 
> On Wed, Dec 14, 2022 at 10:08 PM Jan Høydahl <jan....@cominvent.com> wrote:
> 
>> Hi,
>> 
>> We always check how the ZooKeeper ensemble is configured, and this check
>> does not depend on whether dynamic reconfiguration is possible or not;
>> it is simply there to detect the common mistake that a 3-node ensemble is
>> addressed with only one of the hosts in the static config, or with wrong
>> host names.
>> 
>> Sounds like your problem is not with how Solr talks to ZK, but with how you
>> have configured your network. You say
>> 
>>> But this will cause the socket connect to block when resolving
>>> "0.0.0.0", which makes everything very slow.
>> 
>> Can you elaborate on exactly which connection you are talking about
>> here, and why/where it is blocking? Can you perhaps attempt a few commands
>> from the command line to illustrate your point?
>> 
>> Assuming you are on Linux, and have the 'time' command available, try this
>> 
>> export ZK_HOST=my-zookeeper:2181
>> time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST -cmd get
>> /zookeeper/config
>> time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST -cmd ls /live_nodes
>> time server/scripts/cloud-scripts/zkcli.sh -z $ZK_HOST -cmd get
>> /configs/_default/stopwords.txt
>> 
>> What kind of timings do you see?
>> 
>> Jan
>> 
>>> On 14 Dec 2022, at 13:23, michael dürr <due...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> Since we updated to Solr 9.1, the Admin UI has become pretty slow.
>>> 
>>> The problem is related to the fact that we run Solr and the ZooKeeper
>>> ensemble dockerized. As we cannot bind ZooKeeper from Docker to its host's
>>> external IP address, we have to use "0.0.0.0" as the server address, which
>>> causes problems when Solr tries to get the ZooKeeper status (via
>>> /solr/admin/zookeeper/status).
>>> 
>>> Some debugging showed that ZookeeperStatusHandler.getZkStatus() always
>>> tries to get the dynamic configuration from ZooKeeper in order to check
>>> whether it contains all hosts of Solr's static ZooKeeper configuration
>>> string. But this will cause the socket connect to block when resolving
>>> "0.0.0.0", which makes everything very slow.
>>> 
>>> The approach of checking whether ZooKeeper allows dynamic reconfiguration
>>> based on the existence of the znode /zookeeper/config does not seem to be
>>> a good one, as this znode exists even when the ZooKeeper ensemble does not
>>> allow dynamic reconfiguration (reconfigEnabled=false).
>>> 
>>> Can anybody suggest a simple way to avoid that blocking (i.e. the dynamic
>>> configuration check) so that the status request returns fast again?
>>> 
>>> It would be nice to have a configuration parameter that disables this
>>> check independent of the ZooKeeper ensemble's status, especially as
>>> reconfigEnabled=false is the default setting for ZooKeeper.
>>> 
>>> Thanks,
>>> Michael
>> 
>> 
