Klaus Herrmann created SOLR-5359:
------------------------------------

             Summary: CloudSolrServer tries to connect to zookeeper forever when ensemble is unavailable
                 Key: SOLR-5359
                 URL: https://issues.apache.org/jira/browse/SOLR-5359
             Project: Solr
          Issue Type: Bug
          Components: clients - java
    Affects Versions: 4.5
            Reporter: Klaus Herrmann


When opening a new CloudSolrServer against an unavailable zookeeper ensemble, 
the following exception messages are logged:


INFO  [hybrisHTTP28-SendThread(localhost:2181)] [ClientCnxn] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
WARN  [hybrisHTTP28-SendThread(localhost:2181)] [ClientCnxn] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
INFO  [hybrisHTTP28-SendThread(localhost:2181)] [ClientCnxn] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
WARN  [hybrisHTTP28-SendThread(localhost:2181)] [ClientCnxn] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

This is consistent with the behaviour of zkCli.sh, except that it never times 
out. zkCli.sh stops connecting after 30 seconds, but CloudSolrServer keeps 
trying to connect to zookeeper and logs the above messages forever, 
regardless of ZkClientTimeout and ZkConnectTimeout. 

Calls to e.g. isAlive() do indeed time out, but that does not stop the 
underlying CloudSolrServer instance from trying to connect. 

It does not seem to be possible to set a different zkHost for an existing 
CloudSolrServer instance either, so once an instance is created with a 
bad/wrong zkHost string it seems impossible to destroy. 
Even if the zkHost is correct and only the ensemble is down, one has to keep 
the CloudSolrServer instance around rather than discard it after a failed 
connection attempt. Otherwise each retry creates a new ZkClient that then 
tries to connect forever, so the number of connecting clients keeps growing, 
because the clients never stop and are never garbage collected.

I believe the CloudSolrServer/ZkClient should stop trying to connect 
altogether after a timeout and be garbage collected. 
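
To illustrate the requested behaviour, here is a minimal sketch of a 
connection attempt that is given up after a deadline, with its worker thread 
released. It uses only java.util.concurrent and is not SolrJ or ZooKeeper 
code: BoundedConnect, connectWithin and the simulated neverConnects task are 
hypothetical names, and the sketch assumes the retry loop reacts to thread 
interruption, which is not established here for the real client.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedConnect {

    /**
     * Runs connectAttempt on a worker thread and waits at most timeoutMs.
     * Returns true if the attempt finished in time. Otherwise the worker is
     * interrupted and false is returned, so no retry loop is left running.
     */
    public static boolean connectWithin(Callable<Void> connectAttempt, long timeoutMs)
            throws InterruptedException {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-connect");
            t.setDaemon(true); // never keep the JVM alive for a stuck attempt
            return t;
        });
        Future<Void> attempt = worker.submit(connectAttempt);
        try {
            attempt.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            attempt.cancel(true); // interrupt the retry loop
            return false;
        } finally {
            worker.shutdownNow(); // release the thread in every case
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a client whose endpoint is never reachable.
        Callable<Void> neverConnects = () -> {
            while (true) {
                Thread.sleep(50); // one failed attempt plus back-off
            }
        };
        System.out.println(connectWithin(neverConnects, 300)); // prints false
        System.out.println(connectWithin(() -> null, 300));    // prints true
    }
}
```

Timing out only the caller, as isAlive() does, is not enough on its own: the 
sketch also cancels the task and shuts the executor down, which is the step 
that lets the abandoned attempt end and be garbage collected.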


