Anatoly Fayngelerin created KAFKA-1082:
------------------------------------------
             Summary: zkclient dies after UnknownHostException in zk reconnect
                 Key: KAFKA-1082
                 URL: https://issues.apache.org/jira/browse/KAFKA-1082
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 0.8, 0.7.2
            Reporter: Anatoly Fayngelerin


Moving this here from the dev list:

I've run into the following issue with the Kafka server. The zkclient lib seems 
to die silently if there is an UnknownHostException (or any IOException) while
reconnecting the ZK session. I've filed a bug about this with the zkclient
lib (https://github.com/sgroschupf/zkclient/issues/23). The ramifications for
Kafka were the silent loss of all ephemeral nodes associated with the affected 
process. 
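To make the failure mode concrete, here is a minimal, self-contained Java
sketch of a session-expiry handler that survives the exception instead of
dying. It does not use the real zkclient or ZooKeeper APIs: SessionFactory,
connect(), ReconnectingClient and handleSessionExpired() are hypothetical
stand-ins for "open a new ZK session". UnknownHostException is a subclass of
IOException, so catching IOException covers the DNS case described above.

```java
import java.io.IOException;
import java.net.UnknownHostException;

public class ReconnectSketch {

    /** Hypothetical stand-in for "open a new ZK session"; may fail on DNS lookup. */
    interface SessionFactory {
        String connect(String connectString) throws IOException;
    }

    /** Hypothetical client whose expiry handler retries instead of dying. */
    static class ReconnectingClient {
        private final SessionFactory factory;
        private final String connectString;
        private final long backoffMillis;
        private String session; // null while there is no live session

        ReconnectingClient(SessionFactory factory, String connectString, long backoffMillis) {
            this.factory = factory;
            this.connectString = connectString;
            this.backoffMillis = backoffMillis;
        }

        /**
         * Called after the old session has expired. Returns the attempt number
         * that succeeded, or -1 if every attempt failed or the thread was interrupted.
         */
        int handleSessionExpired(int maxAttempts) {
            session = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    session = factory.connect(connectString);
                    return attempt;
                } catch (IOException e) {
                    // Log loudly instead of failing silently, then back off and retry.
                    System.err.println("reconnect attempt " + attempt + " failed: " + e);
                    if (attempt < maxAttempts) {
                        try {
                            Thread.sleep(backoffMillis);
                        } catch (InterruptedException ie) {
                            Thread.currentThread().interrupt(); // preserve interrupt status
                            return -1;
                        }
                    }
                }
            }
            return -1;
        }

        boolean isConnected() {
            return session != null;
        }
    }

    public static void main(String[] args) {
        // Simulate a DNS alias that is missing for two attempts, then restored.
        int[] calls = {0};
        SessionFactory flakyDns = connectString -> {
            calls[0]++;
            if (calls[0] <= 2) {
                throw new UnknownHostException(connectString.split(":")[0]);
            }
            return "session-" + calls[0];
        };
        ReconnectingClient client = new ReconnectingClient(flakyDns, "kafka-test-dns:2181", 100);
        int attempts = client.handleSessionExpired(5);
        // prints "connected=true after 3 attempts"
        System.out.println("connected=" + client.isConnected() + " after " + attempts + " attempts");
    }
}
```

In this simulation the alias comes back on the third attempt and the client
reconnects, which is the recovery that does not happen once zkclient's
reconnect fails. A fixed backoff is only one possible retry policy, and a
real handler would also have to re-create the ephemeral nodes, since they are
deleted when the old session expires.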

It is fairly easy to reproduce this locally using the following steps:
-- Configure a local Kafka broker to connect to a local ZK instance using a DNS
alias (e.g. add "127.0.0.1 kafka-test-dns" to your /etc/hosts).
-- Start the broker and observe that ephemeral nodes have been added to ZK.
-- Suspend the broker process, preventing it from sending heartbeats to the ZK
instance. Once the ZK session times out, observe the loss of ephemeral nodes
in ZK.
-- Remove the DNS alias (e.g. comment out the /etc/hosts line).
-- Upon resuming the broker, the UnknownHostException is logged. After this
point, the server cannot re-establish its ZK connection; even re-enabling the
alias does not restore normal operation. The broker continues accepting
requests without participating in the ZK protocols.
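Before resuming the broker in the last step, it can help to confirm that the
alias really no longer resolves. A minimal Java check, assuming the example
alias kafka-test-dns from the steps above (AliasCheck and resolves() are
hypothetical helper names, not part of Kafka or zkclient):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class AliasCheck {

    /** Returns true if the host name currently resolves (e.g. via /etc/hosts). */
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "kafka-test-dns" is the example alias from the reproduction steps.
        String host = args.length > 0 ? args[0] : "kafka-test-dns";
        System.out.println(host + (resolves(host) ? " resolves" : " does NOT resolve"));
    }
}
```

For example, `java AliasCheck.java kafka-test-dns` (single-file launch, Java 11+)
should report that the name does NOT resolve while the /etc/hosts line is
commented out. The JVM caches name lookups (see the networkaddress.cache.ttl
and networkaddress.cache.negative.ttl security properties), so run the check
in a fresh JVM.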




--
This message was sent by Atlassian JIRA
(v6.1#6144)
