[ https://issues.apache.org/jira/browse/KAFKA-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jun Rao resolved KAFKA-1082.
----------------------------
       Resolution: Duplicate
    Fix Version/s: 0.8.3

This is already resolved in KAFKA-2169.

> zkclient dies after UnknownHostException in zk reconnect
> --------------------------------------------------------
>
>                 Key: KAFKA-1082
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1082
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.7.2, 0.8.0
>            Reporter: Anatoly Fayngelerin
>            Assignee: Gwen Shapira
>             Fix For: 0.8.3
>
>         Attachments: KAFKA-1082.patch
>
>
> Moving this here from the dev list:
> I've run into the following issue with the Kafka server. The zkclient lib
> seems to die silently if there is an UnknownHostException (or any IOException)
> while reconnecting the ZK session. I've filed a bug about this with the
> zkclient lib (https://github.com/sgroschupf/zkclient/issues/23). The
> ramifications for Kafka were the silent loss of all ephemeral nodes
> associated with the affected process.
> It is fairly easy to reproduce this locally using the following steps:
> -- Configure a local kafka broker to connect to a local ZK instance using a
> DNS alias (e.g. add "127.0.0.1 kafka-test-dns" to your /etc/hosts)
> -- Start the broker, observe that ephemeral nodes have been added to ZK
> -- Suspend the broker process, preventing it from sending heartbeats to the
> ZK instance. Observe the loss of ephemeral nodes in ZK.
> -- Remove the DNS alias (e.g. comment out the /etc/hosts line).
> -- Upon resuming the broker, the UnknownHostException is logged. After this
> point, the server cannot re-establish its ZK connection. Re-enabling the
> alias, for example, does not resume normal operation. The broker continues
> accepting requests, without participating in the ZK protocols.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
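
For illustration only: the failure mode in the report above is a reconnect that gives up for good after one UnknownHostException. A minimal Java sketch of the alternative, retrying the connect step until host resolution works again, might look like the following. The Connector interface, the reconnectWithRetry method and the retry parameters are hypothetical names invented for this sketch. This is not the change made in KAFKA-2169 or in zkclient.

```java
import java.net.UnknownHostException;

public class ReconnectRetrySketch {

    /** Hypothetical stand-in for the step that opens a new ZooKeeper session. */
    interface Connector {
        void connect() throws UnknownHostException;
    }

    /**
     * Retries the connect step while host resolution fails, instead of letting
     * the first UnknownHostException end the reconnect for good.
     * Returns the number of attempts used; rethrows the last failure once
     * maxAttempts is exhausted.
     */
    static int reconnectWithRetry(Connector connector, int maxAttempts, long backoffMs)
            throws UnknownHostException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        UnknownHostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                connector.connect();
                return attempt;
            } catch (UnknownHostException e) {
                last = e;
                // Wait before the next attempt; the DNS alias may come back.
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulate the DNS alias being unresolvable for the first two attempts.
        int[] calls = {0};
        Connector flaky = () -> {
            if (++calls[0] < 3) {
                throw new UnknownHostException("kafka-test-dns");
            }
        };
        int attempts = reconnectWithRetry(flaky, 5, 10L);
        System.out.println("connected after " + attempts + " attempts");
    }
}
```

With the simulated connector above, the sketch prints "connected after 3 attempts". A real client would also have to re-create its ephemeral nodes once the new session is established, which this sketch does not cover.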