[ https://issues.apache.org/jira/browse/KAFKA-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manikumar resolved KAFKA-764.
-----------------------------
Resolution: Duplicate
This old issue is similar to KAFKA-7165. Closing this as a duplicate of KAFKA-7165.
> Race Condition in Broker Registration after ZooKeeper disconnect
> ----------------------------------------------------------------
>
> Key: KAFKA-764
> URL: https://issues.apache.org/jira/browse/KAFKA-764
> Project: Kafka
> Issue Type: Bug
> Components: zkclient
> Affects Versions: 0.7.1
> Reporter: Bob Cotton
> Priority: Major
> Attachments: BPPF_2900-Broker_Logs.tbz2
>
>
> When running our ZooKeepers in VMware, occasionally all the keepers
> simultaneously pause long enough for the Kafka clients to time out, and then
> the keepers simultaneously un-pause.
> When this happens, the zk clients disconnect from ZooKeeper. When ZooKeeper
> comes back, ZkUtils.createEphemeralPathExpectConflict finds the broker's own
> id node still present, so it does not re-register the broker id node and the
> call succeeds. ZooKeeper then figures out that the broker had disconnected
> and deletes the ephemeral node *after* allowing the consumer to read the data
> in the /brokers/ids/x node. The broker then goes on to register all its
> topics, etc. When consumers connect, they see topic nodes associated with
> the broker but they can't find the broker node to get connection information
> for the broker, sending them into a rebalance loop until they reach
> rebalance.retries.max and fail.
> This might also be a ZooKeeper issue, but the desired behavior in the
> disconnect case might be: if the broker node is found, explicitly delete it
> and recreate it.
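To make the race concrete, here is a minimal, self-contained model of it. This is a sketch under stated assumptions, not Kafka's ZkUtils or the ZooKeeper client API: FakeZk, register_expect_conflict, register_delete_and_recreate and simulate are hypothetical names, and every node is modelled as an ephemeral owned by a session id. The model assumes, as the report describes, that the broker reconnects with a new session while the stale ephemeral node from its old session is still present and is only removed later. Treating "node exists with my data" as success leaves the node owned by the old session, so it disappears when that session is expired; deleting and recreating it (the behavior proposed above) moves ownership to the new session.

```python
from dataclasses import dataclass, field


class NodeExistsError(Exception):
    """Raised when creating a path that already exists (models ZooKeeper's NodeExists)."""


@dataclass
class FakeZk:
    # Hypothetical in-memory store: path -> (data, owning session id).
    # Every node in this model is ephemeral.
    nodes: dict = field(default_factory=dict)

    def create_ephemeral(self, path, data, session):
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = (data, session)

    def delete(self, path):
        self.nodes.pop(path, None)

    def expire_session(self, session):
        # The server eventually notices the old session is gone and
        # removes every ephemeral node that session owned.
        for path in [p for p, (_, s) in self.nodes.items() if s == session]:
            del self.nodes[path]


def register_expect_conflict(zk, path, data, session):
    """Behavior described in the report: an existing node holding our own data counts as success."""
    try:
        zk.create_ephemeral(path, data, session)
    except NodeExistsError:
        existing, _owner = zk.nodes[path]
        if existing != data:
            raise  # node genuinely belongs to another broker


def register_delete_and_recreate(zk, path, data, session):
    """Behavior proposed in the report: if the node is found, delete it and create it again."""
    try:
        zk.create_ephemeral(path, data, session)
    except NodeExistsError:
        existing, _owner = zk.nodes[path]
        if existing != data:
            raise  # node genuinely belongs to another broker
        zk.delete(path)
        zk.create_ephemeral(path, data, session)  # now owned by the current session


def simulate(register):
    zk = FakeZk()
    path, data = "/brokers/ids/1", "host1:9092"
    zk.create_ephemeral(path, data, session=1)  # original registration
    # After the pause the client reconnects with a new session
    # before the server has expired session 1.
    register(zk, path, data, session=2)
    zk.expire_session(1)  # the stale ephemeral node is cleaned up late
    return path in zk.nodes  # is the broker node still visible to consumers?


print(simulate(register_expect_conflict))      # False: broker node vanished
print(simulate(register_delete_and_recreate))  # True: node now owned by session 2
```

In this model only the delete-and-recreate variant leaves /brokers/ids/1 in place after the old session expires, which is the state consumers need in order to find the broker's connection information.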