[ https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520867#comment-14520867 ]
Thomas Omans commented on KAFKA-1387:
-------------------------------------

It looks like this "infinite retry" behavior exists in Kafka only to accommodate another strange issue, where ZooKeeper was deleting ephemeral nodes out from under it:
https://github.com/apache/kafka/blob/0.8.2.1/core/src/main/scala/kafka/utils/ZkUtils.scala#L272
https://issues.apache.org/jira/browse/ZOOKEEPER-1740

It seems the simplest thing to do would be to just delete the conflicted node and have the process write what it knows to be true about its own environment.

My issue appeared in the consumer code, whereas this issue occurs in the Kafka brokers themselves, but the bug appears to be the same.

There are two exceptional cases for ephemeral nodes that I can see. In the first, the ZOOKEEPER-1740 bug was hit: our ephemeral node was mysteriously lost out from under us, but our session is still active, so we can just create a new one. In the second, which I believe is what we are seeing, the session is long gone but the ephemeral node is still hanging around until the consumer process exits. Currently the first case is handled, but the second case is not.

> Kafka getting stuck creating ephemeral node it has already created when two
> zookeeper sessions are established in a very short period of time
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1387
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1387
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1.1
>            Reporter: Fedor Korotkiy
>              Labels: newbie, patch
>         Attachments: kafka-1387.patch
>
>
> Kafka broker re-registers itself in zookeeper every time the handleNewSession()
> callback is invoked.
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala
>
> Now imagine the following sequence of events.
> 1) Zookeeper session reestablishes.
> The handleNewSession() callback is queued by the zkClient, but not invoked yet.
> 2) Zookeeper session reestablishes again, queueing the callback a second time.
> 3) First callback is invoked, creating the /broker/[id] ephemeral path.
> 4) Second callback is invoked and tries to create the /broker/[id] path using
> the createEphemeralPathExpectConflictHandleZKBug() function. But the path
> already exists, so createEphemeralPathExpectConflictHandleZKBug() gets
> stuck in an infinite loop.
> Seems like the controller election code has the same issue.
> I am able to reproduce this issue on the 0.8.1 branch from github using the
> following configs.
> # zookeeper
> tickTime=10
> dataDir=/tmp/zk/
> clientPort=2101
> maxClientCnxns=0
> # kafka
> broker.id=1
> log.dir=/tmp/kafka
> zookeeper.connect=localhost:2101
> zookeeper.connection.timeout.ms=100
> zookeeper.sessiontimeout.ms=100
> Just start kafka and zookeeper and then pause zookeeper several times using
> Ctrl-Z.
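
The four-step sequence in the quoted report can be replayed without a real ZooKeeper ensemble. What follows is only a minimal Python sketch under stated assumptions, not Kafka's code. `FakeZk`, `register_stuck`, `register_fixed`, and `replay` are hypothetical names, the retry loop is capped so the example terminates, and the owner check in `register_fixed` is one possible reading of the "delete the conflicted node" suggestion in the comment, not the attached patch.

```python
class NodeExists(Exception):
    """Raised when an ephemeral node is already present at the path."""


class FakeZk:
    """In-memory stand-in for ZooKeeper; models only ephemeral nodes and session ids."""

    def __init__(self):
        self.session_id = 1
        self.nodes = {}  # path -> (data, owner session id)

    def new_session(self):
        # Expire the current session (dropping its ephemeral nodes), then start a new one.
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != self.session_id}
        self.session_id += 1

    def create_ephemeral(self, path, data):
        if path in self.nodes:
            raise NodeExists(path)
        self.nodes[path] = (data, self.session_id)

    def delete(self, path):
        self.nodes.pop(path, None)


def register_stuck(zk, path, data, max_retries=5):
    """Back off and retry on conflict; capped here, whereas the reported loop never gives up."""
    for _ in range(max_retries):
        try:
            zk.create_ephemeral(path, data)
            return True
        except NodeExists:
            stored, _owner = zk.nodes[path]
            if stored != data:
                raise
            # Same data: assume a stale node from an old session and wait for
            # ZooKeeper to delete it. It never will if our live session owns it.
    return False


def register_fixed(zk, path, data):
    """Hypothetical handling: decide based on which session owns the conflicting node."""
    try:
        zk.create_ephemeral(path, data)
    except NodeExists:
        stored, owner = zk.nodes[path]
        if stored != data:
            raise
        if owner != zk.session_id:
            # Left behind by another session: delete it and write our own node.
            zk.delete(path)
            zk.create_ephemeral(path, data)
        # Otherwise our live session already created it; nothing more to do.
    return True


def replay(register):
    """Replay steps 1-4 of the report with the given registration function."""
    zk = FakeZk()
    path, data = "/broker/1", "host:9092"
    pending = []
    zk.new_session()            # 1) session reestablishes, callback queued
    pending.append(register)
    zk.new_session()            # 2) session reestablishes again, callback queued again
    pending.append(register)
    results = [callback(zk, path, data) for callback in pending]  # 3) and 4)
    return results, zk
```

With `register_stuck`, the first callback succeeds and the second exhausts its retries, because the node it is waiting on belongs to its own live session. With `register_fixed`, both callbacks return successfully. The same function also covers the second case from the comment, where the node is owned by a session that is long gone.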