[ https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533402#comment-14533402 ]
Abhishek Nigam commented on KAFKA-1387: --------------------------------------- I have seen the ephemeral node issue before and the fix made there was exactly what Thomas mentioned: "It seems the simplest thing to do would be to just delete the conflicted node and write the truth about the process environment it knows." Is there a reason why the approach outlined by Thomas does not work for kafka? > Kafka getting stuck creating ephemeral node it has already created when two > zookeeper sessions are established in a very short period of time > --------------------------------------------------------------------------------------------------------------------------------------------- > > Key: KAFKA-1387 > URL: https://issues.apache.org/jira/browse/KAFKA-1387 > Project: Kafka > Issue Type: Bug > Affects Versions: 0.8.1.1 > Reporter: Fedor Korotkiy > Priority: Blocker > Labels: newbie, patch, zkclient-problems > Attachments: kafka-1387.patch > > > Kafka broker re-registers itself in zookeeper every time handleNewSession() > callback is invoked. > https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala > > Now imagine the following sequence of events. > 1) Zookeeper session reestablishes. handleNewSession() callback is queued by > the zkClient, but not invoked yet. > 2) Zookeeper session reestablishes again, queueing callback second time. > 3) First callback is invoked, creating /broker/[id] ephemeral path. > 4) Second callback is invoked and it tries to create /broker/[id] path using > createEphemeralPathExpectConflictHandleZKBug() function. But the path is > already exists, so createEphemeralPathExpectConflictHandleZKBug() is getting > stuck in the infinite loop. > Seems like controller election code have the same issue. > I'am able to reproduce this issue on the 0.8.1 branch from github using the > following configs. > # zookeeper > tickTime=10 > dataDir=/tmp/zk/ > clientPort=2101 > maxClientCnxns=0 > # kafka > broker.id=1 > log.dir=/tmp/kafka > zookeeper.connect=localhost:2101 > zookeeper.connection.timeout.ms=100 > zookeeper.sessiontimeout.ms=100 > Just start kafka and zookeeper and then pause zookeeper several times using > Ctrl-Z. -- This message was sent by Atlassian JIRA (v6.3.4#6332)