[ https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087063#comment-14087063 ]
Joe Stein edited comment on KAFKA-1387 at 8/6/14 1:09 AM:
----------------------------------------------------------

[~junrao] I tested on trunk and it is much worse now. Instead of looping on the /controller node (like it was before), node 3 actually overwrote/stole /brokers/ids/2 (a get before showed 192.168.30.1 and after it shows 192.168.30.3). So now I have two running broker servers with the same broker id (2). Server 3 is the "active" broker, with all the topics being created on it and produce and consume requests failing (because all the data is on server 1, but that is not advertised), and server 1 is still the controller, handling preferred leader election, etc.

was (Author: joestein):
[~junrao] I tested on trunk and it is much worse now. instead of looping on the /controller node (like it was before) ... node 3 actually overwrote/stole the /brokers/ids/2 (doing a get before had it as 192.168.30.1 and after it is 192.168.30.1) so now i have a situation where I have two broker servers, each with the same broker id running, node 3 is the broker with all the topics being created on it and failing requests for producing and consuming (because all the data is on node 1 but that is not advertised).... and node 1 is still the controller.

> Kafka getting stuck creating ephemeral node it has already created when two
> zookeeper sessions are established in a very short period of time
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>         Key: KAFKA-1387
>         URL: https://issues.apache.org/jira/browse/KAFKA-1387
>     Project: Kafka
>  Issue Type: Bug
>    Reporter: Fedor Korotkiy
>
> Kafka broker re-registers itself in zookeeper every time handleNewSession()
> callback is invoked.
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala
>
> Now imagine the following sequence of events.
> 1) Zookeeper session reestablishes. handleNewSession() callback is queued by
> the zkClient, but not invoked yet.
> 2) Zookeeper session reestablishes again, queueing the callback a second time.
> 3) First callback is invoked, creating /broker/[id] ephemeral path.
> 4) Second callback is invoked and it tries to create /broker/[id] path using
> createEphemeralPathExpectConflictHandleZKBug() function. But the path
> already exists, so createEphemeralPathExpectConflictHandleZKBug() gets
> stuck in an infinite loop.
> Seems like the controller election code has the same issue.
> I am able to reproduce this issue on the 0.8.1 branch from github using the
> following configs.
> # zookeeper
> tickTime=10
> dataDir=/tmp/zk/
> clientPort=2101
> maxClientCnxns=0
> # kafka
> broker.id=1
> log.dir=/tmp/kafka
> zookeeper.connect=localhost:2101
> zookeeper.connection.timeout.ms=100
> zookeeper.sessiontimeout.ms=100
> Just start kafka and zookeeper and then pause zookeeper several times using
> Ctrl-Z.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
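
For readers following the four-step sequence in the quoted description, here is a rough Python sketch of why the second queued callback can never succeed. It is a simplified model, not Kafka's code: FakeZk, register_with_retry, the path, the data string and the max_attempts cap are all made up for illustration, and the retry rule (same data means "stale node from an expired session, wait and retry") is my reading of what the conflict handler appears to do.

```python
class FakeZk:
    """Hypothetical in-memory stand-in for ZooKeeper: path -> (data, owner session id)."""

    def __init__(self):
        self.nodes = {}

    def create_ephemeral(self, path, data, session_id):
        if path in self.nodes:
            raise KeyError(path)  # stands in for a node-exists error
        self.nodes[path] = (data, session_id)

    def expire_session(self, session_id):
        # Ephemeral nodes disappear only when their owning session ends.
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session_id}


def register_with_retry(zk, path, data, session_id, max_attempts=5):
    """Simplified create-or-wait loop; returns attempts used, or None if it gave up."""
    for attempt in range(1, max_attempts + 1):
        try:
            zk.create_ephemeral(path, data, session_id)
            return attempt
        except KeyError:
            stored, _owner = zk.nodes[path]
            if stored != data:
                raise RuntimeError("path held by a different broker")
            # Same data: assume a stale node from an expired session and retry.
            # An unbounded loop would back off here and try again forever;
            # the cap exists only so this sketch terminates.
    return None


zk = FakeZk()
path, data = "/brokers/ids/1", "host:9092"  # illustrative values
session = 2  # the session that is live when both queued callbacks finally run

first = register_with_retry(zk, path, data, session)   # step 3: creates the node
second = register_with_retry(zk, path, data, session)  # step 4: waits on its own node
print(first, second)
```

In this model the node the second callback is waiting on belongs to the live session, so nothing will ever delete it. A check that compares the node's owner with the current session id would tell this case apart from a genuinely stale node, though whether that fits the actual code is a separate question.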