[ 
https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701418#comment-14701418
 ] 

Flavio Junqueira commented on KAFKA-1387:
-----------------------------------------

It doesn't look like it would be a small change to zkclient to fix this. We 
essentially need it to expose zk events as they occur. As it currently works, 
the events are serialized and the operations are retried transparently, so I 
can't tell whether the znode already exists because of a connection loss or 
because the session actually expired and there is a new one now. 

The simplest way around this seems to be to just re-register the consumer 
directly (delete and create) upon a node exists exception. This should work 
because of the following argument.

There are three possibilities when we get a node exists exception:

# The znode exists from a previous session and hasn't been reclaimed yet
# The znode exists because of a connection loss event while the znode was being 
created, so the retried create gets the exception
# The previous session has expired, a new one was created, and the registration 
was occurring around this transition, so when we execute handleNewSession for 
the new session, we get a node exists exception. 

In all three of these cases, deleting and recreating seems fine. It is clearly 
conservative and more expensive than necessary, but at least it doesn't require 
changes to zkclient. Does this sound reasonable? Do you see any problems? 
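
To make the proposal concrete, here is a minimal sketch of the 
delete-and-recreate path. It is only an illustration: FakeZk, Registration, 
and the path used with them are hypothetical names backed by an in-memory map, 
not the zkclient API or the actual Kafka registration code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for ZooKeeper; not the zkclient API. */
class FakeZk {
    /** Stand-in for a node exists exception (unchecked to keep the sketch short). */
    static class NodeExistsException extends RuntimeException {}

    // path -> id of the session that owns the ephemeral znode
    private final Map<String, Long> owners = new HashMap<>();

    void createEphemeral(String path, long sessionId) {
        if (owners.containsKey(path)) {
            throw new NodeExistsException();
        }
        owners.put(path, sessionId);
    }

    void delete(String path) {
        owners.remove(path);
    }

    Long ownerOf(String path) {
        return owners.get(path);
    }
}

class Registration {
    /**
     * Registers path for sessionId. On a node exists exception it does not try
     * to work out which of the three cases applies; it deletes and recreates.
     */
    static void register(FakeZk zk, String path, long sessionId) {
        try {
            zk.createEphemeral(path, sessionId);
        } catch (FakeZk.NodeExistsException e) {
            zk.delete(path);
            try {
                zk.createEphemeral(path, sessionId);
            } catch (FakeZk.NodeExistsException again) {
                // Someone else recreated the znode in between; the sketch
                // gives up here, a real implementation would need a policy.
                throw new IllegalStateException("znode reappeared: " + path, again);
            }
        }
    }
}
```

In this sketch a znode left over from an older session and a znode that the 
current session already created are treated the same way: the second create 
always runs against a freshly deleted path, so the znode ends up owned by the 
caller's session.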

CC [~guozhang] [~jwl...@gmail.com]

> Kafka getting stuck creating ephemeral node it has already created when two 
> zookeeper sessions are established in a very short period of time
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1387
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1387
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1.1
>            Reporter: Fedor Korotkiy
>            Priority: Blocker
>              Labels: newbie, patch, zkclient-problems
>         Attachments: kafka-1387.patch
>
>
> Kafka broker re-registers itself in zookeeper every time handleNewSession() 
> callback is invoked.
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala
>  
> Now imagine the following sequence of events.
> 1) Zookeeper session reestablishes. handleNewSession() callback is queued by 
> the zkClient, but not invoked yet.
> 2) Zookeeper session reestablishes again, queueing the callback a second time.
> 3) First callback is invoked, creating /broker/[id] ephemeral path.
> 4) Second callback is invoked and it tries to create /broker/[id] path using 
> createEphemeralPathExpectConflictHandleZKBug() function. But the path 
> already exists, so createEphemeralPathExpectConflictHandleZKBug() gets 
> stuck in an infinite loop.
> Seems like the controller election code has the same issue.
> I am able to reproduce this issue on the 0.8.1 branch from github using the 
> following configs.
> # zookeeper
> tickTime=10
> dataDir=/tmp/zk/
> clientPort=2101
> maxClientCnxns=0
> # kafka
> broker.id=1
> log.dir=/tmp/kafka
> zookeeper.connect=localhost:2101
> zookeeper.connection.timeout.ms=100
> zookeeper.sessiontimeout.ms=100
> Just start kafka and zookeeper and then pause zookeeper several times using 
> Ctrl-Z.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
