[ 
https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151260#comment-14151260
 ] 

Jun Rao commented on KAFKA-1387:
--------------------------------

James,

Thanks for reporting this. Yes, what you discovered is a real problem. The fix 
is going to be tricky, though. The issue is the following. When a client loses 
an ephemeral node in ZK due to session expiration, that ephemeral node is not 
removed exactly at expiration time, but a short time after (ZOOKEEPER-1740). 
When the client tries to recreate the ephemeral node and gets a 
NodeExistException, one of two things could have happened: (1) the existing 
node is from the expired session and is about to be deleted, or (2) the node 
was actually created in the latest session (the reason is what you discovered: 
the client gets multiple handleNewSession() calls due to multiple session 
expiration events, but the node is created in the latest session). I am not 
sure if there is an easy way to distinguish the two cases, though.
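
One possible way to tell the two cases apart is sketched below, as an 
assumption rather than a tested fix. ZooKeeper records the id of the creating 
session in each ephemeral node's stat (ephemeralOwner), so a client could 
compare that id with its own current session id. The Python below is only a 
toy model of that comparison: Znode and classify_conflict are hypothetical 
names, not Kafka or ZooKeeper APIs, and whether the check holds up against 
every corner case of session expiration is not established here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Znode:
    """Toy stand-in for an ephemeral node (hypothetical, for illustration)."""
    data: str
    ephemeral_owner: int  # id of the session that created the node


def classify_conflict(existing: Optional[Znode], current_session_id: int) -> str:
    """Decide what to do after a create failed because the node already exists."""
    if existing is None:
        # The node disappeared between the failed create and the read.
        return "retry"
    if existing.ephemeral_owner == current_session_id:
        # Case (2): the node was created in the latest session.
        return "already-registered"
    # Case (1): the node belongs to another (presumably expired) session.
    return "wait-and-retry"
```

With this model, a node owned by session 1 seen from session 2 yields 
"wait-and-retry", while a node owned by session 2 yields "already-registered".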

Overall, it seems to me that there are many corner cases to deal with during 
ZK session expiration. The simplest approach is probably to prevent session 
expiration from happening at all (e.g., by setting a larger session timeout).

> Kafka getting stuck creating ephemeral node it has already created when two 
> zookeeper sessions are established in a very short period of time
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1387
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1387
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Fedor Korotkiy
>
> Kafka broker re-registers itself in zookeeper every time the 
> handleNewSession() callback is invoked.
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala
>  
> Now imagine the following sequence of events.
> 1) Zookeeper session reestablishes. The handleNewSession() callback is queued 
> by the zkClient, but not invoked yet.
> 2) Zookeeper session reestablishes again, queueing the callback a second time.
> 3) The first callback is invoked, creating the /broker/[id] ephemeral path.
> 4) The second callback is invoked and tries to create the /broker/[id] path 
> using the createEphemeralPathExpectConflictHandleZKBug() function. But the 
> path already exists, so createEphemeralPathExpectConflictHandleZKBug() gets 
> stuck in an infinite loop.
> It seems the controller election code has the same issue.
> I am able to reproduce this issue on the 0.8.1 branch from GitHub using the 
> following configs.
> # zookeeper
> tickTime=10
> dataDir=/tmp/zk/
> clientPort=2101
> maxClientCnxns=0
> # kafka
> broker.id=1
> log.dir=/tmp/kafka
> zookeeper.connect=localhost:2101
> zookeeper.connection.timeout.ms=100
> zookeeper.sessiontimeout.ms=100
> Just start kafka and zookeeper and then pause zookeeper several times using 
> Ctrl-Z.
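
The sequence of events in the quoted report can be modeled with a small, 
self-contained simulation. This is a simplified sketch under stated 
assumptions, not Kafka's code: FakeZk and register are hypothetical names, the 
retry loop is capped at max_attempts so the script terminates (the loop 
described in the report has no such cap), and expired sessions' nodes are 
removed immediately rather than after a delay.

```python
from collections import deque


class FakeZk:
    """Toy stand-in for ZooKeeper: path -> (data, owner session id)."""

    def __init__(self):
        self.nodes = {}
        self.session_id = 0

    def new_session(self):
        # Expire the old session and drop its ephemeral nodes.
        old = self.session_id
        self.session_id += 1
        self.nodes = {p: n for p, n in self.nodes.items() if n[1] != old}

    def create_ephemeral(self, path, data):
        if path in self.nodes:
            raise FileExistsError(path)
        self.nodes[path] = (data, self.session_id)


def register(zk, path, data, max_attempts=5):
    """Model of the retry loop: on a conflict with our own data, retry.

    Returns the attempt number that succeeded, or None if the cap was hit.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            zk.create_ephemeral(path, data)
            return attempt
        except FileExistsError:
            if zk.nodes[path][0] != data:
                raise  # the path holds someone else's data
            # Same data: assume a stale node from an expired session and retry.
    return None


zk = FakeZk()
callbacks = deque()
zk.new_session(); callbacks.append("handleNewSession")  # step 1: queued
zk.new_session(); callbacks.append("handleNewSession")  # step 2: queued again
# Steps 3 and 4: both callbacks run against the latest session.
results = [register(zk, "/broker/1", "broker-1") for _ in callbacks]
print(results)  # [1, None]
```

The first callback registers on its first attempt. The second never succeeds, 
because the node it is waiting on belongs to the live session and is therefore 
never deleted.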



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
