
Gwen Shapira commented on KAFKA-1387:

ZOOKEEPER-1809 was closed because the reproduction of the issue was buggy (the
test app was actually creating two sessions at the same time).

I agree that Flavio indicated that ZNodes can hang around after expiration, but 
he also indicated the opposite in the email thread for ZOOKEEPER-1740.

It's important to get this right, so I'll do more research on the expected
ZooKeeper behavior here.

One thing I'm not sure about is why createEphemeralPathExpectConflictHandleZKBug
loops indefinitely.
If ZK does take a bit of extra time to clean up, we could retry for a bounded
amount of time (a fixed number of retries), as Curator typically does. After a few
seconds, it is very likely that the ZNode belongs to an active session rather than
an expired one.
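To make the bounded-retry idea concrete, here is a minimal Python sketch. `FakeZkStore`, `NodeExistsError` and `create_ephemeral_bounded` are hypothetical names, not Kafka's or Curator's API; the in-memory store only stands in for a real ZooKeeper client, and the retry count and backoff values are arbitrary placeholders.

```python
import time
from typing import Callable, Dict, Optional


class NodeExistsError(Exception):
    """Raised by the fake store when the path already exists (hypothetical)."""


class FakeZkStore:
    """Hypothetical in-memory stand-in for a ZooKeeper client (not a real API)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, str] = {}

    def create_ephemeral(self, path: str, data: str) -> None:
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = data

    def read(self, path: str) -> Optional[str]:
        return self.nodes.get(path)


def create_ephemeral_bounded(
    zk: FakeZkStore,
    path: str,
    data: str,
    max_retries: int = 5,
    backoff_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Create `path`, retrying at most `max_retries` times on a conflict.

    Returns True if this call created the node. Returns False if a node with
    our data still exists after all retries; the caller then assumes it
    belongs to a live session rather than an expired one.
    Raises RuntimeError if the node holds someone else's data.
    """
    for attempt in range(max_retries + 1):
        try:
            zk.create_ephemeral(path, data)
            return True
        except NodeExistsError:
            stored = zk.read(path)
            if stored is not None and stored != data:
                raise RuntimeError(f"{path} already holds {stored!r}")
            # Either our own data (possibly left by an expired session) or the
            # node vanished in between: back off, then try again.
            if attempt < max_retries:
                sleep(backoff_s)
    return False
```

With `max_retries=5` and `backoff_s=1.0`, a conflicting node that has not been cleaned up within roughly five seconds is treated as belonging to a live session, which is the trade-off described above: a small chance of a wrong guess in exchange for never hanging.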

> Kafka getting stuck creating ephemeral node it has already created when two 
> zookeeper sessions are established in a very short period of time
> ---------------------------------------------------------------------------------------------------------------------------------------------
>                 Key: KAFKA-1387
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1387
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Fedor Korotkiy
> The Kafka broker re-registers itself in ZooKeeper every time the
> handleNewSession() callback is invoked.
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala
> Now imagine the following sequence of events.
> 1) The ZooKeeper session is re-established. The handleNewSession() callback is
> queued by the zkClient but not yet invoked.
> 2) The ZooKeeper session is re-established again, queueing the callback a
> second time.
> 3) The first callback is invoked, creating the /broker/[id] ephemeral path.
> 4) The second callback is invoked and tries to create the /broker/[id] path
> using the createEphemeralPathExpectConflictHandleZKBug() function. But the path
> already exists, so createEphemeralPathExpectConflictHandleZKBug() gets stuck
> in an infinite loop.
> It seems the controller election code has the same issue.
> I am able to reproduce this issue on the 0.8.1 branch from GitHub using the
> following configs.
> # zookeeper
> tickTime=10
> dataDir=/tmp/zk/
> clientPort=2101
> maxClientCnxns=0
> # kafka
> broker.id=1
> log.dir=/tmp/kafka
> zookeeper.connect=localhost:2101
> zookeeper.connection.timeout.ms=100
> zookeeper.sessiontimeout.ms=100
> Just start Kafka and ZooKeeper, then pause ZooKeeper several times using
> Ctrl-Z.
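The four-step sequence in the report can be replayed in a small Python model. `InMemoryZk`, `register` and `replay_report` are hypothetical; `register` only mimics the behaviour described for createEphemeralPathExpectConflictHandleZKBug() (wait and retry while a node holding the same data exists) and is not Kafka's code, and `max_spins` exists solely so the demo terminates.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class InMemoryZk:
    """Hypothetical stand-in for ZooKeeper: path -> (data, owning session id)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Tuple[str, int]] = {}
        self.session_id = 0

    def new_session(self) -> None:
        # Session re-established: the old session's ephemeral nodes go away.
        old = self.session_id
        self.session_id += 1
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != old}


def register(zk: InMemoryZk, path: str, data: str, max_spins: int) -> str:
    """Models the described loop: while a node holding our data exists, wait
    for ZooKeeper to delete it and retry. max_spins only makes the demo halt."""
    for _ in range(max_spins):
        existing = zk.nodes.get(path)
        if existing is None:
            zk.nodes[path] = (data, zk.session_id)
            return "created"
        if existing[0] != data:
            raise RuntimeError(f"{path} belongs to another broker")
        # Same data: treated as a stale node from an expired session. In fact
        # it is owned by the live session, so it is only deleted when that
        # session itself expires.
    return "stuck"


def replay_report(max_spins: int = 1000) -> Tuple[List[str], InMemoryZk]:
    zk = InMemoryZk()
    queue: Deque[Callable[[], str]] = deque()
    path, data = "/broker/1", "broker-1"
    # 1) Session re-established; handleNewSession() queued, not yet run.
    zk.new_session()
    queue.append(lambda: register(zk, path, data, max_spins))
    # 2) Session re-established again; the callback is queued a second time.
    zk.new_session()
    queue.append(lambda: register(zk, path, data, max_spins))
    # 3) and 4) Both callbacks now run against the same, current session.
    results = [queue.popleft()() for _ in range(len(queue))]
    return results, zk
```

`replay_report()` returns `["created", "stuck"]`: the node blocking the second callback was written by the current session, so waiting for ZooKeeper to remove it cannot succeed while that session is alive.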

This message was sent by Atlassian JIRA
