[ 
https://issues.apache.org/jira/browse/KAFKA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724789#comment-13724789
 ] 

Guozhang Wang commented on KAFKA-992:
-------------------------------------

We can differentiate this edge case from a temporary connection loss by adding 
a timestamp to the broker ZK string, so that a node written in a previous 
session shows up as a conflict. We can then check whether the host:port is the 
same. If it is, we can treat the ephemeral node as written by the broker 
itself in a previous session, back off until it is deleted on session timeout, 
and then retry creating the ephemeral node. This turns the temporary connection 
loss into a false positive, but that should be fine since it happens rarely.
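
A minimal sketch of the check-and-retry logic described above, not the actual 
patch. It uses an in-memory stand-in instead of a real ZooKeeper client; 
FakeZk, register_broker, the JSON layout of the broker string, and the backoff 
and retry parameters are all hypothetical names chosen for illustration.

```python
import json
import time


class NodeExistsError(Exception):
    """Raised when the ephemeral node already exists (stand-in for ZK's NodeExists)."""


class FakeZk:
    """Hypothetical in-memory stand-in for ZooKeeper ephemeral nodes."""

    def __init__(self):
        self.nodes = {}

    def create_ephemeral(self, path, data):
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = data

    def read(self, path):
        return self.nodes.get(path)

    def delete(self, path):
        self.nodes.pop(path, None)


def register_broker(zk, path, host, port, timestamp,
                    backoff_s=0.01, max_retries=5, sleep=time.sleep):
    """Create the broker's ephemeral node, retrying past a stale node of our own.

    Returns the number of failed attempts before the node was created.
    """
    # The timestamp makes a node from a previous session differ from this write.
    data = json.dumps({"host": host, "port": port, "timestamp": timestamp})
    for attempt in range(max_retries):
        try:
            zk.create_ephemeral(path, data)
            return attempt
        except NodeExistsError:
            existing = zk.read(path)
            if existing is None:
                continue  # node was deleted in the meantime; retry right away
            stored = json.loads(existing)
            if (stored["host"], stored["port"]) != (host, port):
                # A different host:port holds the node: a genuine conflict.
                raise RuntimeError("broker path %s is held by another broker" % path)
            # Same host:port: assume our own previous session wrote it, and
            # back off until the session timeout deletes it.
            sleep(backoff_s)
    raise TimeoutError("stale ephemeral node at %s was never deleted" % path)
```

With this shape, a plain connection loss within the same session also takes 
the backoff path, which is the false positive mentioned above.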

                
> Double Check on Broker Registration to Avoid False NodeExist Exception
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-992
>                 URL: https://issues.apache.org/jira/browse/KAFKA-992
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>
> There is a potential bug in Zookeeper: when the ZK leader processes a lot 
> of session expiration events (this could be due to a long GC, an fsync 
> operation, etc.), it marks the session as expired but does not delete the 
> corresponding ephemeral znode at the same time. 
> Meanwhile, a new session event will be fired on the kafka server and the 
> server will request the same ephemeral node to be created on handling the new 
> session. When it enters the zookeeper processing queue, this operation 
> receives a NodeExists error since zookeeper leader has not finished deleting 
> that ephemeral znode and still thinks the previous session holds it. Kafka 
> assumes that the NodeExists error on ephemeral node creation is ok since that 
> is a legitimate condition that happens during session disconnects on 
> zookeeper. However, a NodeExists error is only valid if the owner session id 
> also matches the Kafka server's current zookeeper session id. The bug is that 
> before sending a NodeExists error, Zookeeper should check if the ephemeral 
> node in question is held by a session that has been marked as expired.
