[ 
https://issues.apache.org/jira/browse/KAFKA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734114#comment-13734114
 ] 

Neha Narkhede commented on KAFKA-992:
-------------------------------------

Thanks for the follow-up patch, Guozhang. Overall, it looks correct. A few minor
suggestions -

9. ZkUtils

9.1. Could you add more details in the log message when the json parsing of the 
controller path fails? Since we know we are changing the format, something 
along the lines of "Json parsing of the controller path failed. Probably this 
controller is still using the old format [%s] of storing the broker id in the 
zookeeper path"
9.2 We don't need to convert the controller variable to string since it is 
already a string
9.3 Improve the error message when both json parsing and the toInt conversion
fail. "Failed to parse the leader leaderinfo " doesn't say that we failed to
parse the controller's leader election path.
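
A minimal sketch of what 9.1-9.3 ask for, written in Python for brevity rather
than in the Scala of the actual patch. The function name parse_controller_id,
the "brokerid" field name, and the exact wording of the final error message are
assumptions made for illustration; they are not taken from the patch.

```python
import json
import logging

log = logging.getLogger("controller-path-sketch")


def parse_controller_id(controller: str) -> int:
    """Return the broker id stored in the controller path.

    Tries the new JSON format first, then falls back to the old format
    where the path holds just the broker id. `controller` is already a
    string, so no extra string conversion is done (9.2).
    """
    try:
        # Assumed new format: a JSON object with a "brokerid" field.
        return int(json.loads(controller)["brokerid"])
    except (ValueError, KeyError, TypeError):
        # 9.1: say why parsing probably failed and show the raw data.
        log.warning(
            "Json parsing of the controller path failed. Probably this "
            "controller is still using the old format [%s] of storing the "
            "broker id in the zookeeper path",
            controller,
        )
    try:
        return int(controller)
    except ValueError:
        # 9.3: name the controller's leader election path explicitly.
        raise ValueError(
            "Failed to parse the controller's leader election path data "
            "[%s] as either json or a broker id" % controller
        ) from None
```

With this shape, an old-format value such as "7" logs one warning and still
yields 7, while unparseable data fails with a message that names the
controller path.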

10. ZookeeperLeaderElector
10.1 Remove unused import BrokerNotAvailableException
10.2 In the elect() API, shouldn't we use readDataMaybeNull instead of readData?
That covers the case where the ephemeral node disappears before you get a
chance to read it.
10.3 Since the changes to elect() are new, I suggest we convert the debug
statements to info or warn. Since elect() is rarely called, this will not
pollute the log.
10.4 One suggestion to reduce code and make it somewhat cleaner - if we change
electFinished to electionNotDone, we need to set it in only one place, where
we don't need to retry. Currently we have to change electFinished multiple
times at different places.
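
To illustrate 10.2-10.4, here is a rough sketch of such a retry loop, again in
Python rather than Scala. FakeZk, NodeExistsError, and the method names are
hypothetical stand-ins for the real zookeeper client, and the loop is a
simplification, not the logic of the patch.

```python
import logging
from typing import Dict, Optional

log = logging.getLogger("leader-election-sketch")


class NodeExistsError(Exception):
    """Raised when the ephemeral node is already present."""


class FakeZk:
    """Hypothetical in-memory stand-in for a zookeeper client."""

    def __init__(self) -> None:
        self.nodes: Dict[str, str] = {}

    def create_ephemeral(self, path: str, data: str) -> None:
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = data

    def read_data_maybe_null(self, path: str) -> Optional[str]:
        # Returns None instead of raising when the node is gone (10.2).
        return self.nodes.get(path)


def elect(zk: FakeZk, path: str, broker_id: str) -> Optional[str]:
    """Try to become leader; return the id of whoever holds the path."""
    leader: Optional[str] = None
    election_not_done = True
    while election_not_done:
        try:
            zk.create_ephemeral(path, broker_id)
            leader = broker_id
            log.info("Broker %s won the election at %s", broker_id, path)
        except NodeExistsError:
            leader = zk.read_data_maybe_null(path)
            if leader is None:
                # The node vanished between the create and the read.
                log.info("Node %s disappeared before it was read; retrying", path)
            else:
                log.info("Broker %s already holds %s", leader, path)
        # 10.4: the flag is set in a single place per iteration.
        election_not_done = leader is None
    return leader
```

The flag is cleared only once a leader is known, so a node that disappears
between the failed create and the read simply triggers another attempt.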

                
> Double Check on Broker Registration to Avoid False NodeExist Exception
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-992
>                 URL: https://issues.apache.org/jira/browse/KAFKA-992
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Neha Narkhede
>            Assignee: Guozhang Wang
>         Attachments: KAFKA-992.v1.patch, KAFKA-992.v2.patch, 
> KAFKA-992.v3.patch, KAFKA-992.v4.patch, KAFKA-992.v5.patch, KAFKA-992.v6.patch
>
>
> The current behavior of zookeeper for ephemeral nodes is that session
> expiration and ephemeral node deletion are not a single atomic operation.
> The side-effect of the above zookeeper behavior in Kafka, for certain corner 
> cases, is that ephemeral nodes can be lost even if the session is not 
> expired. The sequence of events that can lead to lossy ephemeral nodes is as 
> follows -
> 1. The session expires on the client, which assumes the ephemeral nodes are
> deleted, so it establishes a new session with zookeeper and tries to
> re-create the ephemeral nodes.
> 2. However, when it tries to re-create the ephemeral node, zookeeper throws
> back a NodeExists error code. Now this is legitimate during a session
> disconnect event (since zkclient automatically retries the operation and
> raises a NodeExists error). Also by design, Kafka server doesn't have
> multiple zookeeper clients create the same ephemeral node, so Kafka server
> assumes the NodeExists is normal.
> 3. However, after a few seconds zookeeper deletes that ephemeral node. So 
> from the client's perspective, even though the client has a new valid 
> session, its ephemeral node is gone.
> This behavior is triggered due to very long fsync operations on the zookeeper 
> leader. When the leader wakes up from such a long fsync operation, it has 
> several sessions to expire. And the time between the session expiration and 
> the ephemeral node deletion is magnified. Between these 2 operations, a
> zookeeper client can issue an ephemeral node creation operation that appears
> to have succeeded, but the leader later deletes the ephemeral node,
> leading to permanent ephemeral node loss from the client's perspective.
> Thread from zookeeper mailing list: 
> http://zookeeper.markmail.org/search/?q=Zookeeper+3.3.4#query:Zookeeper%203.3.4%20date%3A201307%20+page:1+mid:zma242a2qgp6gxvx+state:results

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
