[ https://issues.apache.org/jira/browse/KAFKA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13728218#comment-13728218 ]
Neha Narkhede edited comment on KAFKA-992 at 8/2/13 11:15 PM:
--------------------------------------------------------------

Thanks for patch v2, Guozhang. A few review suggestions -

1. How about keeping the unix timestamp as is? All we have to make sure is that it is equal to what was written. I'm not sure there is an advantage to converting it to some date format.

2. Typo => ephermeral (should be "ephemeral")

3. The following log statement is not completely correct -

    info("I wrote this conflicted ephermeral node a while back in a different session, " +
         "hence I will backoff for this node to be deleted by Zookeeper after session timeout and retry")

The reason is that there are 3 cases in which the broker might get NodeExists while the existing ephemeral node has the same host and port -

3.1 It ran into one of the recoverable zookeeper errors while creating the ephemeral node, in which case ZkClient retried the operation under the covers and got a NodeExists error on the 2nd retry. In this case the timestamp is useful, as it will match what was written and we do not need to retry.

3.2 It hit the zookeeper non-atomic session expiration problem. In this case the timestamp will not match and we just have to retry.

3.3 The server was killed and restarted within the session timeout. In this case it is useful to back off for the session timeout and retry the ephemeral node creation.

It will be useful from a logging perspective if we can distinguish case 3.1 from cases 3.2/3.3 and retry accordingly (a sketch of this check follows below). Another way to look at this is to not store the timestamp at all and just retry on any NodeExists, since the create has to go through at some point, but then we do not get meaningful logging, which is not ideal.

4. Regarding the backoff time, I think it is better to back off for the session timeout.

5. Regarding the case where the broker host and port do not match -

    throw new RuntimeException("A broker is already registered on the path " + brokerIdPath +
      ". This probably indicates that you either have configured a brokerid that is already in use, or " +
      "else you have shutdown this broker and restarted it faster than the zookeeper " +
      "timeout so it appears to be re-registering.")

The "else" part of this statement is incorrect, since if you shut down and restarted the same broker, its host and port would in fact match. We should fix the exception message to say that another broker [host, port] is registered under that id.
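To make the case analysis in point 3 concrete, here is a minimal sketch of what the checked registration could look like. This is not the actual KAFKA-992 patch: the ZkClient calls (createEphemeral, readData) are the real library API, but the object name, the registerBrokerInZk signature and the serialized brokerInfo string are hypothetical simplifications.

import org.I0Itec.zkclient.ZkClient
import org.I0Itec.zkclient.exception.ZkNodeExistsException

// Sketch only: create the broker's ephemeral node and, on NodeExists, compare the
// stored payload (host/port plus the unix timestamp written at creation time) with
// what this broker just tried to write, to tell case 3.1 apart from cases 3.2/3.3.
object CheckedBrokerRegistration {

  def registerBrokerInZk(zkClient: ZkClient,
                         brokerIdPath: String,   // e.g. "/brokers/ids/0" (illustrative)
                         brokerInfo: String,     // serialized host, port and unix timestamp
                         sessionTimeoutMs: Long): Unit = {
    var registered = false
    while (!registered) {
      try {
        zkClient.createEphemeral(brokerIdPath, brokerInfo)
        registered = true
      } catch {
        case _: ZkNodeExistsException =>
          // Read back whatever is currently stored under the path (null if it is gone).
          val existing: String = zkClient.readData(brokerIdPath, true)
          if (existing == brokerInfo) {
            // Case 3.1: ZkClient retried a recoverable error under the covers and the
            // first create had actually succeeded; the data (timestamp included) matches,
            // so we are already registered and no further retry is needed.
            registered = true
          } else {
            // Cases 3.2/3.3: the node belongs to an old session (non-atomic expiration)
            // or to a pre-restart incarnation of this broker. Back off for the session
            // timeout so zookeeper can delete the stale ephemeral node, then retry.
            Thread.sleep(sessionTimeoutMs)
          }
      }
    }
  }
}

With a check along these lines, case 3.1 causes no spurious retry, while cases 3.2/3.3 are retried after backing off for the session timeout, which is what points 3 and 4 above ask for.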
> Double Check on Broker Registration to Avoid False NodeExist Exception
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-992
>                 URL: https://issues.apache.org/jira/browse/KAFKA-992
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Neha Narkhede
>            Assignee: Guozhang Wang
>         Attachments: KAFKA-992.v1.patch, KAFKA-992.v2.patch
>
>
> The current behavior of zookeeper for ephemeral nodes is that session expiration and ephemeral node deletion are not an atomic operation.
> The side-effect of the above zookeeper behavior in Kafka, for certain corner cases, is that ephemeral nodes can be lost even if the session is not expired. The sequence of events that can lead to lossy ephemeral nodes is as follows -
> 1. The session expires on the client; it assumes the ephemeral nodes are deleted, so it establishes a new session with zookeeper and tries to re-create the ephemeral nodes.
> 2. However, when it tries to re-create the ephemeral node, zookeeper throws back a NodeExists error code. Now this is legitimate during a session disconnect event (since zkclient automatically retries the operation and raises a NodeExists error). Also, by design, the Kafka server doesn't have multiple zookeeper clients create the same ephemeral node, so the Kafka server assumes the NodeExists is normal.
> 3. However, after a few seconds zookeeper deletes that ephemeral node. So from the client's perspective, even though the client has a new valid session, its ephemeral node is gone.
> This behavior is triggered by very long fsync operations on the zookeeper leader. When the leader wakes up from such a long fsync operation, it has several sessions to expire, and the time between the session expiration and the ephemeral node deletion is magnified. Between these 2 operations, a zookeeper client can issue an ephemeral node creation operation that could appear to have succeeded, but the leader later deletes the ephemeral node, leading to permanent ephemeral node loss from the client's perspective.
> Thread from zookeeper mailing list:
> http://zookeeper.markmail.org/search/?q=Zookeeper+3.3.4#query:Zookeeper%203.3.4%20date%3A201307%20+page:1+mid:zma242a2qgp6gxvx+state:results
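For concreteness, here is a hedged sketch of the kind of registration logic that the sequence above breaks: any NodeExists is silently treated as "already registered" (step 2), so a stale node that zookeeper deletes a few seconds later (step 3) leaves the broker unregistered without any error. The names are illustrative, not Kafka code; only the ZkClient API is real.

import org.I0Itec.zkclient.ZkClient
import org.I0Itec.zkclient.exception.ZkNodeExistsException

// Illustrative only: the vulnerable pattern described in the issue, with no double check.
object NaiveBrokerRegistration {

  def register(zkClient: ZkClient, path: String, data: String): Unit = {
    try {
      zkClient.createEphemeral(path, data)
    } catch {
      case _: ZkNodeExistsException =>
        // Step 2 above: assume the NodeExists came from zkclient's own retry and that the
        // registration succeeded. If the node actually belongs to the expired session,
        // the zookeeper leader deletes it shortly afterwards (step 3) and the broker ends
        // up with a valid session but no ephemeral node.
        ()
    }
  }
}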