[ 
https://issues.apache.org/jira/browse/KAFKA-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543436#comment-16543436
 ] 

ASF GitHub Bot commented on KAFKA-4277:
---------------------------------------

Sh4pe opened a new pull request #5365: Don't throw exceptions in 
KafkaHealthcheck.handleNewSession. Should fix KAFKA-4277
URL: https://github.com/apache/kafka/pull/5365
 
 
   ### Problem we've encountered
   
   We recently had a problem with Kafka brokers that apparently stopped 
working. The last notable message we found in our logs was 
[this](https://github.com/apache/kafka/blob/1.0/core/src/main/scala/kafka/utils/ZkUtils.scala#L440)
 exception, which contained 
[KafkaHealthcheck.register()](https://github.com/apache/kafka/blob/1.0/core/src/main/scala/kafka/server/KafkaHealthcheck.scala#L59)
 in its stack trace.
   
   In our case, the ephemeral ZooKeeper node for the broker that 
[handleNewSession](https://github.com/apache/kafka/blob/1.0/core/src/main/scala/kafka/server/KafkaHealthcheck.scala#L119)
 should create apparently still existed, because Kafka had restarted too 
quickly after a previous crash. The exception thrown by the attempt to 
re-create this ZooKeeper node is caught in 
[ZkEventThread.run()](https://github.com/sgroschupf/zkclient/blob/0.9/src/main/java/org/I0Itec/zkclient/ZkEventThread.java#L63)
 and merely logged. Since no further events occur, the event thread 
apparently idles from then on.
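
To illustrate the failure mode described above, here is a minimal, self-contained Java sketch. It does not use the real zkclient or Kafka classes: `EventLoop`, `Broker`, and `simulate` are hypothetical stand-ins, and the node path is only an example. It shows how a handler exception that the event thread catches and records leaves the broker unregistered, with nothing left to trigger a retry.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

final class SwallowedEventDemo {

    /** Hypothetical stand-in for an event thread that only logs handler failures. */
    static final class EventLoop extends Thread {
        private static final Runnable STOP = () -> { };
        private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        final List<String> errors = new CopyOnWriteArrayList<>();

        void send(Runnable event) { queue.add(event); }

        void shutdown() { queue.add(STOP); }

        @Override
        public void run() {
            try {
                while (true) {
                    Runnable event = queue.take(); // blocks while no events arrive
                    if (event == STOP) {
                        return;
                    }
                    try {
                        event.run();
                    } catch (RuntimeException e) {
                        errors.add(e.getMessage()); // recorded, never rethrown or retried
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Hypothetical broker whose registration fails if the old ephemeral node is still there. */
    static final class Broker {
        final AtomicBoolean registered = new AtomicBoolean(false);

        void register(boolean staleNodePresent) {
            if (staleNodePresent) {
                throw new IllegalStateException("node already exists: /brokers/ids/3");
            }
            registered.set(true);
        }
    }

    /** Delivers one "new session" event and summarizes the outcome. */
    static String simulate(boolean staleNodePresent) {
        EventLoop loop = new EventLoop();
        Broker broker = new Broker();
        loop.start();
        loop.send(() -> broker.register(staleNodePresent));
        loop.shutdown();
        try {
            loop.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the event loop", e);
        }
        return "registered=" + broker.registered.get() + " errors=" + loop.errors.size();
    }

    public static void main(String[] args) {
        System.out.println(simulate(true));
        System.out.println(simulate(false));
    }
}
```

With `simulate(true)` the exception is recorded once and the broker stays unregistered; with `simulate(false)` the registration succeeds and no error is recorded.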
   
   ### Solution and rationale
   
   To get clean, well-defined behavior in this case, we've decided that it 
would be best to terminate the broker process. We call `sys.exit(1)` and rely 
on the 
[shutdownHook](https://github.com/apache/kafka/blob/1.0/core/src/main/scala/kafka/Kafka.scala#L88)
 to do a proper shutdown.
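
The fail-fast idea can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the code in this pull request: `FailFastSessionHandler`, its `register` step, and the injected `exit` function are hypothetical names, and the exit call is passed in only so the behavior can be exercised without killing the JVM. `Runtime.addShutdownHook` and `System.exit` are the standard JDK calls.

```java
import java.util.function.IntConsumer;

final class FailFastSessionHandler {
    private final Runnable register; // hypothetical registration step
    private final IntConsumer exit;  // System::exit in a real process

    FailFastSessionHandler(Runnable register, IntConsumer exit) {
        this.register = register;
        this.exit = exit;
    }

    /** Re-registers after a new session; terminates the process if that fails. */
    void handleNewSession() {
        try {
            register.run();
        } catch (RuntimeException e) {
            System.err.println("Could not re-register broker, exiting: " + e.getMessage());
            exit.accept(1); // shutdown hooks run as part of System.exit
        }
    }

    public static void main(String[] args) {
        // Stand-in for the broker's real shutdown hook.
        Runtime.getRuntime().addShutdownHook(
            new Thread(() -> System.err.println("shutdown hook: stopping broker")));

        Runnable failing = () -> {
            throw new IllegalStateException("node already exists: /brokers/ids/3");
        };
        new FailFastSessionHandler(failing, System::exit).handleNewSession();
    }
}
```

One point worth checking with this approach: `System.exit` blocks the calling thread until all shutdown hooks have finished, so a hook that waits indefinitely for that same thread would hang.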
   
   This pull request contains the change described above.



> creating ephemeral node already exist
> -------------------------------------
>
>                 Key: KAFKA-4277
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4277
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.0.0
>            Reporter: Feixiang Yan
>            Priority: Major
>
> I use ZooKeeper 3.4.6.
> The ZooKeeper session timed out and zkClient failed to reconnect. It then 
> re-established the session, and re-registering the broker info in ZK threw a 
> NODEEXISTS exception.
>  I think this is because the ephemeral node created by the old session had 
> not been removed yet. 
> I read the 
> [ZkUtils.scala|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/utils/ZkUtils.scala]
>  of 0.8.1: createEphemeralPathExpectConflictHandleZKBug tries to create the 
> node in a while loop until creation succeeds. This can solve the issue. But in 
> [ZkUtils.scala|https://github.com/apache/kafka/blob/0.10.0.1/core/src/main/scala/kafka/utils/ZkUtils.scala]
>   0.10.1 the function has been removed.
> {noformat}
> [2016-10-07 19:00:32,562] INFO Socket connection established to 
> 10.191.155.238/10.191.155.238:21819, initiating session 
> (org.apache.zookeeper.ClientCnxn)
> [2016-10-07 19:00:32,563] INFO zookeeper state changed (Expired) 
> (org.I0Itec.zkclient.ZkClient)
> [2016-10-07 19:00:32,564] INFO Unable to reconnect to ZooKeeper service, 
> session 0x1576b11f9b201bd has expired, closing socket connection 
> (org.apache.zookeeper.ClientCnxn)
> [2016-10-07 19:00:32,564] INFO Initiating client connection, 
> connectString=10.191.155.237:21819,10.191.155.238:21819,10.191.155.239:21819/cluster2
>  sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@ae71be2 
> (org.apache.zookeeper.ZooKeeper)
> [2016-10-07 19:00:32,566] INFO Opening socket connection to server 
> 10.191.155.237/10.191.155.237:21819. Will not attempt to authenticate using 
> SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> [2016-10-07 19:00:32,566] INFO Socket connection established to 
> 10.191.155.237/10.191.155.237:21819, initiating session 
> (org.apache.zookeeper.ClientCnxn)
> [2016-10-07 19:00:32,566] INFO EventThread shut down 
> (org.apache.zookeeper.ClientCnxn)
> [2016-10-07 19:00:32,567] INFO Session establishment complete on server 
> 10.191.155.237/10.191.155.237:21819, sessionid = 0x1579ecd39c20006, 
> negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
> [2016-10-07 19:00:32,567] INFO zookeeper state changed (SyncConnected) 
> (org.I0Itec.zkclient.ZkClient)
> [2016-10-07 19:00:32,608] INFO re-registering broker info in ZK for broker 3 
> (kafka.server.KafkaHealthcheck$SessionExpireListener)
> [2016-10-07 19:00:32,610] INFO Creating /brokers/ids/3 (is it secure? false) 
> (kafka.utils.ZKCheckedEphemeral)
> [2016-10-07 19:00:32,611] INFO Result of znode creation is: NODEEXISTS 
> (kafka.utils.ZKCheckedEphemeral)
> [2016-10-07 19:00:32,614] ERROR Error handling event ZkEvent[New session 
> event sent to kafka.server.KafkaHealthcheck$SessionExpireListener@324f1bc] 
> (org.I0Itec.zkclient.ZkEventThread)
> java.lang.RuntimeException: A broker is already registered on the path 
> /brokers/ids/3. This probably indicates that you either have configured a 
> brokerid that is already in use, or else you have shutdown this broker and 
> restarted it faster than the zookeeper timeout so it appears to be 
> re-registering.
>         at kafka.utils.ZkUtils.registerBrokerInZk(ZkUtils.scala:305)
>         at kafka.utils.ZkUtils.registerBrokerInZk(ZkUtils.scala:291)
>         at kafka.server.KafkaHealthcheck.register(KafkaHealthcheck.scala:70)
>         at 
> kafka.server.KafkaHealthcheck$SessionExpireListener.handleNewSession(KafkaHealthcheck.scala:104)
>         at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:735)
>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> {noformat}


