[ https://issues.apache.org/jira/browse/KAFKA-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433768#comment-15433768 ]

Jun Rao commented on KAFKA-3083:
--------------------------------

Someone encountered another issue related to this. After a broker's ZK session 
expires and it resigns as the controller, there is the following error in the 
controller log.

2016-08-13 17:34:23,721 ERROR org.I0Itec.zkclient.ZkEventThread:77 [ZkClient-EventThread-87- [run] Error handling event ZkEvent[Children of /isr_change_notification changed sent to kafka.controller.IsrChangeNotificationListener@3c60b0b1]
java.lang.IllegalStateException: java.lang.NullPointerException
    at kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:435)
    at kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:1029)
    at kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications(KafkaController.scala:1372)
    at kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp(KafkaController.scala:1359)
    at kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352)
    at kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply(KafkaController.scala:1352)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
    at kafka.controller.IsrChangeNotificationListener.handleChildChange(KafkaController.scala:1352)
    at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842)
    at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
Caused by: java.lang.NullPointerException
    at kafka.controller.KafkaController.sendRequest(KafkaController.scala:699)
    at kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:404)
    at kafka.controller.ControllerBrokerRequestBatch$$anonfun$sendRequestsToBrokers$2.apply(ControllerChannelManager.scala:370)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at kafka.controller.ControllerBrokerRequestBatch.sendRequestsToBrokers(ControllerChannelManager.scala:370)
    ... 9 more

The broker fails to send an UpdateMetadataRequest in response to an ISR change
event because controllerChannelManager is null after the broker resigns as the
controller. When this happens, the broker invokes the logic that forces the
controller to resign. That logic could accidentally delete the controller path
created by another broker.

2016-08-13 17:34:23,639 ERROR kafka.utils.Logging$class:97 [ZkClient-EventThread-87-] [error] [Controller 43]: Forcing the controller to resign
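The failure mode above can be modeled in a few lines. The following is a minimal Python sketch, not Kafka's Scala code. It makes these assumptions: the /controller znode is represented by a hypothetical in-memory `ControllerZnode` that only records the owning broker id, and `Controller` is a hypothetical class whose `channel_manager` becomes None on resignation, mirroring controllerChannelManager. The sketch shows two guards. First, a resigned controller drops the request instead of dereferencing a null channel manager. Second, a forced resignation deletes the controller path only if this broker still owns it. Broker id 44 is invented for illustration.

```python
from typing import Optional


class ControllerZnode:
    """Hypothetical stand-in for the ephemeral /controller znode in ZK.

    It only records which broker id currently owns the path.
    """

    def __init__(self) -> None:
        self.owner: Optional[int] = None

    def create(self, broker_id: int) -> bool:
        # Like an ephemeral create: succeeds only if the path is free.
        if self.owner is not None:
            return False
        self.owner = broker_id
        return True

    def expire_session(self) -> None:
        # Session expiry removes the ephemeral node, whoever owned it.
        self.owner = None

    def delete_if_owned_by(self, broker_id: int) -> bool:
        # Conditional delete: a no-op if another broker now owns the path.
        if self.owner != broker_id:
            return False
        self.owner = None
        return True


class Controller:
    """Hypothetical controller; channel_manager is None once it has resigned."""

    def __init__(self, broker_id: int, znode: ControllerZnode) -> None:
        self.broker_id = broker_id
        self.znode = znode
        self.channel_manager: Optional[list] = None

    def become_controller(self) -> bool:
        if not self.znode.create(self.broker_id):
            return False
        self.channel_manager = []  # stand-in for a real request queue
        return True

    def resign(self) -> None:
        self.channel_manager = None

    def send_update_metadata(self, request: str) -> bool:
        # Check first instead of dereferencing a null channel manager.
        if self.channel_manager is None:
            return False
        self.channel_manager.append(request)
        return True

    def force_resign(self) -> bool:
        # Only delete the controller path if this broker still owns it.
        self.resign()
        return self.znode.delete_if_owned_by(self.broker_id)


zk = ControllerZnode()
a = Controller(43, zk)
b = Controller(44, zk)
a.become_controller()
zk.expire_session()  # broker 43's ZK session expires ...
a.resign()           # ... so it resigns as controller
b.become_controller()
print(a.send_update_metadata("UpdateMetadataRequest"))  # False, no NPE
print(a.force_resign())                                  # False
print(zk.owner)                                          # 44
```

An unguarded forced resignation would correspond to calling `expire_session()`-style removal from the stale broker, wiping out broker 44's registration. The ownership check is one hypothetical way to avoid that.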



> a soft failure in controller may leave a topic partition in an inconsistent 
> state
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-3083
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3083
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.9.0.0
>            Reporter: Jun Rao
>            Assignee: Mayuresh Gharat
>
> The following sequence can happen.
> 1. Broker A is the controller and is in the middle of processing a broker 
> change event. As part of this process, let's say it's about to shrink the ISR
> of a partition.
> 2. Then broker A's session expires and broker B takes over as the new 
> controller. Broker B sends the initial leaderAndIsr request to all brokers.
> 3. Broker A continues by shrinking the ISR of the partition in ZK and sends 
> the new leaderAndIsr request to the broker (say C) that leads the partition. 
> Broker C will reject this leaderAndIsr request since it comes from a 
> controller with an older epoch. Now we could end up in a situation where broker C 
> thinks the ISR has all replicas, but the ISR stored in ZK is different.
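The three steps can be replayed with a small model of controller-epoch fencing. This is a minimal Python sketch under stated assumptions: `LeaderAndIsrRequest` and `Broker` are hypothetical, trimmed-down types, not Kafka's classes. The dict `zk_isr` stands in for the ISR stored in ZK. The partition name "t-0" and the replicas on brokers D and E are invented for illustration. Broker C accepts a request only if its controller epoch is at least the highest epoch C has seen.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LeaderAndIsrRequest:
    # Hypothetical request carrying only the fields this scenario needs.
    controller_epoch: int
    partition: str
    leader: str
    isr: List[str]


@dataclass
class Broker:
    name: str
    highest_controller_epoch: int = 0
    isr_view: Dict[str, List[str]] = field(default_factory=dict)

    def handle_leader_and_isr(self, req: LeaderAndIsrRequest) -> bool:
        # Fence requests from a controller with an older epoch.
        if req.controller_epoch < self.highest_controller_epoch:
            return False
        self.highest_controller_epoch = req.controller_epoch
        self.isr_view[req.partition] = list(req.isr)
        return True


zk_isr: Dict[str, List[str]] = {"t-0": ["C", "D", "E"]}  # hypothetical ZK state
c = Broker("C")

# Before step 1: controller A (epoch 1) has told C the full ISR.
c.handle_leader_and_isr(LeaderAndIsrRequest(1, "t-0", "C", ["C", "D", "E"]))

# Step 2: B becomes controller (epoch 2) and sends its initial LeaderAndIsr,
# built from the ISR in ZK, which is still the full set.
c.handle_leader_and_isr(LeaderAndIsrRequest(2, "t-0", "C", list(zk_isr["t-0"])))

# Step 3: A, unaware it lost the controller role, shrinks the ISR in ZK ...
zk_isr["t-0"] = ["C", "D"]
# ... and its LeaderAndIsr to C is rejected because it carries epoch 1.
accepted = c.handle_leader_and_isr(LeaderAndIsrRequest(1, "t-0", "C", ["C", "D"]))

print(accepted)           # False
print(zk_isr["t-0"])      # ['C', 'D']
print(c.isr_view["t-0"])  # ['C', 'D', 'E']
```

The sketch shows that epoch fencing on broker C protects only C's in-memory state. A's ZK write in step 3 is not fenced, so ZK and C diverge. Closing the gap would presumably require fencing the ZK update as well, not just the request.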



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
