[https://issues.apache.org/jira/browse/KAFKA-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171849#comment-16171849]
Mickael Maison commented on KAFKA-5885:
---------------------------------------
We hit this issue one night in July, and the ops engineer ended up deleting
the znodes for the affected topics. Unfortunately, we didn't capture a full
backup of what was in ZooKeeper at the time. From the stack trace, we
understand that a topic znode somehow had "null" as its data, but we don't know
whether it still had its child nodes. [~cotedm] do you happen to know?
We reviewed many code paths in Kafka while trying to find where a null could be
written and we came up with one potential suspect:
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/PartitionStateMachine.scala#L244
If the topic doesn't exist at all in ZooKeeper, createPersistentPath() will
create all parent nodes of "/brokers/topics/TOPIC/partitions/PARTITION/state",
inserting null as each parent znode's payload. At the time of the issue, a lot
of topics were being created and immediately deleted on our cluster.
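To make the suspected mechanism concrete, here is a minimal in-memory sketch in Java. It does not use the real ZkClient or Kafka code: the class ZnodeTreeSketch and its methods are hypothetical and only model the behavior described above, where creating a deep path also creates any missing parents with a null payload.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of a znode tree; NOT the real ZkClient API.
public class ZnodeTreeSketch {
    // path -> payload; a null value models a znode that exists but holds no data
    private final Map<String, String> znodes = new LinkedHashMap<>();

    // Models "create the leaf, creating missing parents along the way".
    // Parents that do not exist yet are created with a null payload.
    public void createPersistentPath(String path, String data) {
        int idx = path.indexOf('/', 1);
        while (idx != -1) {
            String parent = path.substring(0, idx);
            if (!znodes.containsKey(parent)) {
                znodes.put(parent, null);
            }
            idx = path.indexOf('/', idx + 1);
        }
        znodes.put(path, data);
    }

    public boolean exists(String path) {
        return znodes.containsKey(path);
    }

    public String readData(String path) {
        return znodes.get(path);
    }

    public static void main(String[] args) {
        ZnodeTreeSketch zk = new ZnodeTreeSketch();
        // The topic znode was never created (or was already deleted),
        // so writing the partition state recreates it as a parent.
        zk.createPersistentPath("/brokers/topics/TOPIC/partitions/0/state", "{\"leader\":1}");
        System.out.println("topic znode exists: " + zk.exists("/brokers/topics/TOPIC"));
        System.out.println("topic znode data:   " + zk.readData("/brokers/topics/TOPIC"));
    }
}
```

In this model the topic znode exists after the call but its data is null, which is the state that the replica assignment parsing in the stack trace cannot handle.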
As you suggested, better error messages would help, but I also think we need
to understand whether this can still happen in the latest release.
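As a rough illustration of what a better error message could look like, the following Java sketch checks for null data before parsing and names the offending path. This is not the actual Kafka code or a proposed patch: the class NullZnodeCheckSketch, the method requireData, and the sample topic names are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a null check that reports the znode path.
public class NullZnodeCheckSketch {
    // Returns the data unchanged, or fails with a message naming the path
    // instead of surfacing a bare NullPointerException from the JSON parser.
    public static String requireData(String path, String data) {
        if (data == null) {
            throw new IllegalStateException(
                "Znode " + path + " exists but has null data; cannot parse replica assignment");
        }
        return data;
    }

    public static void main(String[] args) {
        // Illustrative contents only: one healthy topic znode, one with null data.
        Map<String, String> znodes = new LinkedHashMap<>();
        znodes.put("/brokers/topics/healthy", "{\"version\":1,\"partitions\":{\"0\":[1]}}");
        znodes.put("/brokers/topics/broken", null);

        for (Map.Entry<String, String> e : znodes.entrySet()) {
            try {
                requireData(e.getKey(), e.getValue());
                System.out.println("ok: " + e.getKey());
            } catch (IllegalStateException ex) {
                System.out.println("error: " + ex.getMessage());
            }
        }
    }
}
```

With a check like this, an operator would see which topic znode is broken directly in the log rather than having to find it by trial and error.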
> NPE in ZKClient
> ---------------
>
> Key: KAFKA-5885
> URL: https://issues.apache.org/jira/browse/KAFKA-5885
> Project: Kafka
> Issue Type: Bug
> Components: zkclient
> Affects Versions: 0.10.2.1
> Reporter: Dustin Cote
>
> A null znode for a topic (how this happens isn't totally clear, but it's not
> the focus of this issue) can currently cause controller leader election to
> fail. In the broker logs, you can see a NullPointerException emanating from
> the ZKClient:
> {code}
> [2017-09-11 00:00:21,441] ERROR Error while electing or becoming leader on broker 1010674 (kafka.server.ZookeeperLeaderElector)
> kafka.common.KafkaException: Can't parse json string: null
> at kafka.utils.Json$.liftedTree1$1(Json.scala:40)
> at kafka.utils.Json$.parseFull(Json.scala:36)
> at kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$1.apply(ZkUtils.scala:704)
> at kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$1.apply(ZkUtils.scala:700)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at kafka.utils.ZkUtils.getReplicaAssignmentForTopics(ZkUtils.scala:700)
> at kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:742)
> at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:333)
> at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:160)
> at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:85)
> at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:154)
> at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:154)
> at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:154)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
> at kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:153)
> at org.I0Itec.zkclient.ZkClient$9.run(ZkClient.java:825)
> at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:72)
> Caused by: java.lang.NullPointerException
> {code}
> Regardless of how a null topic znode ended up in ZooKeeper, we can probably
> handle this better, at least by printing the path up to the problematic znode
> in the log. The way this particular problem was resolved was by using the
> ``kafka-topics`` command and seeing it persistently fail trying to read a
> particular topic with this same message. Then deleting the null znode allowed
> the leader election to complete.