[ https://issues.apache.org/jira/browse/KAFKA-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
NanerLee updated KAFKA-9267:
----------------------------
    Reviewer: Manikumar

> ZkSecurityMigrator should not create /controller node
> -----------------------------------------------------
>
>                 Key: KAFKA-9267
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9267
>             Project: Kafka
>          Issue Type: Bug
>          Components: admin
>            Reporter: NanerLee
>            Priority: Major
>
> As can be seen in the source code ([ZkSecurityMigrator.scala#L226|#L226]),
> _ZkSecurityMigrator_ checks and sets ACLs recursively for each path in
> _SecureRootPaths_, and _/controller_ is one of those paths.
> Predictably, _zkClient.makeSurePersistentPathExists()_ will create the
> _/controller_ node if it does not already exist.
> _/controller_ is an *EPHEMERAL* node used for controller election, but
> _makeSurePersistentPathExists()_ creates a *PERSISTENT* node with *null*
> data.
> When that happens, the null data causes an *NPE* when brokers read the node,
> no controller can be elected, and the Kafka cluster becomes unavailable.
> In addition, a *PERSISTENT* node does not disappear automatically, so it has
> to be deleted manually to fix the problem.
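To make the failure path concrete, here is a minimal in-memory sketch in Java. It does not use the real ZooKeeper client or any Kafka classes: `ZNode`, `ZNodeStore`, `ensurePersistentPathExists` and `migrate` are hypothetical stand-ins for, respectively, a znode, the ZooKeeper tree, the behavior the report describes for `zkClient.makeSurePersistentPathExists()`, and the per-path walk over `SecureRootPaths`. ACL handling is left out. Passing `skipMissingController = true` models the change the PR proposes: leave `/controller` alone when it is absent.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MigratorSketch {
    static final String CONTROLLER_PATH = "/controller";

    // Hypothetical znode model: payload plus EPHEMERAL/PERSISTENT mode.
    static final class ZNode {
        final byte[] data;
        final boolean ephemeral;

        ZNode(byte[] data, boolean ephemeral) {
            this.data = data;
            this.ephemeral = ephemeral;
        }
    }

    // Hypothetical in-memory stand-in for the ZooKeeper tree (path -> node).
    static final class ZNodeStore {
        final Map<String, ZNode> nodes = new LinkedHashMap<>();

        boolean exists(String path) {
            return nodes.containsKey(path);
        }

        // Models the behavior the report describes for makeSurePersistentPathExists():
        // a missing path is created as a PERSISTENT node with null data.
        void ensurePersistentPathExists(String path) {
            nodes.putIfAbsent(path, new ZNode(null, false));
        }
    }

    // Walks the secure root paths; only node creation is modeled, not ACL updates.
    // With skipMissingController = true, an absent /controller is skipped, as the PR proposes.
    static List<String> migrate(ZNodeStore zk, List<String> secureRootPaths, boolean skipMissingController) {
        List<String> visited = new ArrayList<>();
        for (String path : secureRootPaths) {
            if (skipMissingController && path.equals(CONTROLLER_PATH) && !zk.exists(path)) {
                continue; // only the elected controller should create this EPHEMERAL node
            }
            zk.ensurePersistentPathExists(path);
            visited.add(path);
        }
        return visited;
    }

    public static void main(String[] args) {
        // Illustrative path list only; not the actual contents of SecureRootPaths.
        List<String> roots = List.of("/brokers", CONTROLLER_PATH, "/config");

        ZNodeStore buggy = new ZNodeStore();
        migrate(buggy, roots, false);
        ZNode created = buggy.nodes.get(CONTROLLER_PATH);
        // prints "buggy: persistent=true, data=null"
        System.out.println("buggy: persistent=" + !created.ephemeral + ", data=" + created.data);

        ZNodeStore fixed = new ZNodeStore();
        migrate(fixed, roots, true);
        // prints "fixed: /controller exists=false"
        System.out.println("fixed: /controller exists=" + fixed.exists(CONTROLLER_PATH));
    }
}
```

The skip is limited to the case where `/controller` is missing, so an existing (ephemeral) controller node is still visited by the walk.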
>
> *PERSISTENT* _/controller_ node with *null* data in zk:
> {code:java}
> [zk: localhost:2181(CONNECTED) 16] get /kafka/controller
> null
> cZxid = 0x1100002284
> ctime = Tue Dec 03 18:37:26 CST 2019
> mZxid = 0x1100002284
> mtime = Tue Dec 03 18:37:26 CST 2019
> pZxid = 0x1100002284
> cversion = 0
> dataVersion = 0
> aclVersion = 1
> ephemeralOwner = 0x0
> dataLength = 0
> numChildren = 0{code}
> *Normal* /controller node in zk:
> {code:java}
> [zk: localhost:2181(CONNECTED) 21] get /kafka/controller
> {"version":1,"brokerid":1001,"timestamp":"1575370170528"}
> cZxid = 0x11000023e1
> ctime = Tue Dec 03 18:49:30 CST 2019
> mZxid = 0x11000023e1
> mtime = Tue Dec 03 18:49:30 CST 2019
> pZxid = 0x11000023e1
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x16ecb572df50021
> dataLength = 57
> numChildren = 0{code}
> *NPE* in controller.log:
> {code:java}
> [2019-11-21 15:02:41,276] INFO [ControllerEventThread controllerId=1002] Starting (kafka.controller.ControllerEventManager$ControllerEventThread)
> [2019-11-21 15:02:41,282] ERROR [ControllerEventThread controllerId=1002] Error processing event Startup (kafka.controller.ControllerEventManager$ControllerEventThread)
> java.lang.NullPointerException
>     at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:857)
>     at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2572)
>     at kafka.utils.Json$.parseBytes(Json.scala:62)
>     at kafka.zk.ControllerZNode$.decode(ZkData.scala:56)
>     at kafka.zk.KafkaZkClient.getControllerId(KafkaZkClient.scala:902)
>     at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1199)
>     at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1148)
>     at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:86)
>     at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
>     at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
>     at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
>     at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:85)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82){code}
>
> So I have submitted a PR in which _ZkSecurityMigrator_ does not handle the
> _/controller_ node when _/controller_ does not exist.
> This bug appears to affect all versions; please review and merge the PR as
> soon as possible.
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
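Before deleting `/controller` by hand, it is worth confirming that the node is the leftover and not a live controller registration. The two zkCli dumps in the report differ in exactly the field that tells them apart: `ephemeralOwner` is `0x0` for the persistent leftover and a session id for the normal node. The sketch below encodes that check; `NodeStat` and `isStalePersistentController` are hypothetical names, and the values are copied from the dumps. ZooKeeper's own `org.apache.zookeeper.data.Stat` exposes the same fields through `getEphemeralOwner()` and `getDataLength()`.

```java
class ControllerNodeCheck {
    // Hypothetical holder for two of the fields zkCli prints after `get`.
    static final class NodeStat {
        final long ephemeralOwner;
        final int dataLength;

        NodeStat(long ephemeralOwner, int dataLength) {
            this.ephemeralOwner = ephemeralOwner;
            this.dataLength = dataLength;
        }
    }

    // ephemeralOwner == 0 means the node is PERSISTENT. A legitimate /controller is
    // always EPHEMERAL (owned by the controller's session), so a persistent one is
    // the leftover that has to be removed manually.
    static boolean isStalePersistentController(NodeStat stat) {
        return stat.ephemeralOwner == 0L;
    }

    public static void main(String[] args) {
        // Values copied from the two zkCli dumps in the report.
        NodeStat stale = new NodeStat(0x0L, 0);
        NodeStat normal = new NodeStat(0x16ecb572df50021L, 57);
        System.out.println("stale:  dataLength=" + stale.dataLength
                + ", remove=" + isStalePersistentController(stale));
        System.out.println("normal: dataLength=" + normal.dataLength
                + ", remove=" + isStalePersistentController(normal));
    }
}
```

In the report's setup the znodes live under a `/kafka` chroot, so the path to check and remove is `/kafka/controller`. zkCli's `delete` command removes a node with no children; once the leftover is gone, a broker should be able to win the election and create the ephemeral node again.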