[ https://issues.apache.org/jira/browse/KAFKA-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989590#comment-16989590 ]
ASF GitHub Bot commented on KAFKA-9267:
---------------------------------------

omkreddy commented on pull request #7778: KAFKA-9267: ZkSecurityMigrator should not create /controller node
URL: https://github.com/apache/kafka/pull/7778

> ZkSecurityMigrator should not create /controller node
> -----------------------------------------------------
>
>                 Key: KAFKA-9267
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9267
>             Project: Kafka
>          Issue Type: Bug
>          Components: admin
>            Reporter: NanerLee
>            Priority: Major
>
> As the source code shows ([ZkSecurityMigrator.scala#L226|#L226]),
> _ZkSecurityMigrator_ checks and sets ACLs recursively for each path in
> _SecureRootPaths_, and _/controller_ is one of those paths.
> As a result, _zkClient.makeSurePersistentPathExists()_ creates the
> _/controller_ node if it does not already exist.
> _/controller_ is an *EPHEMERAL* node used for controller election, but
> _makeSurePersistentPathExists()_ creates a *PERSISTENT* node with *null*
> data.
> If that happens, the null data causes an *NPE* when the controller ID is
> read during election, no controller can be elected, and the Kafka cluster
> becomes unavailable.
> In addition, a *PERSISTENT* node does not disappear automatically, so it
> has to be deleted manually to fix the problem.
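A minimal sketch of the failure mode described above, in Java against a toy in-memory tree rather than ZooKeeper. Everything in it is an assumption for illustration: `MigratorSketch`, `ensurePersistentPath` (standing in for the behaviour the issue attributes to `makeSurePersistentPathExists()`), and the two-entry path list standing in for `SecureRootPaths`. Recursion into child nodes is omitted. `migrateBuggy` creates a missing `/controller` as a persistent node with null data; `migrateGuarded` follows the idea of the proposed fix and skips `/controller` when it is absent.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of the migration loop described in KAFKA-9267.
// NOT the ZooKeeper or Kafka API: the class, method names and path list are illustrative.
public class MigratorSketch {
    enum Mode { PERSISTENT, EPHEMERAL }

    // A znode reduced to its creation mode and payload (null payload = no data).
    static final class Node {
        final Mode mode;
        final byte[] data;
        Node(Mode mode, byte[] data) { this.mode = mode; this.data = data; }
    }

    // Illustrative subset of SecureRootPaths; the real list is defined in Kafka's source.
    static final List<String> SECURE_ROOT_PATHS = List.of("/brokers", "/controller");

    final Map<String, Node> tree = new LinkedHashMap<>();
    final List<String> aclUpdates = new ArrayList<>();

    // Models the behaviour the issue attributes to makeSurePersistentPathExists():
    // a missing path is created as a PERSISTENT node with null data.
    void ensurePersistentPath(String path) {
        tree.putIfAbsent(path, new Node(Mode.PERSISTENT, null));
    }

    // Stand-in for setting the secure ACL on a path (children are not recursed into here).
    void setAcl(String path) { aclUpdates.add(path); }

    // Buggy loop: every root path is created if missing, /controller included.
    void migrateBuggy() {
        for (String path : SECURE_ROOT_PATHS) {
            ensurePersistentPath(path);
            setAcl(path);
        }
    }

    // Guarded loop: skip /controller when it does not exist, leaving its
    // creation to controller election (which creates it as EPHEMERAL).
    void migrateGuarded() {
        for (String path : SECURE_ROOT_PATHS) {
            if (path.equals("/controller") && !tree.containsKey(path)) {
                continue;
            }
            ensurePersistentPath(path);
            setAcl(path);
        }
    }

    public static void main(String[] args) {
        MigratorSketch buggy = new MigratorSketch(); // no controller currently registered
        buggy.migrateBuggy();
        Node c = buggy.tree.get("/controller");
        System.out.println("buggy: /controller mode=" + c.mode
                + ", data=" + (c.data == null ? "null" : "present"));

        MigratorSketch guarded = new MigratorSketch();
        guarded.migrateGuarded();
        System.out.println("guarded: /controller exists=" + guarded.tree.containsKey("/controller"));
    }
}
```

Running `main` prints that the buggy loop leaves `/controller` PERSISTENT with null data, while the guarded loop leaves it absent. When an ephemeral `/controller` already exists, the guarded loop still updates its ACL and leaves its mode untouched.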
>
> *PERSISTENT* _/controller_ node with *null* data in zk:
> {code:java}
> [zk: localhost:2181(CONNECTED) 16] get /kafka/controller
> null
> cZxid = 0x1100002284
> ctime = Tue Dec 03 18:37:26 CST 2019
> mZxid = 0x1100002284
> mtime = Tue Dec 03 18:37:26 CST 2019
> pZxid = 0x1100002284
> cversion = 0
> dataVersion = 0
> aclVersion = 1
> ephemeralOwner = 0x0
> dataLength = 0
> numChildren = 0{code}
> *Normal* _/controller_ node in zk:
> {code:java}
> [zk: localhost:2181(CONNECTED) 21] get /kafka/controller
> {"version":1,"brokerid":1001,"timestamp":"1575370170528"}
> cZxid = 0x11000023e1
> ctime = Tue Dec 03 18:49:30 CST 2019
> mZxid = 0x11000023e1
> mtime = Tue Dec 03 18:49:30 CST 2019
> pZxid = 0x11000023e1
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x16ecb572df50021
> dataLength = 57
> numChildren = 0{code}
> *NPE* in controller.log:
> {code:java}
> [2019-11-21 15:02:41,276] INFO [ControllerEventThread controllerId=1002] Starting (kafka.controller.ControllerEventManager$ControllerEventThread)
> [2019-11-21 15:02:41,282] ERROR [ControllerEventThread controllerId=1002] Error processing event Startup (kafka.controller.ControllerEventManager$ControllerEventThread)
> java.lang.NullPointerException
> 	at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:857)
> 	at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2572)
> 	at kafka.utils.Json$.parseBytes(Json.scala:62)
> 	at kafka.zk.ControllerZNode$.decode(ZkData.scala:56)
> 	at kafka.zk.KafkaZkClient.getControllerId(KafkaZkClient.scala:902)
> 	at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1199)
> 	at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1148)
> 	at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:86)
> 	at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
> 	at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
> 	at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
> 	at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:85)
> 	at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82){code}
>
> So I have submitted a PR in which _ZkSecurityMigrator_ skips the
> _/controller_ node when it does not exist.
> This bug seems to affect all versions; please review and merge the PR as
> soon as possible.
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
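The two zkCli outputs quoted in the issue differ in exactly the fields that identify the broken state: `ephemeralOwner = 0x0` (ZooKeeper reports 0 when no session owns the node, i.e. the node is persistent) together with `dataLength = 0`. Below is a hedged Java sketch that classifies a `get` output on those two fields, for example before deleting the stale node manually as the issue describes. `ControllerNodeCheck`, `parseStat` and `classify` are hypothetical names, and parsing zkCli's text output (instead of reading a ZooKeeper client `Stat`) is an assumption made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical diagnostic: classify a /controller znode from zkCli `get` text output.
public class ControllerNodeCheck {
    enum State { EPHEMERAL, STALE_PERSISTENT, UNKNOWN }

    // Collect "key = value" stat lines into a map; the prompt and payload lines are skipped.
    static Map<String, String> parseStat(String output) {
        Map<String, String> stat = new HashMap<>();
        for (String line : output.split("\\R")) {
            int eq = line.indexOf(" = ");
            if (eq > 0) {
                stat.put(line.substring(0, eq).trim(), line.substring(eq + 3).trim());
            }
        }
        return stat;
    }

    static State classify(String output) {
        Map<String, String> stat = parseStat(output);
        String owner = stat.get("ephemeralOwner");
        String length = stat.get("dataLength");
        if (owner == null || length == null) {
            return State.UNKNOWN;
        }
        // ephemeralOwner is printed in hex; 0 means no owning session, i.e. a persistent node.
        String hex = owner.startsWith("0x") ? owner.substring(2) : owner;
        boolean persistent = Long.parseUnsignedLong(hex, 16) == 0L;
        int dataLength = Integer.parseInt(length);
        if (!persistent) {
            return State.EPHEMERAL;
        }
        // Persistent with no data matches the broken node from the issue; any other
        // persistent /controller is unexpected and left for manual inspection.
        return dataLength == 0 ? State.STALE_PERSISTENT : State.UNKNOWN;
    }

    public static void main(String[] args) {
        // Trimmed copies of the two outputs quoted in the issue.
        String broken = String.join("\n",
                "[zk: localhost:2181(CONNECTED) 16] get /kafka/controller",
                "null",
                "ephemeralOwner = 0x0",
                "dataLength = 0");
        String normal = String.join("\n",
                "[zk: localhost:2181(CONNECTED) 21] get /kafka/controller",
                "{\"version\":1,\"brokerid\":1001,\"timestamp\":\"1575370170528\"}",
                "ephemeralOwner = 0x16ecb572df50021",
                "dataLength = 57");
        System.out.println(classify(broken)); // STALE_PERSISTENT
        System.out.println(classify(normal)); // EPHEMERAL
    }
}
```

Keying on `ephemeralOwner` rather than on the `null` payload line avoids confusing a node whose data is the literal text "null" with one that has no data at all.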