[ 
https://issues.apache.org/jira/browse/KAFKA-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987978#comment-16987978
 ] 

ASF GitHub Bot commented on KAFKA-9267:
---------------------------------------

NanerLee commented on pull request #7778: KAFKA-9267: ZkSecurityMigrator should 
not create /controller node
URL: https://github.com/apache/kafka/pull/7778
 
 
   [KAFKA-9267](https://issues.apache.org/jira/browse/KAFKA-9267)
   
   ZkSecurityMigrator might create a PERSISTENT /controller node with null 
data, which prevents the controller from being elected.
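
The failure mode described above can be modelled in a few lines. This is only a sketch written in Python for brevity, not Kafka's Scala code: `ZNode`, `make_sure_persistent_path_exists` and `decode_controller_id` are hypothetical stand-ins for the behaviour the issue reports, and the in-memory dict replaces a real ZooKeeper ensemble. Python raises `TypeError` where the JVM raises the `NullPointerException` seen in controller.log.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ZNode:
    data: Optional[bytes]  # payload stored in the node, None models "null data"
    ephemeral: bool        # False models ephemeralOwner = 0x0 (PERSISTENT)


def make_sure_persistent_path_exists(tree: Dict[str, ZNode], path: str) -> None:
    """Hypothetical model of the "create if absent" step.

    A missing path is created as a PERSISTENT node with no data; an
    existing node is left untouched.
    """
    if path not in tree:
        tree[path] = ZNode(data=None, ephemeral=False)


def decode_controller_id(data: Optional[bytes]) -> int:
    """Hypothetical model of decoding the /controller payload.

    json.loads(None) raises TypeError, the Python analogue of the NPE
    thrown when the JSON parser is handed null bytes.
    """
    return json.loads(data)["brokerid"]


if __name__ == "__main__":
    tree: Dict[str, ZNode] = {}
    # The migrator runs while no controller is registered.
    make_sure_persistent_path_exists(tree, "/controller")
    try:
        decode_controller_id(tree["/controller"].data)
    except TypeError as err:
        print("election would fail:", type(err).__name__)
```

Because the node in this model is not ephemeral, nothing removes it when a session ends, which matches the report that the node has to be deleted by hand.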
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ZkSecurityMigrator should not create /controller node
> -----------------------------------------------------
>
>                 Key: KAFKA-9267
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9267
>             Project: Kafka
>          Issue Type: Bug
>          Components: admin
>            Reporter: NanerLee
>            Priority: Major
>
> As we can see in the source code ([ZkSecurityMigrator.scala#L226|#L226]), 
> _ZkSecurityMigrator_ checks and sets the ACL recursively for each path in 
> _SecureRootPaths_, and _/controller_ is one of those paths.
> As expected, _zkClient.makeSurePersistentPathExists()_ creates the 
> _/controller_ node if it does not exist.
> _/controller_ is an *EPHEMERAL* node used for controller election, but 
> _makeSurePersistentPathExists()_ creates a *PERSISTENT* node with *null* 
> data.
> If that happens, the null data causes an *NPE*, the controller cannot be 
> elected, and the Kafka cluster becomes unavailable.
> In addition, a *PERSISTENT* node does not disappear automatically, so we have to 
> delete it manually to fix the problem.
>  
> *PERSISTENT* _/controller_ node with *null* data in zk:
> {code:java}
> [zk: localhost:2181(CONNECTED) 16] get /kafka/controller
> null
> cZxid = 0x1100002284
> ctime = Tue Dec 03 18:37:26 CST 2019
> mZxid = 0x1100002284
> mtime = Tue Dec 03 18:37:26 CST 2019
> pZxid = 0x1100002284
> cversion = 0
> dataVersion = 0
> aclVersion = 1
> ephemeralOwner = 0x0
> dataLength = 0
> numChildren = 0{code}
> *Normal* /controller node in zk:
> {code:java}
> [zk: localhost:2181(CONNECTED) 21] get /kafka/controller
> {"version":1,"brokerid":1001,"timestamp":"1575370170528"}
> cZxid = 0x11000023e1
> ctime = Tue Dec 03 18:49:30 CST 2019
> mZxid = 0x11000023e1
> mtime = Tue Dec 03 18:49:30 CST 2019
> pZxid = 0x11000023e1
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x16ecb572df50021
> dataLength = 57
> numChildren = 0{code}
> *NPE* in controller.log:
> {code:java}
> [2019-11-21 15:02:41,276] INFO [ControllerEventThread controllerId=1002] Starting (kafka.controller.ControllerEventManager$ControllerEventThread)
> [2019-11-21 15:02:41,282] ERROR [ControllerEventThread controllerId=1002] Error processing event Startup (kafka.controller.ControllerEventManager$ControllerEventThread)
> java.lang.NullPointerException
>  at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:857)
>  at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2572)
>  at kafka.utils.Json$.parseBytes(Json.scala:62)
>  at kafka.zk.ControllerZNode$.decode(ZkData.scala:56)
>  at kafka.zk.KafkaZkClient.getControllerId(KafkaZkClient.scala:902)
>  at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1199)
>  at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1148)
>  at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:86)
>  at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
>  at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
>  at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
>  at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:85)
>  at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82){code}
>  
> So I submitted a PR so that _ZkSecurityMigrator_ skips the _/controller_ 
> node when it does not exist.
> This bug seems to affect all versions; please review and merge the PR as soon 
> as possible.
> Thanks!
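
The behaviour proposed in the issue (leave _/controller_ alone when it does not exist) could look roughly like the sketch below. This is not the code from the pull request. It is a Python model under stated assumptions: `SKIP_IF_ABSENT` and `paths_to_secure` are hypothetical names, a dict stands in for ZooKeeper, and the path list in the usage lines is illustrative, since only /controller is named in the report.

```python
from typing import Dict, List, Optional

# Paths that should never be pre-created because a live broker owns them as
# ephemeral nodes. Only /controller is named in the report; the set is an
# assumption made for this sketch.
SKIP_IF_ABSENT = {"/controller"}


def paths_to_secure(existing: Dict[str, Optional[bytes]],
                    secure_root_paths: List[str]) -> List[str]:
    """Return the root paths the migrator should go on to set ACLs for.

    Missing paths are created as persistent nodes with no data, except
    those in SKIP_IF_ABSENT, which are skipped entirely when absent.
    """
    selected = []
    for path in secure_root_paths:
        if path not in existing:
            if path in SKIP_IF_ABSENT:
                continue  # do not create it; controller election will
            existing[path] = None  # model of the persistent "create if absent"
        selected.append(path)
    return selected


if __name__ == "__main__":
    zk = {"/brokers": b"{}"}  # no controller registered yet
    print(paths_to_secure(zk, ["/brokers", "/controller", "/config"]))
    print("/controller" in zk)
```

An existing /controller node is still returned for ACL processing in this model; only the creation of a missing one is suppressed.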



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
