[ 
https://issues.apache.org/jira/browse/KAFKA-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

NanerLee updated KAFKA-9267:
----------------------------
    Description: 
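As the source code shows 
([ZkSecurityMigrator.scala#L226|https://github.com/apache/kafka/blob/2accf14ccf9b1f96c9dd8cfb94530c56378fae80/core/src/main/scala/kafka/admin/ZkSecurityMigrator.scala#L226]):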

_ZkSecurityMigrator_ checks and sets the ACL recursively for each path in 
_SecureRootPaths_, and _/controller_ is one of those paths.

As a result, _zkClient.makeSurePersistentPathExists()_ creates the 
_/controller_ node if it does not already exist.

_/controller_ is an *EPHEMERAL* node used for controller election, but 
_makeSurePersistentPathExists()_ creates a *PERSISTENT* node with *null* 
data.

If that happens, the null data causes an *NPE*, the controller cannot be 
elected, and the Kafka cluster becomes unavailable.
In addition, a *PERSISTENT* node does not disappear automatically, so it has 
to be deleted manually to fix the problem.

 

*PERSISTENT* _/controller_ node with *null* data in zk:
{code:java}
[zk: localhost:2181(CONNECTED) 16] get /kafka/controller
null
cZxid = 0x1100002284
ctime = Tue Dec 03 18:37:26 CST 2019
mZxid = 0x1100002284
mtime = Tue Dec 03 18:37:26 CST 2019
pZxid = 0x1100002284
cversion = 0
dataVersion = 0
aclVersion = 1
ephemeralOwner = 0x0
dataLength = 0
numChildren = 0{code}
*Normal* /controller node in zk:
{code:java}
[zk: localhost:2181(CONNECTED) 21] get /kafka/controller
{"version":1,"brokerid":1001,"timestamp":"1575370170528"}
cZxid = 0x11000023e1
ctime = Tue Dec 03 18:49:30 CST 2019
mZxid = 0x11000023e1
mtime = Tue Dec 03 18:49:30 CST 2019
pZxid = 0x11000023e1
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x16ecb572df50021
dataLength = 57
numChildren = 0{code}
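The two listings above differ in _ephemeralOwner_: ZooKeeper reports 0x0 for a 
persistent node and the owning session id for an ephemeral one. A minimal 
sketch of a check on that field, assuming the zkCli output has the form shown 
above. _ZnodeStatSketch_ and _isEphemeral_ are hypothetical names, not part of 
Kafka or ZooKeeper:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; it only parses text in the zkCli format shown above. */
final class ZnodeStatSketch {
    // Matches a line such as "ephemeralOwner = 0x16ecb572df50021".
    private static final Pattern OWNER =
            Pattern.compile("ephemeralOwner = 0x([0-9a-fA-F]+)");

    /** Returns true when the stat output describes an ephemeral znode. */
    static boolean isEphemeral(String statOutput) {
        Matcher m = OWNER.matcher(statOutput);
        if (!m.find()) {
            throw new IllegalArgumentException("no ephemeralOwner field found");
        }
        // A zero owner means no session owns the node, so it is persistent.
        return Long.parseUnsignedLong(m.group(1), 16) != 0L;
    }
}
```

A _/controller_ node for which this returns false was not created by a 
controller election.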
*NPE* in controller.log:
{code:java}
[2019-11-21 15:02:41,276] INFO [ControllerEventThread controllerId=1002] Starting (kafka.controller.ControllerEventManager$ControllerEventThread)
[2019-11-21 15:02:41,282] ERROR [ControllerEventThread controllerId=1002] Error processing event Startup (kafka.controller.ControllerEventManager$ControllerEventThread)
java.lang.NullPointerException
 at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:857)
 at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2572)
 at kafka.utils.Json$.parseBytes(Json.scala:62)
 at kafka.zk.ControllerZNode$.decode(ZkData.scala:56)
 at kafka.zk.KafkaZkClient.getControllerId(KafkaZkClient.scala:902)
 at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1199)
 at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1148)
 at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:86)
 at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
 at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
 at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
 at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:85)
 at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82){code}
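The trace shows the null bytes reaching the JSON parser through 
_ControllerZNode.decode_. For illustration only, a null-safe decode could look 
like the sketch below. This is not Kafka's code and not the change in the PR: 
_ControllerDataSketch_ and _decodeBrokerId_ are hypothetical names, and a 
regular expression stands in for the Jackson parser to keep the example 
self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for decoding the data of the /controller znode. */
final class ControllerDataSketch {
    // Extracts the number from a fragment such as "brokerid":1001.
    private static final Pattern BROKER_ID =
            Pattern.compile("\"brokerid\"\\s*:\\s*(\\d+)");

    /** Returns the broker id, or empty when the znode holds no usable data. */
    static OptionalInt decodeBrokerId(byte[] data) {
        // A znode created without data yields null here; do not parse it.
        if (data == null || data.length == 0) {
            return OptionalInt.empty();
        }
        String json = new String(data, StandardCharsets.UTF_8);
        Matcher m = BROKER_ID.matcher(json);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }
}
```

Such a guard would only avoid the exception. The *PERSISTENT* node would still 
be in place, so it does not replace the fix in _ZkSecurityMigrator_.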
 

So I have submitted a PR so that _ZkSecurityMigrator_ does not handle the 
_/controller_ node when _/controller_ does not exist.
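The idea can be sketched against an in-memory stand-in for the znode tree. 
This is an assumption-laden illustration, not the actual patch: 
_MigratorSketch_, _EPHEMERAL_PATHS_, _secure_ and _aclUpdated_ are 
hypothetical names, and a plain map replaces the real ZooKeeper client.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** In-memory model of the migrator loop; not Kafka's ZkSecurityMigrator. */
final class MigratorSketch {
    // Hypothetical: paths that only their owner may create.
    static final Set<String> EPHEMERAL_PATHS = Collections.singleton("/controller");

    // path -> data; a null value models a znode created without data.
    final Map<String, byte[]> znodes = new HashMap<>();
    // Paths whose ACL was set, in processing order.
    final List<String> aclUpdated = new ArrayList<>();

    void secure(List<String> secureRootPaths) {
        for (String path : secureRootPaths) {
            boolean exists = znodes.containsKey(path);
            if (!exists && EPHEMERAL_PATHS.contains(path)) {
                continue; // leave creation to the controller election
            }
            if (!exists) {
                znodes.put(path, null); // models makeSurePersistentPathExists()
            }
            aclUpdated.add(path); // models setting the ACL on the path
        }
    }
}
```

With this guard an existing _/controller_ node still gets its ACL set, while a 
missing one is never created by the migrator.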

This bug seems to affect all versions. Please review and merge the PR as soon 
as possible.

Thanks!


> ZkSecurityMigrator should not create /controller node
> -----------------------------------------------------
>
>                 Key: KAFKA-9267
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9267
>             Project: Kafka
>          Issue Type: Bug
>          Components: admin
>            Reporter: NanerLee
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
