[
https://issues.apache.org/jira/browse/KAFKA-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
NanerLee updated KAFKA-9267:
----------------------------
External issue URL: https://github.com/apache/kafka/pull/7778
Description:
As the source shows ([ZkSecurityMigrator.scala#L226|#L226]),
_ZkSecurityMigrator_ checks and sets the ACL recursively for each path in
_SecureRootPaths_, and _/controller_ is one of those paths.
As a result, _zkClient.makeSurePersistentPathExists()_ creates the
_/controller_ node if it does not already exist.
_/controller_ is meant to be an *EPHEMERAL* node used for controller election, but
_makeSurePersistentPathExists()_ creates a *PERSISTENT* node with *null*
data.
When that happens, the null data causes an *NPE*, no controller can be
elected, and the Kafka cluster becomes unavailable.
In addition, a *PERSISTENT* node does not disappear automatically, so it has to
be deleted manually to fix the problem.
*PERSISTENT* _/controller_ node with *null* data in zk:
{code:java}
[zk: localhost:2181(CONNECTED) 16] get /kafka/controller
null
cZxid = 0x1100002284
ctime = Tue Dec 03 18:37:26 CST 2019
mZxid = 0x1100002284
mtime = Tue Dec 03 18:37:26 CST 2019
pZxid = 0x1100002284
cversion = 0
dataVersion = 0
aclVersion = 1
ephemeralOwner = 0x0
dataLength = 0
numChildren = 0{code}
*Normal* /controller node in zk:
{code:java}
[zk: localhost:2181(CONNECTED) 21] get /kafka/controller
{"version":1,"brokerid":1001,"timestamp":"1575370170528"}
cZxid = 0x11000023e1
ctime = Tue Dec 03 18:49:30 CST 2019
mZxid = 0x11000023e1
mtime = Tue Dec 03 18:49:30 CST 2019
pZxid = 0x11000023e1
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x16ecb572df50021
dataLength = 57
numChildren = 0{code}
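The two dumps above differ in _ephemeralOwner_: ZooKeeper reports the owning session id for an ephemeral node and 0x0 for a persistent one. As a rough illustration, here is a minimal sketch that reads that field from zkCli-style stat output. The class and method names (_ZnodeStat_, _isEphemeral_) are made up for this example and are not part of Kafka or ZooKeeper.

```java
import java.util.Optional;

final class ZnodeStat {
    // Finds the "ephemeralOwner = 0x..." line in zkCli `get` output and
    // returns the owner session id, or empty if the field is absent.
    static Optional<Long> ephemeralOwner(String statOutput) {
        for (String line : statOutput.split("\\R")) {
            String[] kv = line.split("=", 2);
            if (kv.length == 2 && kv[0].trim().equals("ephemeralOwner")) {
                String hex = kv[1].trim();
                if (hex.startsWith("0x")) {
                    hex = hex.substring(2);
                }
                return Optional.of(Long.parseUnsignedLong(hex, 16));
            }
        }
        return Optional.empty();
    }

    // A non-zero owner means the node is tied to a session, i.e. ephemeral.
    static boolean isEphemeral(String statOutput) {
        return ephemeralOwner(statOutput).map(owner -> owner != 0L).orElse(false);
    }

    public static void main(String[] args) {
        System.out.println(isEphemeral("ephemeralOwner = 0x0"));               // false
        System.out.println(isEphemeral("ephemeralOwner = 0x16ecb572df50021")); // true
    }
}
```

A check like this could be used to spot a stale persistent _/controller_ node before deleting it by hand.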
*NPE* in controller.log:
{code:java}
[2019-11-21 15:02:41,276] INFO [ControllerEventThread controllerId=1002] Starting (kafka.controller.ControllerEventManager$ControllerEventThread)
[2019-11-21 15:02:41,282] ERROR [ControllerEventThread controllerId=1002] Error processing event Startup (kafka.controller.ControllerEventManager$ControllerEventThread)
java.lang.NullPointerException
    at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:857)
    at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2572)
    at kafka.utils.Json$.parseBytes(Json.scala:62)
    at kafka.zk.ControllerZNode$.decode(ZkData.scala:56)
    at kafka.zk.KafkaZkClient.getControllerId(KafkaZkClient.scala:902)
    at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1199)
    at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1148)
    at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:86)
    at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
    at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
    at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:85)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82){code}
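The trace shows the NPE surfacing when the znode data is handed to the JSON parser. To illustrate the failure mode only, here is a minimal sketch of a decode step that treats null or empty data as "no controller id" instead of parsing it. This is not Kafka's _ControllerZNode.decode_ and not the change in the PR; the class name _ControllerZnodeData_ is hypothetical, and a simple regex stands in for a real JSON parser to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ControllerZnodeData {
    // Simplified stand-in for JSON parsing: matches "brokerid":<digits>.
    private static final Pattern BROKER_ID =
            Pattern.compile("\"brokerid\"\\s*:\\s*(\\d+)");

    // Returns the broker id stored in the znode data, or empty when the
    // data is null, empty, or does not contain a brokerid field.
    static Optional<Integer> brokerId(byte[] data) {
        if (data == null || data.length == 0) {
            return Optional.empty();
        }
        Matcher m = BROKER_ID.matcher(new String(data, StandardCharsets.UTF_8));
        return m.find() ? Optional.of(Integer.parseInt(m.group(1))) : Optional.empty();
    }

    public static void main(String[] args) {
        byte[] normal = "{\"version\":1,\"brokerid\":1001,\"timestamp\":\"1575370170528\"}"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(brokerId(normal)); // Optional[1001]
        System.out.println(brokerId(null));   // Optional.empty
    }
}
```

A guard like this would only avoid the exception. The *PERSISTENT* node would still be there and would still have to be deleted manually.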
To fix this, I have submitted a PR so that _ZkSecurityMigrator_ does not handle the
_/controller_ node when _/controller_ does not exist.
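The idea can be sketched roughly as follows. This is an illustration of the behaviour described above, not the code in the PR: _FakeZk_ is a hypothetical in-memory stand-in for ZooKeeper, and _migrate_ and _neverCreate_ are names invented for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class MigratorSketch {
    // Hypothetical in-memory stand-in for ZooKeeper: path -> data.
    static final class FakeZk {
        final Map<String, byte[]> nodes = new TreeMap<>();
        final Set<String> aclSet = new TreeSet<>();

        boolean exists(String path) { return nodes.containsKey(path); }
        // Mirrors the problem described above: the node is created with null data.
        void createPersistent(String path) { nodes.put(path, null); }
        void setAcl(String path) { aclSet.add(path); }
    }

    // Sets ACLs on each root path. Missing paths are created as persistent
    // nodes unless they are listed in neverCreate, in which case they are
    // skipped entirely and left for their real owner to create.
    static void migrate(FakeZk zk, List<String> secureRootPaths, Set<String> neverCreate) {
        for (String path : secureRootPaths) {
            if (!zk.exists(path)) {
                if (neverCreate.contains(path)) {
                    continue; // e.g. /controller: left for controller election
                }
                zk.createPersistent(path);
            }
            zk.setAcl(path);
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        migrate(zk, Arrays.asList("/controller", "/brokers"),
                Collections.singleton("/controller"));
        System.out.println(zk.exists("/controller")); // false
        System.out.println(zk.exists("/brokers"));    // true
    }
}
```

If _/controller_ already exists because a controller has been elected, the loop still sets its ACL. Only the creation step is skipped.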
This bug seems to affect all versions, so please review and merge the PR as soon
as possible.
Thanks!
> ZkSecurityMigrator should not create /controller node
> -----------------------------------------------------
>
> Key: KAFKA-9267
> URL: https://issues.apache.org/jira/browse/KAFKA-9267
> Project: Kafka
> Issue Type: Bug
> Components: admin
> Reporter: NanerLee
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)