Badai Aqrandista created KAFKA-12163:
----------------------------------------

             Summary: Controller should ensure zkVersion is monotonically increasing when sending UpdateMetadata requests.
                 Key: KAFKA-12163
                 URL: https://issues.apache.org/jira/browse/KAFKA-12163
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 2.4.1
            Reporter: Badai Aqrandista


When sending UpdateMetadata requests, the controller does not currently perform any check to ensure that zkVersion is monotonically increasing. If ZooKeeper gets into a bad state, this can put the Kafka cluster into a bad state as well and possibly cause data loss.

The controller should perform a check to protect Kafka clusters from getting into a bad state.

The following log shows an example of zkVersion going backward at 2020-12-08 14:10:46,420, where it drops from 542 to 538 and leaderEpoch drops from 214 to 212.

{noformat}
[2020-11-23 00:56:20,315] TRACE [Controller id=1153 epoch=196] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=195, leader=2152, leaderEpoch=210, isr=[2154, 2152, 1153, 1152], zkVersion=535, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[]) to brokers Set(2153, 1152, 2154, 2151, 1153, 2152, 1154) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-11-23 01:15:28,449] TRACE [Controller id=1153 epoch=196] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=195, leader=2152, leaderEpoch=210, isr=[2154, 2152, 1153, 1152], zkVersion=535, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[]) to brokers Set(1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-11-24 00:15:17,042] TRACE [Controller id=1153 epoch=196] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=196, leader=2152, leaderEpoch=211, isr=[2154, 2152, 1152], zkVersion=536, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[1153]) to brokers Set(2153, 1152, 2154, 2151, 1153, 2152, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-06 21:53:14,887] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=2154, leaderEpoch=212, isr=[2154, 1152, 1153], zkVersion=538, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[2152]) to brokers Set(2153, 1152, 2154, 2151, 1153, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-06 22:11:43,739] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=2154, leaderEpoch=212, isr=[2154, 1152, 1153], zkVersion=538, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[]) to brokers Set(2152) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-06 22:11:43,815] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=2154, leaderEpoch=212, isr=[2154, 1152, 1153], zkVersion=538, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[]) to brokers Set(2153, 1152, 2154, 2151, 1153, 2152, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-06 22:12:12,602] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=2154, leaderEpoch=212, isr=[2154, 1152, 1153, 2152], zkVersion=539, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[]) to brokers Set(2153, 1152, 2154, 2151, 1153, 2152, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-06 22:12:17,019] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=2152, leaderEpoch=213, isr=[2154, 1152, 1153, 2152], zkVersion=540, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[]) to brokers Set(2153, 1152, 2154, 2151, 1153, 2152, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-07 00:08:46,077] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=2152, leaderEpoch=214, isr=[1152, 1153, 2152], zkVersion=541, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[2154]) to brokers Set(2153, 1152, 2154, 2151, 1153, 2152, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-07 00:08:54,790] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=2152, leaderEpoch=214, isr=[1152, 1153, 2152], zkVersion=541, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[2154]) to brokers Set(2153, 1152, 2151, 1153, 2152, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-07 00:27:26,764] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=2152, leaderEpoch=214, isr=[1152, 1153, 2152], zkVersion=541, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[]) to brokers Set(2154) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-07 00:27:26,840] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=2152, leaderEpoch=214, isr=[1152, 1153, 2152], zkVersion=541, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[]) to brokers Set(2153, 1152, 2154, 2151, 1153, 2152, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-07 00:27:55,940] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=2152, leaderEpoch=214, isr=[1152, 1153, 2152, 2154], zkVersion=542, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[]) to brokers Set(2153, 1152, 2154, 2151, 1153, 2152, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-08 14:10:46,420] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=1152, leaderEpoch=212, isr=[1152, 1153], zkVersion=538, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[2152, 2154]) to brokers Set(1152, 2151, 1153, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-08 14:10:51,011] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=1152, leaderEpoch=212, isr=[1152, 1153], zkVersion=538, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[2152, 2154]) to brokers Set(1152, 2151, 1153, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-08 15:06:22,884] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=1153, leaderEpoch=213, isr=[1153], zkVersion=539, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[2152, 2154, 1152]) to brokers Set(1152, 2151, 1153, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-08 15:06:29,501] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=1153, leaderEpoch=213, isr=[1153], zkVersion=539, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[2152, 2154, 1152]) to brokers Set(1152, 2151, 1153, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-08 15:06:34,796] TRACE [Controller id=1152 epoch=197] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=1153, leaderEpoch=213, isr=[1153], zkVersion=539, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[2152, 2154, 1152]) to brokers Set(1152, 2151, 1153, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-08 15:06:53,540] TRACE [Controller id=2151 epoch=198] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=197, leader=1153, leaderEpoch=213, isr=[1153], zkVersion=539, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[2152, 2154, 1152]) to brokers Set(2151, 1153, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-08 15:06:55,346] TRACE [Controller id=2151 epoch=198] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=198, leader=1153, leaderEpoch=213, isr=[1153], zkVersion=539, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[2152, 2154, 1152]) to brokers Set(2151, 1153, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-08 15:15:14,636] TRACE [Controller id=2151 epoch=198] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=198, leader=1153, leaderEpoch=213, isr=[1153], zkVersion=539, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[2152, 2154]) to brokers Set(1152) for partition Execution_CustomsStatus-6 (state.change.logger)
[2020-12-08 15:15:14,716] TRACE [Controller id=2151 epoch=198] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='Execution_CustomsStatus', partitionIndex=6, controllerEpoch=198, leader=1153, leaderEpoch=213, isr=[1153], zkVersion=539, replicas=[2152, 2154, 1152, 1153], offlineReplicas=[2152, 2154]) to brokers Set(1152, 2151, 1153, 1154, 1151) for partition Execution_CustomsStatus-6 (state.change.logger)
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
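
The check requested in the description could look roughly like the sketch below. It is written in Python for illustration only and is not Kafka controller code: `parse_update` and `ZkVersionGuard` are hypothetical names, and the regular expression assumes the `UpdateMetadataPartitionState(...)` text format seen in the state-change log above. The guard remembers the last zkVersion seen per partition and rejects any update whose zkVersion is lower. Repeated values are accepted, because the log shows the same zkVersion being resent to different broker sets.

```python
import re
from typing import Dict, Optional, Tuple

PartitionKey = Tuple[str, int]

# Assumed format: matches the UpdateMetadataPartitionState(...) text in the log above.
_STATE_PATTERN = re.compile(
    r"topicName='(?P<topic>[^']+)', partitionIndex=(?P<partition>\d+),"
    r".*?zkVersion=(?P<zk_version>\d+)"
)


def parse_update(line: str) -> Optional[Tuple[PartitionKey, int]]:
    """Hypothetical helper: extract ((topic, partition), zkVersion) from a log line."""
    match = _STATE_PATTERN.search(line)
    if match is None:
        return None
    key = (match["topic"], int(match["partition"]))
    return key, int(match["zk_version"])


class ZkVersionGuard:
    """Hypothetical guard: tracks the highest zkVersion accepted per partition."""

    def __init__(self) -> None:
        self._last_seen: Dict[PartitionKey, int] = {}

    def check(self, key: PartitionKey, zk_version: int) -> bool:
        """Return False if zk_version is lower than the last accepted one."""
        last = self._last_seen.get(key)
        if last is not None and zk_version < last:
            # Regression detected: keep the previously accepted version.
            return False
        self._last_seen[key] = zk_version
        return True
```

Feeding the log entries above through `parse_update` and `ZkVersionGuard.check` in order would accept everything up to zkVersion=542 and reject the entry at 2020-12-08 14:10:46,420, which carries zkVersion=538. What the controller should do on rejection (skip the request, log an error, or resign) is left open here, as the issue does not specify it.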