[ https://issues.apache.org/jira/browse/KAFKA-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17547157#comment-17547157 ]
Jose Armando Garcia Sancio commented on KAFKA-13944:
----------------------------------------------------
It looks like this issue is addressed by
https://issues.apache.org/jira/browse/KAFKA-13916
> Shutting down broker can be elected as partition leader in KRaft
> ----------------------------------------------------------------
>
> Key: KAFKA-13944
> URL: https://issues.apache.org/jira/browse/KAFKA-13944
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: Jose Armando Garcia Sancio
> Priority: Major
> Labels: kip-500
>
> When a broker requests shutdown, it transitions to the CONTROLLED_SHUTDOWN
> state in the controller. The broker can remain unfenced in this state until
> the controlled shutdown completes. During a leader election, the only thing
> we generally check is that the broker is unfenced, so we can elect a broker
> that is in controlled shutdown.
> Here are a few snippets from a recent system test in which this occurred:
> {code:java}
> // broker 2 starts controlled shutdown
> [2022-05-26 21:17:26,451] INFO [Controller 3001] Unfenced broker 2 has
> requested and been granted a controlled shutdown.
> (org.apache.kafka.controller.BrokerHeartbeatManager)
>
> // there is only one replica, so we set leader to -1
> [2022-05-26 21:17:26,452] DEBUG [Controller 3001] partition change for _foo-1
> with topic ID _iUQ72T_R4mmZgI3WrsyXw: leader: 2 -> -1, leaderEpoch: 0 -> 1,
> partitionEpoch: 0 -> 1 (org.apache.kafka.controller.ReplicationControlManager)
> // controlled shutdown cannot complete immediately
> [2022-05-26 21:17:26,529] DEBUG [Controller 3001] The request from broker 2
> to shut down can not yet be granted because the lowest active offset 177 is
> not greater than the broker's shutdown offset 244.
> (org.apache.kafka.controller.BrokerHeartbeatManager)
> [2022-05-26 21:17:26,530] DEBUG [Controller 3001] Updated the controlled
> shutdown offset for broker 2 to 244.
> (org.apache.kafka.controller.BrokerHeartbeatManager)
> // later on we elect leader 2 again
> [2022-05-26 21:17:27,703] DEBUG [Controller 3001] partition change for _foo-1
> with topic ID _iUQ72T_R4mmZgI3WrsyXw: leader: -1 -> 2, leaderEpoch: 1 -> 2,
> partitionEpoch: 1 -> 2 (org.apache.kafka.controller.ReplicationControlManager)
> // now controlled shutdown is stuck because of the newly elected leader
> [2022-05-26 21:17:28,531] DEBUG [Controller 3001] Broker 2 is in controlled
> shutdown state, but can not shut down because more leaders still need to be
> moved. (org.apache.kafka.controller.BrokerHeartbeatManager)
> {code}
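
The eligibility gap described in the issue can be sketched in a few lines of Java. This is a simplified, hypothetical model and not Kafka's controller code: the `Broker` class, its two flags, the `electLeader` helper, and both predicates are assumptions made for illustration. The stricter predicate is one possible check and is not necessarily how KAFKA-13916 resolves the problem.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model for illustration only; not Kafka's controller code.
final class LeaderElectionSketch {
    static final int NO_LEADER = -1;

    // Simplified broker registration with two independent flags (assumed shape).
    static final class Broker {
        final boolean fenced;
        final boolean inControlledShutdown;

        Broker(boolean fenced, boolean inControlledShutdown) {
            this.fenced = fenced;
            this.inControlledShutdown = inControlledShutdown;
        }
    }

    // The check described in the report: only fencing is considered.
    static boolean unfencedOnly(Broker b) {
        return !b.fenced;
    }

    // A stricter check (assumption): also skip brokers in controlled shutdown.
    static boolean unfencedAndNotShuttingDown(Broker b) {
        return !b.fenced && !b.inControlledShutdown;
    }

    // Returns the first eligible replica in the ISR, or NO_LEADER if none qualifies.
    static int electLeader(List<Integer> isr, Map<Integer, Broker> brokers,
                           Predicate<Broker> eligible) {
        for (int id : isr) {
            Broker b = brokers.get(id);
            if (b != null && eligible.test(b)) {
                return id;
            }
        }
        return NO_LEADER;
    }

    public static void main(String[] args) {
        // Broker 2 is unfenced but has been granted a controlled shutdown.
        Map<Integer, Broker> brokers = new LinkedHashMap<>();
        brokers.put(2, new Broker(false, true));
        List<Integer> isr = List.of(2);

        // Fencing-only check: broker 2 is elected again (prints 2).
        System.out.println(electLeader(isr, brokers, LeaderElectionSketch::unfencedOnly));
        // Stricter check: the partition stays leaderless (prints -1).
        System.out.println(electLeader(isr, brokers,
                LeaderElectionSketch::unfencedAndNotShuttingDown));
    }
}
```

With the fencing-only predicate the sketch reproduces the `leader: -1 -> 2` transition seen in the logs; with the stricter predicate the single-replica partition keeps leader -1 while broker 2 shuts down.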