[ https://issues.apache.org/jira/browse/KAFKA-13173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Colin McCabe resolved KAFKA-13173.
----------------------------------
Resolution: Fixed
> KRaft controller does not handle simultaneous broker expirations correctly
> --------------------------------------------------------------------------
>
> Key: KAFKA-13173
> URL: https://issues.apache.org/jira/browse/KAFKA-13173
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: Niket Goel
> Priority: Blocker
> Fix For: 3.0.0
>
>
> In `ReplicationControlManager.fenceStaleBrokers`, we find all of the
> currently stale brokers and attempt to remove their replicas from the ISR.
> However, when multiple expirations occur at once, we do not properly
> accumulate the ISR changes. For example, I ran a test where the ISR of a
> partition was initialized to [1, 2, 3]. Then I triggered a timeout of
> brokers 2 and 3 at the same time. The records generated by
> `fenceStaleBrokers` were the following:
> {code}
> [ApiMessageAndVersion(PartitionChangeRecord(partitionId=0,
> topicId=_seg8hBuSymBHUQ1sMKr2g, isr=[1, 3], leader=1, replicas=null,
> removingReplicas=null, addingReplicas=null) at version 0),
> ApiMessageAndVersion(FenceBrokerRecord(id=2, epoch=102) at version 0),
> ApiMessageAndVersion(PartitionChangeRecord(partitionId=0,
> topicId=_seg8hBuSymBHUQ1sMKr2g, isr=[1, 2], leader=1, replicas=null,
> removingReplicas=null, addingReplicas=null) at version 0),
> ApiMessageAndVersion(FenceBrokerRecord(id=3, epoch=103) at version 0)]
> {code}
> First the ISR is shrunk to [1, 3] as broker 2 is fenced, and we see the
> record that fences broker 2. Then, when the fencing of broker 3 is handled,
> the ISR is changed to [1, 2]. That second change was computed from the
> original ISR, so it ignores the fact that broker 2 was already removed
> earlier in the same batch, and it effectively re-adds broker 2 to the ISR.
> A simple solution for now is to change the logic to handle fencing only one
> broker at a time.
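The following is a minimal Python model of the bug and of the proposed fix. It is not the controller code. `IsrChange`, `fence_batch_buggy`, `fence_one` and `fence_one_at_a_time` are hypothetical names, and the record tracks only the ISR. The model assumes the partition leader (broker 1 above) is never among the fenced brokers, so leader election and other controller rules are ignored. The buggy version computes every shrink from the ISR as it stood before the batch. The fixed version handles one broker per step and applies each step's record before the next broker is considered. The loop stands in for repeated one-broker calls.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IsrChange:
    # Hypothetical, simplified stand-in for PartitionChangeRecord:
    # only the partition id and the new ISR are modelled.
    partition_id: int
    isr: List[int]


def fence_batch_buggy(isr: List[int], stale: List[int]) -> List[IsrChange]:
    # Reproduces the reported behaviour: every shrink is computed from the
    # ISR as it was BEFORE the batch, so earlier removals are lost.
    return [IsrChange(0, [r for r in isr if r != b]) for b in stale if b in isr]


def fence_one(isr: List[int], broker: int) -> List[IsrChange]:
    # Fence a single broker: at most one ISR change for this partition.
    if broker not in isr:
        return []
    return [IsrChange(0, [r for r in isr if r != broker])]


def fence_one_at_a_time(isr: List[int], stale: List[int]) -> List[IsrChange]:
    # Models the proposed fix: handle one broker per step and apply that
    # step's records to the in-memory state before handling the next broker.
    records: List[IsrChange] = []
    current = list(isr)
    for broker in stale:
        step = fence_one(current, broker)
        for rec in step:
            current = rec.isr  # "replay" the record into the current state
        records.extend(step)
    return records


if __name__ == "__main__":
    print(fence_batch_buggy([1, 2, 3], [2, 3]))
    print(fence_one_at_a_time([1, 2, 3], [2, 3]))
```

With an initial ISR of [1, 2, 3] and brokers 2 and 3 expiring together, the buggy model emits ISRs [1, 3] and then [1, 2], matching the records in the report. The one-at-a-time model emits [1, 3] and then [1].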
--
This message was sent by Atlassian Jira
(v8.3.4#803005)