[ https://issues.apache.org/jira/browse/KAFKA-13790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Jacot resolved KAFKA-13790.
---------------------------------
    Fix Version/s: 3.3.0
         Reviewer: Jason Gustafson
       Resolution: Fixed

> ReplicaManager should be robust to all partition updates from kraft metadata log
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-13790
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13790
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: David Jacot
>            Priority: Major
>             Fix For: 3.3.0
>
>
> There are two ways that partition state can be updated in the zk world: one is through `LeaderAndIsr` requests and one is through `AlterPartition` responses. All changes made to partition state result in new `LeaderAndIsr` requests, but replicas ignore them if the leader epoch is less than or equal to the currently known leader epoch. Basically it works like this:
> * Changes made by the leader are done through `AlterPartition` requests. These changes bump the partition epoch (or zk version) but leave the leader epoch unchanged. `LeaderAndIsr` requests are still sent by the controller, but replicas ignore them because the leader epoch is unchanged; partition state is instead only updated when the `AlterPartition` response is received.
> * Changes made by the controller are made directly by the controller and always result in a leader epoch bump. These changes are sent to replicas through `LeaderAndIsr` requests and are applied by the replicas.
> The code in `kafka.server.ReplicaManager` and `kafka.cluster.Partition` is built on top of these assumptions. The logic in `makeLeader`, for example, assumes that the leader epoch has indeed been bumped: follower state gets reset and a new entry is written to the leader epoch cache.
> In KRaft, we also have two paths to update partition state. One is `AlterPartition`, just like in the zk world. The second is updates received from the metadata log. These mostly follow the same path as `LeaderAndIsr` requests, but a big difference is that all changes are sent down to `kafka.cluster.Partition`, even those which do not have a bumped leader epoch. This breaks the assumptions mentioned above in `makeLeader`, which could result in leader epoch cache inconsistency. Another side effect on the follower side is that replica fetchers for updated partitions get restarted unnecessarily. There may be others as well.
> We need to either replicate the same logic as the zk side or make the logic robust to all updates, including those without a leader epoch bump (a simplified sketch of the latter follows below).

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
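
The following is a minimal, hypothetical Scala sketch of the second option described in the issue, namely making partition-state handling robust to metadata-log updates that do not bump the leader epoch. It is not the actual `kafka.cluster.Partition` or `makeLeader` code; the names `PartitionState`, `PartitionModel`, and `applyMetadataUpdate` are invented for illustration. The idea it models is that follower state and the leader epoch cache are only reset when the leader epoch actually increases, while ISR and partition epoch changes are applied in all cases.

```scala
// Hypothetical, simplified model of the invariant described above
// (not the real Kafka broker code).

// Stand-in for the partition state carried by a metadata-log update.
case class PartitionState(leaderEpoch: Int, partitionEpoch: Int, isr: Set[Int])

class PartitionModel(var leaderEpoch: Int = 0,
                     var partitionEpoch: Int = 0,
                     var isr: Set[Int] = Set.empty) {

  // Apply an update from the metadata log. Updates without a leader epoch
  // bump (e.g. ISR-only changes) must not reset follower state or append
  // to the leader epoch cache.
  def applyMetadataUpdate(update: PartitionState): Unit = {
    if (update.leaderEpoch > leaderEpoch) {
      // Controller-driven change: the leader epoch was bumped, so it is
      // safe to reset follower state and record the new epoch.
      leaderEpoch = update.leaderEpoch
      resetFollowerStateAndEpochCache(update.leaderEpoch)
    }
    // ISR / partition epoch changes are applied regardless of an epoch bump.
    if (update.partitionEpoch > partitionEpoch) {
      partitionEpoch = update.partitionEpoch
      isr = update.isr
    }
  }

  private def resetFollowerStateAndEpochCache(newEpoch: Int): Unit =
    println(s"resetting follower state, caching leader epoch $newEpoch")
}

object Demo extends App {
  val p = new PartitionModel()
  // ISR-only change: partition epoch bumps, leader epoch does not, no reset.
  p.applyMetadataUpdate(PartitionState(leaderEpoch = 0, partitionEpoch = 1, isr = Set(1, 2)))
  // Controller change: leader epoch bump triggers the reset path.
  p.applyMetadataUpdate(PartitionState(leaderEpoch = 1, partitionEpoch = 2, isr = Set(1, 2, 3)))
}
```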