Hi José,

Thanks for the KIP.

I have one question regarding how fetch from followers will work when the leader is recovering. My understanding is that the leader will reject any produce and fetch requests with a NOT_LEADER_OR_FOLLOWER error while the followers will fence any fetch requests based on the incremented leader epoch. That seems OK for recent consumers from a correctness perspective but it might be a little weird for older consumers which do not set the leader epoch in the fetch request (prior to v9). They would be able to fetch from the followers while the leader recovers if I understand it correctly. It might be good to clarify this case in the KIP. What do you think?

Best,
David

On Fri, Jan 28, 2022 at 7:18 PM José Armando García Sancio <jsan...@confluent.io.invalid> wrote:
>
> Hi all,
>
> Jason and I discussed this offline. At a high-level I have made the
> following changes to the KIP.
>
> 1. IBP will be used to enable this feature and to determine which
> version of LeaderAndIsr and AlterPartition will be used.
> 2. The LeaderRecoveryState field for LeaderAndIsr and AlterPartition
> is not marked as ignorable.
>
> If the controller sees an AlterPartition is a version of 0, it will
> assume that the leader has recovered.
>
> If the leader gets a RECOVERING for the LeaderRecoveryState it will
> attempt to recover the partition irrespective of the IBP. When it has
> recovered, depending on the IBP it will send the right version of
> AlterPartition.
>
> KIP Diff:
> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=173082256&selectedPageVersions=19&selectedPageVersions=18
>
> Thanks,
> -José
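
To make the older-consumer case in the question at the top of this message concrete, here is a rough sketch of the fetch handling as I understand it. It is not broker code: handle_fetch, its parameters, and the FetchError enum are hypothetical names used only for illustration, and the rule that a request without a leader epoch skips the epoch check is my assumption about how pre-v9 fetches are treated.

```python
from enum import Enum
from typing import Optional


class FetchError(Enum):
    """Illustrative subset of error codes (hypothetical enum, not Kafka's class)."""
    NONE = "NONE"
    NOT_LEADER_OR_FOLLOWER = "NOT_LEADER_OR_FOLLOWER"
    FENCED_LEADER_EPOCH = "FENCED_LEADER_EPOCH"
    UNKNOWN_LEADER_EPOCH = "UNKNOWN_LEADER_EPOCH"


def handle_fetch(
    is_leader: bool,
    leader_recovering: bool,
    replica_epoch: int,
    request_epoch: Optional[int],
) -> FetchError:
    """Decide how a replica answers a consumer fetch (sketch only).

    request_epoch is None for fetch requests that carry no leader
    epoch, which is the case for versions prior to v9.
    """
    # A recovering leader rejects fetches outright.
    if is_leader and leader_recovering:
        return FetchError.NOT_LEADER_OR_FOLLOWER

    # Assumption: without an epoch in the request there is nothing to
    # compare against, so the replica cannot fence the consumer.
    if request_epoch is None:
        return FetchError.NONE

    # Epoch-aware consumers are fenced until they learn the new epoch.
    if request_epoch < replica_epoch:
        return FetchError.FENCED_LEADER_EPOCH
    if request_epoch > replica_epoch:
        return FetchError.UNKNOWN_LEADER_EPOCH
    return FetchError.NONE


# Follower that has already seen the incremented epoch (6):
old_consumer = handle_fetch(False, True, replica_epoch=6, request_epoch=None)
new_consumer = handle_fetch(False, True, replica_epoch=6, request_epoch=5)
```

Under these assumptions old_consumer ends up as FetchError.NONE (the fetch is served by the follower while the leader recovers) and new_consumer as FetchError.FENCED_LEADER_EPOCH, which is the asymmetry I would like the KIP to address.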