Hi José,

Thanks for the changes.
"isLeaderRecovering" sounds pretty awkward. If we want to call this "leader recovery," then maybe the flag could be something like "inLeaderRecovery." Actually, how about "inElectionRecovery," to emphasize the fact that we are recovering from an unclean leader election?

> At a high level this change is backwards compatible because the default
> value for all of the "is leader recovering" fields in the protocol is
> "false". When thinking about backward compatibility it is important to
> note that if the "is leader recovering" field is true then the ISR is
> guaranteed to have a size of 1. The topic partition leader will not
> increase the ISR until it has recovered from the unclean leader election
> and has set the "is leader recovering" field to false.

It seems like we both agree that a partition could be stuck in "election recovery" forever if it is running pre-KIP-704 software and there are no available followers to be added. For example, consider a partition with two replicas where one of them went down and the other was elected leader uncleanly. Is the argument that being in election recovery forever in this case is not a problem?

Can you give an example of a case where a broker would set recovery to true in the AlterIsr RPC? If we can't think of any, then we don't need to add this flag.

best,
Colin

On Wed, Jan 19, 2022, at 16:52, José Armando García Sancio wrote:
> Hi all,
>
> I made the following changes to the KIP:
> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=173082256&selectedPageVersions=12&selectedPageVersions=11
>
> Some of the highlights are:
> 1. Changed the field from IsUnclean to IsLeaderRecovering
> 2. Added a few more sentences explaining why this KIP is backward
> compatible and the interaction between the controller and the
> partition leaders when they are in different software versions.
>
> Thanks
> -José