Hi José,

Thanks for the changes.

"isLeaderRecovering" sounds pretty awkward. If we want to call this "leader 
recovery" then maybe the flag could be something like "inLeaderRecovery". 
Actually, how about "inElectionRecovery" to emphasize the fact that we are 
recovering from an unclean leader election?

 > At a high-level this change is backwards compatible because the default 
 > value for all of the "is leader recovering" field in the protocol is 
 > "false". When thinking about backward compatibility it is important to 
 > note that if the "is leader recovering" field is true then the ISR is 
 > guarantee to have a size of 1. The topic partition leader will not 
 > increase the ISR until it has recovered from the unclean leader election 
 > and has set the "is leader recovering" field to false.

It seems like we both agree that a partition could be stuck in "election 
recovery" forever if it is running pre-KIP-704 software and there are no 
available followers to be added. For example, suppose there are two replicas, 
one of them went down, and the other was elected uncleanly as leader. Is the 
argument that being in election recovery forever in this case is not a problem?
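
A rough sketch of the two-replica scenario, to make the question concrete.
This is a toy model based only on the description quoted above, not Kafka's
actual controller code: the class PartitionStateSketch and its methods are
hypothetical names, and it assumes that a pre-KIP-704 leader omits the
recovery field from AlterIsr, so the controller sees the default of false.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of the controller's view of one partition.
// None of these names come from the Kafka code base.
final class PartitionStateSketch {
    // Value the controller sees when an old leader omits the field (assumption).
    static final boolean DEFAULT_RECOVERING = false;

    private final int leader;
    private final Set<Integer> isr = new TreeSet<>();
    private boolean inElectionRecovery;

    private PartitionStateSketch(int leader) {
        this.leader = leader;
        this.isr.add(leader);           // ISR has size 1 after an unclean election
        this.inElectionRecovery = true; // controller marks the partition as recovering
    }

    static PartitionStateSketch afterUncleanElection(int leader) {
        return new PartitionStateSketch(leader);
    }

    // Controller applies an AlterIsr request sent by the leader.
    void applyAlterIsr(Set<Integer> newIsr, boolean recoveringField) {
        if (!newIsr.contains(leader)) {
            throw new IllegalArgumentException("ISR must contain the leader");
        }
        isr.clear();
        isr.addAll(newIsr);
        inElectionRecovery = recoveringField;
    }

    boolean inElectionRecovery() { return inElectionRecovery; }

    Set<Integer> isr() { return new TreeSet<>(isr); }

    public static void main(String[] args) {
        // Replica 2 is down; replica 1 was elected uncleanly.
        PartitionStateSketch p = afterUncleanElection(1);
        // With no follower to add, an old leader never sends AlterIsr,
        // so nothing changes the flag.
        System.out.println("recovering=" + p.inElectionRecovery() + " isr=" + p.isr());
        // Only once replica 2 returns does the old leader send AlterIsr.
        p.applyAlterIsr(Set.of(1, 2), DEFAULT_RECOVERING);
        System.out.println("recovering=" + p.inElectionRecovery() + " isr=" + p.isr());
    }
}
```

In this model the flag is cleared only as a side effect of an ISR change, which
is why a partition with no follower to add would stay in recovery indefinitely.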

Can you give an example of a case where a broker would set recovery to true in 
the AlterIsr RPC? If we can't think of any, then we don't need to add this flag.

best,
Colin

On Wed, Jan 19, 2022, at 16:52, José Armando García Sancio wrote:
> Hi all,
>
> I made the following changes to the KIP:
> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=173082256&selectedPageVersions=12&selectedPageVersions=11
>
> Some of the highlights are:
> 1. Changed the field from IsUnclean to IsLeaderRecovering
> 2. Added a few more sentences explaining why this KIP is backward
> compatible and the interaction between the controller and the
> partition leaders when they are in different software versions.
>
> Thanks
> -José
