Thanks Raman and Colin for your feedback.

Raman wrote:
> - Could you please explain the following about backward compatibility.
> If a leader has been elected unclean. And we decide to roll the
> cluster back when the leader is in the middle of recovery, leader will
> simply not be able to recover when we roll back because it will lose
> its local copy of the unclean flag. Even if the controller sends
> another `LISR` to the leader, it will ignore the flag. So, the leader
> will not be able to carry on its recovery workflow. It will treat the
> situation as if it was not elected unclean and carry on with expanding
> ISR.

Yes, that is correct. Note that the default value of
"IsLeaderRecovering" in the AlterIsr request is false. That means that
if the controller supports this feature, the validation will succeed
because the size of the ISR is greater than 1 and the
"IsLeaderRecovering" flag is false. If the controller doesn't support
this feature, the ISR will still get expanded and the value stored in
ZK will simply not include the "IsLeaderRecovering" flag.
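
To make the controller-side check concrete, here is a minimal sketch of the validation described above. The class and method names (`AlterIsrValidation`, `accept`) are illustrative only, not Kafka's actual controller API:

```java
import java.util.List;

// A minimal sketch of the controller-side check described above; the
// names here are illustrative, not Kafka's actual API.
public class AlterIsrValidation {
    // Reject an AlterIsr request that tries to expand the ISR beyond
    // the leader while the partition is still marked as recovering.
    public static boolean accept(List<Integer> proposedIsr,
                                 boolean isLeaderRecovering) {
        if (isLeaderRecovering && proposedIsr.size() > 1) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Pre-KIP-704 broker: the flag defaults to false, so an ISR
        // expansion to size 2 passes validation.
        System.out.println(accept(List.of(1, 2), false)); // true
        // A recovering leader attempting to expand the ISR is rejected.
        System.out.println(accept(List.of(1, 2), true));  // false
    }
}
```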

Raman wrote:
> Is there a specific need for adding the "isUnclean" flag to
> `AlterIsrResponse`. A potential sequence of events will be:
> 1. Controller sets the flag at `ZK` and informs the leader via the
> `LeaderAndIsrRequest`
> 2. Leader undertakes the necessary recovery and sends out a
> `AlterIsrRequest` to controller, with "isUnclean" flag reset
> 3. Controller will reset the flag at `ZK` if the `AlterIsrRequest` goes 
> through.
> Shouldn't the partition level error code at `AlterIsrResponse` be
> enough for the leader to know of a successful update.

I am mainly returning the recorded value for consistency with the
existing fields in the AlterIsrResponse. The implementation on the
broker (AlterIsrManager and Partition) takes advantage of this by
updating its local state using the response values if the
"CurrentIsrVersion" validates.
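
As a rough sketch of that broker-side behavior (not Kafka's actual classes; `LocalPartitionState` and its fields are hypothetical names), the response fields would only be applied when the version check passes:

```java
import java.util.List;

// A rough sketch of how a broker could apply fields from an
// AlterIsrResponse to its local partition state, guarded by the
// "CurrentIsrVersion" check mentioned above. Names are illustrative.
public class LocalPartitionState {
    List<Integer> isr;
    int isrVersion;     // mirrors "CurrentIsrVersion"
    boolean inRecovery; // mirrors the recovery flag

    LocalPartitionState(List<Integer> isr, int isrVersion,
                        boolean inRecovery) {
        this.isr = isr;
        this.isrVersion = isrVersion;
        this.inRecovery = inRecovery;
    }

    // Apply the response only if the version it carries is newer than
    // the local one; otherwise the response is stale and is ignored.
    boolean applyResponse(List<Integer> newIsr, int newVersion,
                          boolean newInRecovery) {
        if (newVersion <= this.isrVersion) {
            return false; // stale response, keep local state
        }
        this.isr = newIsr;
        this.isrVersion = newVersion;
        this.inRecovery = newInRecovery;
        return true;
    }

    public static void main(String[] args) {
        LocalPartitionState state =
            new LocalPartitionState(List.of(1), 5, true);
        // A newer response clears the flag and expands the ISR.
        System.out.println(state.applyResponse(List.of(1, 2), 6, false)); // true
        // A stale response is ignored.
        System.out.println(state.applyResponse(List.of(1), 4, true));     // false
    }
}
```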

Colin wrote:
> "isLeaderRecovering" sounds pretty awkward. If we want to call this "leader 
> recovery" then maybe the flag could be something like "inLeaderRecovery." 
> Actually, how about "inElectionRecovery" to emphasize the fact that we are 
> recovering from an unclean leader election?

I like the name "inElectionRecovery". I'll update the KIP to use this
name. Thanks.

Colin wrote:
> It seems like we both agree that a partition could be stuck in "election 
> recovery" forever if it is running pre-KIP-704 software and there are no 
> available followers to be added. For example, if there are two replicas, and 
> one of them went down and the other was elected uncleanly as leader. Is the 
> argument that being in election recovery forever in this case is not a 
> problem?

You are correct. The partition can stay in this state forever if the
leader doesn't support this feature and either the replication factor
is 1 or the followers never catch up to the leader. I was trying to
argue that this is okay because the leader and any future leader will
behave as expected.

Colin wrote:
> Can you give an example of a case where a broker would set recovery to true 
> in the AlterIsr RPC? If we can't think of any, then we don't need to add this 
> flag.

That's correct. In the controller I was going to enforce the invariant
that the partition leader is not allowed to change the
"InElectionRecovery" flag from false to true. I was going to allow the
broker to send an AlterIsr RPC that doesn't change any state (ISR or
"InElectionRecovery"). I don't think that we should make it implicit
that any AlterIsr RPC that passes the "CurrentIsrVersion" check means
that the leader has recovered from an unclean election.
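
That invariant can be stated in a few lines. This is just an illustrative encoding of the rule above, not controller code; the name `isValid` is hypothetical:

```java
// Illustrative encoding of the invariant described above: the leader
// may clear the recovery flag or leave it unchanged, but never set it.
public class RecoveryTransition {
    public static boolean isValid(boolean current, boolean requested) {
        // false -> true is the only forbidden transition.
        return !(!current && requested);
    }

    public static void main(String[] args) {
        System.out.println(isValid(true, false)); // true: recovery finished
        System.out.println(isValid(true, true));  // true: no-op is allowed
        System.out.println(isValid(false, true)); // false: leader may not set it
    }
}
```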

The current implementation of AlterIsrManager and Partition on the
broker updates the leader replica state using the field in the
response. This means that we have to return "InElectionRecovery" in
the response. I think it is awkward for the response to contain this
field but for the request to not contain this field.

Thanks
-José
