Thanks Raman and Colin for your feedback.

Raman wrote:
> Could you please explain the following about backward compatibility.
> If a leader has been elected unclean and we decide to roll the
> cluster back while the leader is in the middle of recovery, the leader
> will simply not be able to recover when we roll back because it will
> lose its local copy of the unclean flag. Even if the controller sends
> another `LISR` to the leader, it will ignore the flag. So, the leader
> will not be able to carry on its recovery workflow. It will treat the
> situation as if it was not elected unclean and carry on with expanding
> the ISR.
Yes, that is correct. Note that the default value of "IsLeaderRecovering" in the AlterIsr request is false. That means that if the controller supports this feature, the validation will succeed because the size of the ISR is greater than 1 and the "IsLeaderRecovering" flag is false. If the controller doesn't support this feature, the ISR will get expanded and the value stored in ZK will not have "IsLeaderRecovering".

Raman wrote:
> Is there a specific need for adding the "isUnclean" flag to
> `AlterIsrResponse`? A potential sequence of events will be:
> 1. Controller sets the flag in `ZK` and informs the leader via the
>    `LeaderAndIsrRequest`
> 2. Leader undertakes the necessary recovery and sends out an
>    `AlterIsrRequest` to the controller, with the "isUnclean" flag reset
> 3. Controller will reset the flag in `ZK` if the `AlterIsrRequest` goes
>    through.
> Shouldn't the partition-level error code in `AlterIsrResponse` be
> enough for the leader to know of a successful update?

I am mainly returning the recorded value for consistency with the existing fields in the AlterIsrResponse. The implementation on the broker (AlterIsrManager and Partition) takes advantage of this by updating its local state using the response values if the "CurrentIsrVersion" validates.

Colin wrote:
> "isLeaderRecovering" sounds pretty awkward. If we want to call this "leader
> recovery" then maybe the flag could be something like "inLeaderRecovery."
> Actually, how about "inElectionRecovery" to emphasize the fact that we are
> recovering from an unclean leader election?

I like the name "inElectionRecovery". I'll update the KIP to use this name. Thanks.

Colin wrote:
> It seems like we both agree that a partition could be stuck in "election
> recovery" forever if it is running pre-KIP-704 software and there are no
> available followers to be added. For example, if there are two replicas, and
> one of them went down and the other was elected uncleanly as leader.
> Is the argument that being in election recovery forever in this case
> is not a problem?

You are correct. The partition can stay in this state forever if the leader doesn't support this feature, and the replication factor is 1 or the followers never catch up to the leader. I was trying to argue that this is okay because the leader and future leaders will behave as expected.

Colin wrote:
> Can you give an example of a case where a broker would set recovery to
> true in the AlterIsr RPC? If we can't think of any, then we don't need
> to add this flag.

That's correct, I can't think of such a case. In the controller I was going to enforce the invariant that the partition leader is not allowed to change "InElectionRecovery" from false to true. I was going to allow the broker to send an AlterIsr RPC that doesn't change any state (ISR or "InElectionRecovery"). I don't think that we should make it implicit that any AlterIsr RPC that passes the "CurrentIsrVersion" check means that the leader has recovered from an unclean election. The current implementation of AlterIsrManager and Partition on the broker updates the leader replica state using the field in the response. This means that we have to return "InElectionRecovery" in the response. I think it is awkward for the response to contain this field but not the request.

Thanks,
-José
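P.S. To make the rules we've been discussing concrete, here is a rough sketch of the controller-side AlterIsr validation. All names here (the class, the `validate` method, its parameters) are made up for illustration and are not the actual Kafka implementation; the sketch only encodes the two checks from this thread: the leader may never flip the recovery flag from false to true, and a leader still in recovery may not expand the ISR beyond itself.

```java
import java.util.List;

// Illustrative sketch only; not the real Kafka controller code.
public class AlterIsrValidationSketch {

    /**
     * Returns true if the proposed ISR update is acceptable.
     *
     * currentInElectionRecovery   - the flag as recorded by the controller (ZK)
     * requestedInElectionRecovery - the flag in the AlterIsr request; a
     *                               pre-KIP-704 broker leaves it at the
     *                               default, false
     * proposedIsr                 - the ISR the leader wants to record
     */
    static boolean validate(boolean currentInElectionRecovery,
                            boolean requestedInElectionRecovery,
                            List<Integer> proposedIsr) {
        // Invariant: a false -> true transition is rejected. The leader may
        // leave the flag unchanged or clear it (true -> false).
        if (!currentInElectionRecovery && requestedInElectionRecovery) {
            return false;
        }
        // A leader that is still recovering must not expand the ISR.
        if (requestedInElectionRecovery && proposedIsr.size() > 1) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Pre-KIP-704 broker expanding the ISR: the flag defaults to false,
        // so the expansion validates even though recovery never completed.
        System.out.println(validate(true, false, List.of(1, 2, 3))); // true

        // Recovered leader clearing the flag with only itself in the ISR.
        System.out.println(validate(true, false, List.of(1)));       // true

        // Leader trying to set the flag after the fact: rejected.
        System.out.println(validate(false, true, List.of(1)));       // false
    }
}
```

This also shows the backward-compatibility behavior described at the top of this message: because the request field defaults to false, an old broker's ISR expansion passes validation and the partition simply leaves recovery without performing it.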