Jose, thanks for the KIP!

1) Does recovering from an unclean state bump the leader epoch?

2) The name of the "NewIsUnclean" field in AlterIsrRequest is a little strange.
From the description, it sounds like this will be used by the broker to
indicate to the controller that it has recovered from an unclean leader
election. If that's the case, maybe something like "RecoverFromUnclean"
would be better?

3) Will followers try to fetch from an unclean leader? Or will they wait
for a LISR with unclean=false (or a PartitionChangeRecord with
unclean=false)?

4) Is there any other way for a partition to recover from an unclean state
other than the leader sending AISR with NewIsUnclean=false? Is it possible
for a leader to fail the recovery process? Are further unclean election
attempts by a user blocked until we recover?
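
To make 4) concrete, here is a rough Java sketch of the transitions as I
currently read them, borrowing the UNCLEAN_ELECTED / UNCLEAN_RECOVERED state
names from Jason's message quoted below. Everything else (the class, the
CLEAN state, the event names, and the choice to reject a clean election
while recovery is pending) is my own assumption for illustration and is not
taken from the KIP.

```java
// Hypothetical sketch only: these names are illustrative, not the KIP's API.
final class LeaderRecoverySketch {
    // CLEAN is an assumed state for a partition whose leader was elected cleanly.
    enum State { CLEAN, UNCLEAN_ELECTED, UNCLEAN_RECOVERED }

    // Assumed events that the controller would act on.
    enum Event { UNCLEAN_ELECTION, RECOVERY_COMPLETED, CLEAN_ELECTION }

    static State next(State current, Event event) {
        switch (event) {
            case UNCLEAN_ELECTION:
                // Allowed from any state, including when a recovering leader fails.
                return State.UNCLEAN_ELECTED;
            case RECOVERY_COMPLETED:
                // Only a leader that is still recovering can report completion.
                if (current != State.UNCLEAN_ELECTED) {
                    throw new IllegalStateException("no recovery in progress: " + current);
                }
                return State.UNCLEAN_RECOVERED;
            case CLEAN_ELECTION:
                // The ISR cannot grow during recovery, so there is no clean candidate yet.
                if (current == State.UNCLEAN_ELECTED) {
                    throw new IllegalStateException("recovery has not completed");
                }
                return State.CLEAN;
            default:
                throw new AssertionError("unhandled event: " + event);
        }
    }

    // Followers may join the ISR only once the leader is no longer recovering.
    static boolean canExpandIsr(State state) {
        return state != State.UNCLEAN_ELECTED;
    }
}
```

In this reading, the only ways out of UNCLEAN_ELECTED are a completed
recovery or another unclean election, which is exactly what I would like the
KIP to confirm or correct.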

General comment on naming: maybe we can go with something like
"UncleanElection" or "UncleanEpoch" instead of "IsUnclean".

Thanks!
David

On Tue, Jan 11, 2022 at 4:31 PM Jason Gustafson <ja...@confluent.io.invalid>
wrote:

> Hi Jose,
>
> Thanks for the KIP. Just a minor question about this:
>
> > This means that the leader will not allow followers to join the ISR until
> it has recovered from the unclean leader election.
>
> If I understand correctly, the main reason for this is to avoid the need to
> propagate the "IsUnclean" flag between elections. It ensures that we cannot
> have a "clean" election until the recovery has completed. On the other
> hand, if we need to do another unclean election because the recovering
> leader failed, then we would get the "IsUnclean" flag naturally. Are there
> any additional limitations we should consider while the unclean leader is
> recovering? For example, should we not allow consumers to read from the
> partition until the recovery has completed as well?
>
> By the way, I do find the naming of the "IsUnclean" field a tad awkward.
> The naming suggests that it reflects upon the election, but then it is
> strange that the election becomes clean through recovery (which obviously
> cannot restore the lost data). An alternative name might be
> "UncleanRecoveryRequired." Another option might be to consider it more of a
> partition state. After an unclean election, then the state might be
> UNCLEAN_ELECTED. After recovery, it might transition to UNCLEAN_RECOVERED.
> Then at least we keep track of the fact that the current leader was
> uncleanly elected. Not sure how important that is, just a thought..
>
> Best,
> Jason
>
>
>
>
> On Mon, Jan 10, 2022 at 11:47 AM José Armando García Sancio
> <jsan...@confluent.io.invalid> wrote:
>
> > Hi all,
> >
> > I would like to open the discussion on implementing "KIP-704: Send a
> > hint to broker if it is an unclean leader." See this wiki page for
> > details: https://cwiki.apache.org/confluence/x/kAZRCg
> >
> > Thanks!
> > --
> > -Jose
> >
>


-- 
-David
