Thanks Jason. Comments below.

On Wed, Jan 10, 2024 at 9:06 AM Jason Gustafson
<ja...@confluent.io.invalid> wrote:
> One additional thought. It would be helpful to have an example to justify
> the need for this:
>
> > Wait for the fetch offset of the replica (ID, UUID) to catch up to the
> log end offset of the leader.
>
> It is helpful also to explain how this affects the AddVoter RPC. Do we wait
> indefinitely? Or do we give up and return a timeout error if the new voter
> cannot catch up? Probably the latter makes the most sense.

Yes. I will add more details here. Jason and I discussed this offline:
waiting for the new replica to catch up to the leader's log end offset
(LEO) is a heuristic. It minimizes the amount of time during which the
leader cannot increase the HWM because the new replica is needed to
form a majority. An example that shows this:

Current Voter Set:
A: offset = 100
B: offset = 100
C: offset = 0

In this configuration the leader can continue to advance the HWM,
since A and B, a majority of the three voters, are at the HWM/LEO.

If the user now adds a voter to the voter set:
A: offset = 100
B: offset = 100
C: offset = 0
D: offset = 0

The leader cannot advance the HWM until either C or D catches up to
the HWM. With four voters a majority is three, so every majority must
include C, D, or both.
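To make the arithmetic concrete, here is a minimal Python sketch. It
assumes the leader computes the HWM as the largest offset replicated by
a majority of voters and never moves the HWM backwards. The function
names and the dict-of-offsets representation are hypothetical and only
illustrate the example above. They are not KRaft's actual code.

```python
def majority_offset(voter_offsets):
    """Largest offset replicated by at least a majority of the voters.

    Hypothetical helper: voter_offsets maps a voter id to the fetch
    offset (LEO) the leader has observed for that voter.
    """
    offsets = sorted(voter_offsets.values(), reverse=True)
    majority = len(offsets) // 2 + 1
    # The majority-th highest offset is held by at least `majority` voters.
    return offsets[majority - 1]


def advance_high_watermark(current_hwm, voter_offsets):
    """Assumed rule: the HWM only moves forward, never backwards."""
    return max(current_hwm, majority_offset(voter_offsets))


# The leader has appended up to offset 150 and A, B have replicated it.
three_voters = {"A": 150, "B": 150, "C": 0}
four_voters = {"A": 150, "B": 150, "C": 0, "D": 0}

print(advance_high_watermark(100, three_voters))  # 150: A, B form a majority
print(advance_high_watermark(100, four_voters))   # 100: stuck until C or D catches up
```

Adding D raises the majority size from two to three, so the
majority-th highest offset drops to 0 and the HWM stays at 100. This is
the stall that waiting for the new replica to reach the LEO before
adding it is meant to shorten.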

Thanks,
-- 
-José
