Hi José,

Thanks for the great KIP. It took me quite a lot of time reading and
digesting!

Some questions and comments:
1. About "VotersRecord":

> When a KRaft voter becomes leader it will write a KRaftVersionRecord and
VotersRecord to the log if the log or the latest snapshot doesn't contain
any VotersRecord. This is done to make sure that the voter set in the
bootstrap snapshot gets replicated to all of the voters and to not rely on
all of the voters being configured with the same bootstrapped voter set.

> This record will also be written to the log if it has never been written
to the log in the past. This semantic is nice to have to consistently
replicate the bootstrapping snapshot, at
00000000000000000000-0000000000.checkpoint, of the leader to all of the
voters.

If the `VotersRecord` has written into
00000000000000000000-0000000000.checkpoint,
later, a new voter added. Will we write a new checkpoint to the file?
If so, does that mean the `metadata.log.max.snapshot.interval.ms` will be
ignored?
If not, then how could we make sure the voter set in the bootstrap snapshot
gets replicated to all of the voters and to not rely on all of the voters
being configured with the same bootstrapped voter set?

2. After "RemoveVoter", what is the role of the node?
It looks like after the voter got removed from the voter set, it is not a
voter anymore. But I think it can still fetch with the leader. So it should
be an observer, with a "process.role=controller"? And if the node was
originally "process.role=controller,broker", it'll become a broker-only
node?

3. "UpdateVoter" RPC V.S. "ControllerRegistration" RPC?
To me, the purpose of "UpdateVoter" is basically identical with
"ControllerRegistration", to register the voter info in the active
controller/metadata. Should we re-use the existing "ControllerRegistration"
one?

4. > When the kraft.version is upgraded to a version greater 0 and the
version is supported by the voters, the leader will write a control record
batch that will include the kraft.version record and all of the AddVoter
records for all of the voters...

When can't we just write a "VoterRecord" including all voters?

5. In "LeaderChangeMessage", You said:
> Add an optional VoterUuid to Voter. This change is not needed for
correctness but it is nice to have for tracing and debugging. The leader
will write version 0 if the kraft.version is 0. The leader will write
version 1 if the kraft.version is 1.

And in the code change below:
+      { "name": "VoterUuid", "type": "int32", "versions": "1+" }
So, a. this VoterUuid field is a int32 field?
b. Where do we write "Kraft.version" to?

6. In the "UpdateVoter", the handling section contains many `AddVoter`
names, which is confusing.

7. About this:
> This KIP requires the KRaft implementation now read uncommitted data from
log and snapshot to discover the voter set. This also means that the KRaft
implementation needs to handle this uncommitted state getting truncated and
reverted.

I understand why this is required now, but I imagine it will make the
protocol more complicated. Could you elaborate more about how we're going
to handle the uncommitted data in the KIP?


Thank you.
Luke





On Tue, Jan 23, 2024 at 9:00 AM José Armando García Sancio
<jsan...@confluent.io.invalid> wrote:

> Thanks Jason. Comments below.
>
> On Wed, Jan 10, 2024 at 9:06 AM Jason Gustafson
> <ja...@confluent.io.invalid> wrote:
> > One additional thought. It would be helpful to have an example to justify
> > the need for this:
> >
> > > Wait for the fetch offset of the replica (ID, UUID) to catch up to the
> > log end offset of the leader.
> >
> > It is helpful also to explain how this affects the AddVoter RPC. Do we
> wait
> > indefinitely? Or do we give up and return a timeout error if the new
> voter
> > cannot catch up? Probably the latter makes the most sense.
>
> Yes. I will have more details here. Jason and I discussed this offline
> but waiting for the new replica to catch (to the LEO) is a heuristic
> that would minimize the amount of time where the leader cannot
> increase the HWM because the new replica is needed to form the
> majority. A example that shows this is:
>
> Current Voter Set:
> A: offset = 100
> B: offset = 100
> C: offset = 0
>
> In this configuration the leader can continue to advance the HWM since
> the majority A, B is at the HWM/LEO.
>
> If the user now adds a voter to the voter set:
> A: offset = 100
> B: offset = 100
> C: offset = 0
> D: offset = 0
>
> The leader cannot advance the HWM until either C or D catches up to
> the HWM because the majority has to include one of either C, D or
> both.
>
> Thanks,
> --
> -José
>

Reply via email to