Hi Jose,

I have a question about how voters and observers that are far behind the
leader catch up when there are multiple reconfiguration commands in the
log between their position and the end of the log.

Here are some example situations that need clarification:

Example 1
Imagine a cluster of three voters: r1, r2, r3. Voter r3 goes offline for a
while. In the meantime, r1 dies and is replaced with r4, and r2 dies and is
replaced with r5. Now the cluster consists of r3, r4, r5. When r3
comes back online, it tries to fetch from dead nodes and finally starts
unending leader elections - stuck because it doesn't realise it's in a
stale configuration whose members are all dead except for itself.

Example 2
Imagine a cluster of three voters: r1, r2, r3. Voter r3 goes offline then
comes back and discovers the leader is r1. Again, there are many
reconfiguration commands between its LEO and the end of the leader's log.
It starts fetching, changing configurations as it goes until it reaches a
stale configuration (r3, r4, r5) where it is a member but none of its peers
are actually alive anymore. It continues to fetch from r1, but then for
some reason the connection to r1 is interrupted. r3 starts leader elections
which don't get responses.

Example 3
Imagine a cluster of three voters: r1, r2, r3. Over time, many
reconfigurations have happened and now the voters are (r4, r5, r6). The
observer o1 starts fetching from the nodes in
'controller.quorum.bootstrap.servers' which includes r4. r4 responds with a
NotLeader error and reports that r5 is the leader. o1 starts fetching and
goes through the motions of switching to each configuration as it learns of it in the
log. The connection to r5 gets interrupted while it is in the configuration
(r7, r8, r9). It attempts to fetch from these voters but none respond as
they are all long dead, as this is a stale configuration. Does the observer
fall back to 'controller.quorum.bootstrap.servers' for its list of voters it
can fetch from?

After thinking it through, it occurs to me that in examples 1 and 2, the
leader (of the latest configuration) should be sending BeginQuorumEpoch
requests to r3 after a certain timeout? r3 can start elections (based on
its stale configuration), which will go nowhere until it eventually
receives a BeginQuorumEpoch from the leader, at which point it learns of
the leader and resumes fetching.
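
To make that concrete, here is a rough Python sketch of the behaviour I have
in mind for r3 in example 1. The class, the method names and the rule for
accepting a leader outside the stale voter set are all my own illustration,
not taken from the KIP or the KRaft code.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass
class StaleVoter:
    # Hypothetical model of r3 in example 1; not KRaft's real state machine.
    node_id: str
    voters: FrozenSet[str]        # last voter set r3 read from its own log
    epoch: int = 0
    leader: Optional[str] = None

    def run_election(self, reachable: Set[str]) -> bool:
        # Assumption: only reachable members of r3's own (stale) voter set
        # can grant it a vote.
        self.epoch += 1
        votes = {self.node_id} | (self.voters & reachable)
        return len(votes) > len(self.voters) // 2

    def on_begin_quorum_epoch(self, leader_id: str, leader_epoch: int) -> bool:
        # Assumption: r3 accepts a leader that is not in its stale voter set,
        # as long as the leader's epoch is not older than its own.
        if leader_epoch < self.epoch:
            return False
        self.epoch = leader_epoch
        self.leader = leader_id
        return True


r3 = StaleVoter("r3", frozenset({"r1", "r2", "r3"}))
alive = {"r3", "r4", "r5"}        # r1 and r2 are dead
results = [r3.run_election(alive) for _ in range(3)]
accepted = r3.on_begin_quorum_epoch("r4", leader_epoch=7)
```

With r1 and r2 dead, r3 only ever collects its own vote, so every election in
the sketch fails. The BeginQuorumEpoch from r4 is what tells r3 who the leader
is.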

In the case of an observer, I suppose it must fall back to
'controller.quorum.voters' or 'controller.quorum.bootstrap.servers' to
learn of the leader?
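
Roughly, the fallback I am imagining for the observer in example 3 looks like
the Python sketch below. The function name, the fetch callback and the order
in which nodes are tried are assumptions of mine, not something the KIP
specifies.

```python
from typing import Callable, Iterable, Optional


def discover_leader(
    current_voters: Iterable[str],
    bootstrap_servers: Iterable[str],
    fetch: Callable[[str], Optional[str]],
) -> Optional[str]:
    # Try the voters from the observer's current (possibly stale)
    # configuration first, then fall back to the static bootstrap list.
    for node in list(current_voters) + list(bootstrap_servers):
        # Hypothetical callback: returns the leader id reported by `node`,
        # or None if `node` does not respond.
        leader = fetch(node)
        if leader is not None:
            return leader
    return None


def fake_fetch(node: str) -> Optional[str]:
    # Only the live voters (r4, r5, r6) respond, and all report r5 as leader.
    live_view = {"r4": "r5", "r5": "r5", "r6": "r5"}
    return live_view.get(node)


leader = discover_leader(["r7", "r8", "r9"], ["r4"], fake_fetch)
no_fallback = discover_leader(["r7", "r8", "r9"], [], fake_fetch)
```

Without the bootstrap list the observer has nobody left to ask, which is the
situation I am worried about.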

Thanks
Jack



On Fri, Jan 26, 2024 at 1:36 AM José Armando García Sancio
<jsan...@confluent.io.invalid> wrote:

> Hi all,
>
> I have updated the KIP to include information on how KRaft controller
> automatic joining will work.
>
> Thanks,
> --
> -José
>
