Hi Jack, see my comments below.

On Thu, Feb 1, 2024 at 7:26 AM Jack Vanlightly <vanligh...@apache.org> wrote:
> After thinking it through, it occurs to me that in examples 1 and 2, the
> leader (of the latest configuration) should be sending BeginQuorumEpoch
> requests to r3 after a certain timeout? r3 can start elections (based on
> its stale configuration) which will go nowhere, until it eventually
> receives a BeginQuorumEpoch from the leader and it will learn of the leader
> and resume fetching.
>
> In the case of an observer, I suppose it must fall back to
> 'controller.quorum.voters' or 'controller.quorum.bootstrap.servers' to
> learn of the leader?

Great examples. The short answer is yes: the leader needs to keep
sending BeginQuorumEpoch to a voter until that voter acknowledges it.
The current KRaft implementation already does this. The set of
unacknowledged voters is tracked in LeaderState::nonAcknowledgeVoters
and is used to send the BeginQuorumEpoch RPC, so the leader keeps
retrying until every voter has acknowledged it. When the voter
handles the BeginQuorumEpoch, it first persists the leader id, leader
uuid (new in this KIP) and leader epoch to the quorum-state file
before replying to the RPC. Since the leader's state is persisted
before replying to the BeginQuorumEpoch RPC, one RPC and
acknowledgement is enough. Note that if a replica loses its disk and
quorum state it will come back with a different replica uuid and won't
be part of the quorum.
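
To make the retry behavior concrete, here is a minimal sketch of the
bookkeeping on the leader side. The class and method names
(BeginQuorumEpochTracker, votersToNotify, onAcknowledged) are
hypothetical and only illustrate the idea; this is not the actual
LeaderState code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the leader-side bookkeeping; not Kafka's LeaderState.
class BeginQuorumEpochTracker {
    // Voters that have not yet acknowledged BeginQuorumEpoch for this epoch.
    private final Set<Integer> unacknowledged = new HashSet<>();

    BeginQuorumEpochTracker(int leaderId, Set<Integer> voterIds) {
        unacknowledged.addAll(voterIds);
        unacknowledged.remove(leaderId); // the leader does not notify itself
    }

    // Voters that should receive BeginQuorumEpoch on the next retry timeout.
    Set<Integer> votersToNotify() {
        return new HashSet<>(unacknowledged);
    }

    // Called when a voter's successful BeginQuorumEpoch response arrives.
    // The voter is assumed to have persisted the leader state before replying,
    // so a single acknowledgement is enough to stop retrying.
    void onAcknowledged(int voterId) {
        unacknowledged.remove(voterId);
    }
}
```

In this sketch a stale voter such as r3 stays in the set, and keeps
receiving the RPC, until its acknowledgement arrives.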

I think that the other important observation is that, in this KIP,
KRaft will have two sets of endpoints: 1. voters set and 2. bootstrap
servers.

1. The voters set can come from the log as you described or from
controller.quorum.voters if the log doesn't contain any voters sets.
2. The bootstrap servers can come from
controller.quorum.bootstrap.servers or controller.quorum.voters in
that order of preference.
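
The two fallback rules above can be sketched roughly as follows. The
QuorumEndpoints class is hypothetical, and endpoints are treated as
opaque strings for simplicity; parsing of the real configuration
formats is left out.

```java
import java.util.List;

// Hypothetical sketch of the two endpoint sets; endpoints are opaque strings.
class QuorumEndpoints {
    // Voters set: prefer the voters set found in the log; otherwise fall back
    // to the value of controller.quorum.voters.
    static List<String> votersSet(List<String> fromLog,
                                  List<String> quorumVotersConfig) {
        return fromLog.isEmpty() ? quorumVotersConfig : fromLog;
    }

    // Bootstrap servers: prefer controller.quorum.bootstrap.servers; otherwise
    // fall back to the value of controller.quorum.voters.
    static List<String> bootstrapServers(List<String> bootstrapConfig,
                                         List<String> quorumVotersConfig) {
        return bootstrapConfig.isEmpty() ? quorumVotersConfig : bootstrapConfig;
    }
}
```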

The Fetch RPC will always be sent to the leader's endpoint if it is
known. If the leader is not known, the Fetch RPC will be sent to one
of the endpoints in the bootstrap server set.
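
As a rough illustration of that rule, the destination of a Fetch
request could be chosen like this. FetchRouting and its parameters
are hypothetical, and the round-robin choice among bootstrap servers
is only an assumption for the sketch; the text above only says "one
of the endpoints".

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of how a replica could pick the Fetch destination.
class FetchRouting {
    static String fetchDestination(Optional<String> leaderEndpoint,
                                   List<String> bootstrapServers,
                                   int attempt) {
        if (leaderEndpoint.isPresent()) {
            return leaderEndpoint.get(); // leader known: always fetch from it
        }
        if (bootstrapServers.isEmpty()) {
            throw new IllegalStateException("no leader and no bootstrap servers");
        }
        // Leader unknown: pick one bootstrap endpoint. Round-robin by attempt
        // number is an assumption made for this sketch.
        int index = Math.floorMod(attempt, bootstrapServers.size());
        return bootstrapServers.get(index);
    }
}
```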

The Vote, BeginQuorumEpoch and EndQuorumEpoch RPCs will always be
sent using the replicas and endpoints specified in the voters set.

Your example also highlights the importance of "KIP-996: Pre-Vote" to
avoid disruption to the quorum while lagging replicas catch up.

Thanks for your feedback, Jack. I'll update the KIP to make this clear.

Thanks,
-- 
-José
