Luke

Thanks for the interest in the KIP.

> But what if the consumer was fetching from the follower?
> We already include `PreferredReadReplica` in the fetch response.
> Should we put the node info of PreferredReadReplica under this case,
> instead of the leader's info?

PreferredReadReplica is decided on the leader. Looking at the Java client
code (AbstractFetch::selectReadReplica), the first fetch request goes to
the leader of the partition -> the leader sends back PreferredReadReplica
-> the next fetch uses PreferredReadReplica. So as long as the leader is
available, PreferredReadReplica would be found in subsequent fetches.
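To make that flow concrete, here is a minimal Java sketch of the routing
decision. This is illustrative only and not the actual client code; the
real AbstractFetch keeps more state (e.g. it expires the preference after
a timeout), and the names below are made up for this example:

    import java.util.Map;
    import java.util.Optional;
    import java.util.concurrent.ConcurrentHashMap;

    public class ReadReplicaRouting {

        // Hypothetical cache of the preferred read replica per partition id.
        private final Map<Integer, Integer> preferredReadReplica =
                new ConcurrentHashMap<>();

        // Pick the node to fetch from: the replica the leader previously
        // advertised, if any, otherwise the leader itself.
        int selectReadReplica(int partition, int leaderId) {
            return preferredReadReplica.getOrDefault(partition, leaderId);
        }

        // Called for each fetch response; the leader may or may not
        // advertise a PreferredReadReplica. This sketch simply drops the
        // cached preference when none is advertised.
        void onFetchResponse(int partition, Optional<Integer> advertisedReplica) {
            advertisedReplica.ifPresentOrElse(
                    id -> preferredReadReplica.put(partition, id),
                    () -> preferredReadReplica.remove(partition));
        }
    }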
> Also, under this case, should we include the leader's info in the
> response?

In this case, I think the follower would fail the fetch if it knows a
different leader. If the follower knows a newer leader, it would return
the new leader information in the response, for the client to act on.
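Here is a minimal Java sketch of how the client side could act on that
hint. The LeaderInfo shape is a made-up stand-in for whatever the KIP's
response schema ends up carrying; the point is only that the hint is kept
if it is newer (by leader epoch) than what the client already knows, so
the next retry can go straight to the new leader without waiting for a
full metadata refresh:

    import java.util.Map;
    import java.util.Optional;
    import java.util.concurrent.ConcurrentHashMap;

    public class LeaderHintHandling {

        // Hypothetical shape of the new-leader hint; it carries the
        // endpoint too, so the client can connect even if the new leader
        // is missing from its cached metadata.
        record LeaderInfo(int leaderId, int leaderEpoch, String host, int port) {}

        // Client-side view of partition leadership, keyed by partition id.
        private final Map<Integer, LeaderInfo> leaderByPartition =
                new ConcurrentHashMap<>();

        // Apply the hint carried alongside a NOT_LEADER_OR_FOLLOWER error,
        // keeping it only if its leader epoch is newer than the cached one.
        void onNotLeaderOrFollower(int partition, Optional<LeaderInfo> newLeader) {
            newLeader.ifPresent(hint ->
                    leaderByPartition.merge(partition, hint,
                            (cached, fresh) -> fresh.leaderEpoch() > cached.leaderEpoch()
                                    ? fresh : cached));
        }
    }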
> Will we include the leader/node info in the response when having
> `UNKNOWN_LEADER_EPOCH` error?

My understanding is UNKNOWN_LEADER_EPOCH is returned when a request from
a client has a newer epoch than the broker knows about. So the client is
already up to date on the new leader information; it's the broker that
has the catching up to do. I think there might be some optimisations to
make sure the broker refreshes its metadata quickly, so it can quickly
recover to handle requests that previously returned UNKNOWN_LEADER_EPOCH.
But this work is outside the scope of this KIP, as for now this KIP
focuses on client-side optimisations.

Mayank

On Tue, Jul 18, 2023 at 8:51 AM Luke Chen <show...@gmail.com> wrote:

> Hi Mayank,
>
> Thanks for the KIP!
>
> Some questions:
> 1. I can see that in most of the cases we only care about the consumer
> fetching from the leader.
> But what if the consumer was fetching from the follower?
> We already include `PreferredReadReplica` in the fetch response.
> Should we put the node info of PreferredReadReplica under this case,
> instead of the leader's info?
> Also, under this case, should we include the leader's info in the
> response?
>
> 2. Will we include the leader/node info in the response when having
> `UNKNOWN_LEADER_EPOCH` error?
> I think it's fine if we ignore the `UNKNOWN_LEADER_EPOCH` error, since
> when this happens, the node might have some error which should refresh
> the metadata. On the other hand, it might also be good if we can heal
> the node soon to do produce/consume work.
>
>
> Thank you.
> Luke
>
> On Tue, Jul 18, 2023 at 2:00 AM Philip Nee <philip...@gmail.com> wrote:
>
> > Hey Mayank:
> >
> > For #1: I think fetch and produce behave a bit differently on
> > metadata. Maybe it is worth highlighting the changes for each client
> > in detail. For the producer, did you mean the metadata timeout before
> > sending out produce requests? For the consumer: I think fetches
> > require the user to retry if the position does not exist on the
> > leader. I don't have the details on top of my head, but I think we
> > should lay out these behavioral changes.
> >
> > For #3: Thanks for the clarification.
> >
> > On Mon, Jul 17, 2023 at 10:39 AM Mayank Shekhar Narula <
> > mayanks.nar...@gmail.com> wrote:
> >
> > > Philip
> > >
> > > 1. Good call out about "poll" behaviour, my understanding is the
> > > same. I am assuming it's about the motivation of the KIP. There,
> > > with async, my intention was to convey that the client doesn't wait
> > > for the metadata-refresh before a subsequent retry of the produce
> > > or fetch request that failed due to stale metadata (i.e. going to
> > > an old leader). The only wait the client has is the configured
> > > retry-delay.
> > >
> > > 2. Yes, in theory other APIs could benefit from this too. But that
> > > is outside of the scope of the KIP.
> > >
> > > 3. Do you mean the response for the Metadata RPC? I think brokers
> > > always have a view of the cluster; although it can be stale, it
> > > would always return a leader (whether old or new).
> > >
> > > Mayank
> > >
> > > On Fri, Jul 14, 2023 at 8:53 PM Philip Nee <philip...@gmail.com>
> > > wrote:
> > >
> > > > Hey Mayank,
> > > >
> > > > Thanks for the KIP. I think this is a great proposal, and I'm in
> > > > favor of this idea. A few comments:
> > > >
> > > > 1. Claiming metadata refresh is done asynchronously is
> > > > misleading. The metadata refresh requires the NetworkClient to be
> > > > physically polled, which is done in a separate thread in the
> > > > Producer and Admin Client (IIRC!) but not the Consumer.
> > > > 2. There are other API calls that might result in a
> > > > NOT_LEADER_OR_FOLLOWER response, but it seems like this KIP only
> > > > wants to update on fetch and produce. Do we want to make the
> > > > leader information available for other API calls?
> > > > 3. Do you know what would happen during a leader election? I'm
> > > > not sure about this process, and I wonder if the current metadata
> > > > response uses the old leader or null, as the leader isn't readily
> > > > available yet.
> > > >
> > > > Thanks,
> > > > P
> > > >
> > > > On Fri, Jul 14, 2023 at 11:30 AM Kirk True <k...@kirktrue.pro>
> > > > wrote:
> > > >
> > > > > Hi Mayank,
> > > > >
> > > > > > On Jul 14, 2023, at 11:25 AM, Mayank Shekhar Narula <
> > > > > > mayanks.nar...@gmail.com> wrote:
> > > > > >
> > > > > > Kirk
> > > > > >
> > > > > >> Is the requested restructuring of the response “simply” to
> > > > > >> preserve bytes, or is it possible that the fetch response
> > > > > >> could/should/would return leadership changes for partitions
> > > > > >> that were specifically requested?
> > > > > >
> > > > > > Moving endpoints to top-level fields would preserve bytes;
> > > > > > otherwise the endpoint information would be duplicated if
> > > > > > included with the partition data in the response. Right now,
> > > > > > the top-level field will only be set in case the leader
> > > > > > changes for any requested partitions. But it can be re-used
> > > > > > in the future, for which Jose has a use-case in mind, shared
> > > > > > further up in the thread. The KIP is now up to date with the
> > > > > > endpoint info being at top level.
> > > > >
> > > > > I didn’t catch before that there was a separate section for the
> > > > > full node information, not just the ID and epoch.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > >>> 3. In the future, I may use this information in the
> > > > > >>> KRaft/Metadata implementation of FETCH. In that
> > > > > >>> implementation not all of the replicas are brokers.
> > > > > >>
> > > > > >> Side point: any references to the change you’re referring
> > > > > >> to? The idea of non-brokers serving as replicas is blowing
> > > > > >> my mind a bit :)
> > > > > >
> > > > > > Jose, I missed this as well, would love to know more about
> > > > > > non-brokers serving as replicas!
> > > > > > --
> > > > > > Regards,
> > > > > > Mayank Shekhar Narula
> > >
> > > --
> > > Regards,
> > > Mayank Shekhar Narula

--
Regards,
Mayank Shekhar Narula