Hi, I've updated the KIP with benchmark results focusing more directly on the redirect case, please review and +1 in the voting thread if convinced.
Thank you, Crispin On Wed, Jul 26, 2023 at 11:13 PM Luke Chen <show...@gmail.com> wrote: > Thanks for adding the benchmark results, Crispin! > IMO, 2~5% performance improvement is small, but given the change is small, > cost is also small (only append endpoint info when NOT_LEADER_OR_FOLLOWER.. > etc error), I think it is worth doing it. > > Thank you. > Luke > > On Wed, Jul 26, 2023 at 12:33 AM Ismael Juma <m...@ismaeljuma.com> wrote: > > > Thanks Crispin! > > > > Ismael > > > > On Mon, Jul 24, 2023 at 8:16 PM Crispin Bernier > > <cbern...@confluent.io.invalid> wrote: > > > > > I updated the wiki to include both results along with their average. > > > > > > Thank you, > > > Crispin > > > > > > On Mon, Jul 24, 2023 at 10:58 AM Ismael Juma <m...@ismaeljuma.com> > wrote: > > > > > > > Hi Crispin, > > > > > > > > One additional question, the wiki says "The results are averaged > over 2 > > > > runs.". Can you please provide some measure of variance in the > > > > distribution, i.e. were both results similar to each other for both > > > cases? > > > > > > > > Ismael > > > > > > > > On Fri, Jul 21, 2023 at 11:31 AM Ismael Juma <m...@ismaeljuma.com> > > wrote: > > > > > > > > > Thanks for the update Crispin - very helpful to have actual > > performance > > > > > data. 2-5% for the default configuration is a bit on the low side > for > > > > this > > > > > kind of proposal. > > > > > > > > > > Ismael > > > > > > > > > > On Thu, Jul 20, 2023 at 11:33 PM Crispin Bernier > > > > > <cbern...@confluent.io.invalid> wrote: > > > > > > > > > >> Benchmark numbers have been posted on the KIP, please review. > > > > >> > > > > >> On 2023/07/20 13:03:00 Mayank Shekhar Narula wrote: > > > > >> > Jun > > > > >> > > > > > >> > Thanks for the feedback. > > > > >> > > > > > >> > Numbers to follow. > > > > >> > > > > > >> > If we don't plan to > > > > >> > > bump up the FetchResponse version, we could just remove the > > > > reference > > > > >> to > > > > >> > > version 16. > > > > >> > > > > > >> > Fixed. > > > > >> > > > > > >> > On Thu, Jul 20, 2023 at 1:28 AM Jun Rao > > <ju...@confluent.io.invalid > > > > > > > > >> wrote: > > > > >> > > > > > >> > > Hi, Mayank, > > > > >> > > > > > > >> > > Thanks for the KIP. I agree with others that it would be > useful > > to > > > > >> see the > > > > >> > > performance results. Otherwise, just a minor comment. If we > > don't > > > > >> plan to > > > > >> > > bump up the FetchResponse version, we could just remove the > > > > reference > > > > >> to > > > > >> > > version 16. > > > > >> > > > > > > >> > > Jun > > > > >> > > > > > > >> > > On Wed, Jul 19, 2023 at 2:31 PM Mayank Shekhar Narula < > > > > >> > > mayanks.nar...@gmail.com> wrote: > > > > >> > > > > > > >> > > > Luke > > > > >> > > > > > > > >> > > > Thanks for the interest in the KIP. > > > > >> > > > > > > > >> > > > But what if the consumer was fetching from the follower? > > > > >> > > > > > > > >> > > > We already include `PreferredReadReplica` in the fetch > > response. > > > > >> > > > > Should we put the node info of PreferredReadReplica under > > this > > > > >> case, > > > > >> > > > > instead of the leader's info? > > > > >> > > > > > > > > >> > > > > > > > >> > > > PreferredReadReplica is the decided on the leader. Looking > at > > > the > > > > >> Java > > > > >> > > > client code, AbstractFetch::selectReadReplica, first fetch > > > request > > > > >> goes > > > > >> > > to > > > > >> > > > Leader of the partition -> Sends back PreferredReadReplica > -> > > > Next > > > > >> fetch > > > > >> > > > uses PreferredReadReplica. So as long as leader is > available, > > > > >> > > > PreferredReadReplica would be found in subsequent fetches. > > > > >> > > > > > > > >> > > > Also, under this case, should we include the leader's info > in > > > the > > > > >> > > response? > > > > >> > > > > > > > >> > > > > > > > >> > > > In this case, I think the follower would fail the fetch if > it > > > > knows > > > > >> a > > > > >> > > > different leader. If the follower knows a newer leader, it > > would > > > > >> return > > > > >> > > new > > > > >> > > > leader information in the response, for the client to act > on. > > > > >> > > > > > > > >> > > > > > > > >> > > > Will we include the leader/node info in the response when > > having > > > > >> > > > > `UNKNOWN_LEADER_EPOCH` error? > > > > >> > > > > > > > >> > > > > > > > >> > > > My understanding is UNKNOWN_LEADER_EPOCH when a request > from a > > > > >> client > > > > >> > > has a > > > > >> > > > newer epoch than the broker. So the client is already up to > > date > > > > on > > > > >> new > > > > >> > > > leader information, it's the broker that has the catching up > > to > > > > do. > > > > >> I > > > > >> > > think > > > > >> > > > there might be some optimisations to make sure the broker > > > > refreshes > > > > >> its > > > > >> > > > metadata quickly, so it can quickly recover to handle > requests > > > > that > > > > >> > > > previously returned UNKNOWN_LEADER_EPOCH. But this work is > > > outside > > > > >> the > > > > >> > > > scope of this KIP, as for now this KIP focusses on > client-side > > > > >> > > > optimisations. > > > > >> > > > > > > > >> > > > Mayank > > > > >> > > > > > > > >> > > > On Tue, Jul 18, 2023 at 8:51 AM Luke Chen <sh...@gmail.com> > > > > wrote: > > > > >> > > > > > > > >> > > > > Hi Mayank, > > > > >> > > > > > > > > >> > > > > Thanks for the KIP! > > > > >> > > > > > > > > >> > > > > Some questions: > > > > >> > > > > 1. I can see most of the cases we only care about consumer > > > fetch > > > > >> from > > > > >> > > the > > > > >> > > > > leader. > > > > >> > > > > But what if the consumer was fetching from the follower? > > > > >> > > > > We already include `PreferredReadReplica` in the fetch > > > response. > > > > >> > > > > Should we put the node info of PreferredReadReplica under > > this > > > > >> case, > > > > >> > > > > instead of the leader's info? > > > > >> > > > > Also, under this case, should we include the leader's info > > in > > > > the > > > > >> > > > response? > > > > >> > > > > > > > > >> > > > > 2. Will we include the leader/node info in the response > when > > > > >> having > > > > >> > > > > `UNKNOWN_LEADER_EPOCH` error? > > > > >> > > > > I think it's fine we ignore the `UNKNOWN_LEADER_EPOCH` > error > > > > >> since when > > > > >> > > > > this happens, the node might have some error which should > > > > refresh > > > > >> the > > > > >> > > > > metadata. On the other hand, it might also be good if we > can > > > > heal > > > > >> the > > > > >> > > > node > > > > >> > > > > soon to do produce/consume works. > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > Thank you. > > > > >> > > > > Luke > > > > >> > > > > > > > > >> > > > > On Tue, Jul 18, 2023 at 2:00 AM Philip Nee < > ph...@gmail.com > > > > > > > >> > > wrote: > > > > >> > > > > > > > > >> > > > > > Hey Mayank: > > > > >> > > > > > > > > > >> > > > > > For #1: I think fetch and produce behave a bit > differently > > > on > > > > >> > > metadata. > > > > >> > > > > > Maybe it is worth highlighting the changes for each > client > > > in > > > > >> detail. > > > > >> > > > In > > > > >> > > > > > producer did you mean by the metadata timeout before > > sending > > > > out > > > > >> > > > produce > > > > >> > > > > > requests? For consumer: I think for fetches it requires > > user > > > > to > > > > >> retry > > > > >> > > > if > > > > >> > > > > > the position does not exist on the leader. I don't have > > the > > > > >> detail on > > > > >> > > > top > > > > >> > > > > > of my head, but I think we should lay out these > behavioral > > > > >> changes. > > > > >> > > > > > > > > > >> > > > > > For #3: Thanks for the clarification. > > > > >> > > > > > > > > > >> > > > > > On Mon, Jul 17, 2023 at 10:39 AM Mayank Shekhar Narula < > > > > >> > > > > > mayanks.nar...@gmail.com> wrote: > > > > >> > > > > > > > > > >> > > > > > > Philip > > > > >> > > > > > > > > > > >> > > > > > > 1. Good call out about "poll" behaviour, my > > understanding > > > is > > > > >> the > > > > >> > > > same. > > > > >> > > > > I > > > > >> > > > > > am > > > > >> > > > > > > assuming it's about the motivation of the KIP. There > > with > > > > >> async, my > > > > >> > > > > > > intention was to convey that the client doesn't wait > for > > > the > > > > >> > > > > > > metadata-refresh before a subsequent retry of the > > produce > > > or > > > > >> fetch > > > > >> > > > > > request > > > > >> > > > > > > that failed due to stale metadata(i.e. going to an old > > > > >> leader). The > > > > >> > > > > only > > > > >> > > > > > > wait client has is the configured retry-delay. > > > > >> > > > > > > > > > > >> > > > > > > 2. Yes, in theory other APIs could benefit from this > > too. > > > > But > > > > >> that > > > > >> > > is > > > > >> > > > > > > outside of the scope of the KIP. > > > > >> > > > > > > > > > > >> > > > > > > 3. Do you mean the response for the Metadata RPC? I > > think > > > > >> brokers > > > > >> > > > > always > > > > >> > > > > > > have a view of the cluster, although it can be > stale,it > > > > would > > > > >> > > always > > > > >> > > > > > return > > > > >> > > > > > > a leader(whether old or new). > > > > >> > > > > > > > > > > >> > > > > > > Mayank > > > > >> > > > > > > > > > > >> > > > > > > On Fri, Jul 14, 2023 at 8:53 PM Philip Nee < > > > ph...@gmail.com > > > > > > > > > >> > > > > wrote: > > > > >> > > > > > > > > > > >> > > > > > > > Hey Mayank, > > > > >> > > > > > > > > > > > >> > > > > > > > Thanks for the KIP. I think this is a great > proposal, > > > and > > > > >> I'm in > > > > >> > > > > favor > > > > >> > > > > > > > of this idea. A few comments: > > > > >> > > > > > > > > > > > >> > > > > > > > 1. Claiming metadata refresh is done asynchronously > is > > > > >> > > misleading. > > > > >> > > > > The > > > > >> > > > > > > > metadata refresh requires Network Client to be > > > physically > > > > >> polled, > > > > >> > > > > which > > > > >> > > > > > > is > > > > >> > > > > > > > done in a separate thread in Producer and Admin > Client > > > > >> (IIRC!) > > > > >> > > but > > > > >> > > > > not > > > > >> > > > > > > > Consumer. > > > > >> > > > > > > > 2. There are other API calls that might result in > > > > >> > > > > > NOT_LEADER_OR_FOLLOWER > > > > >> > > > > > > > response, but it seems like this KIP only wants to > > > update > > > > on > > > > >> > > fetch > > > > >> > > > > and > > > > >> > > > > > > > produce. Do we want to make the leader information > > > > >> available for > > > > >> > > > > other > > > > >> > > > > > > API > > > > >> > > > > > > > calls? > > > > >> > > > > > > > 3. Do you know what would happen during a leader > > > election? > > > > >> I'm > > > > >> > > not > > > > >> > > > > sure > > > > >> > > > > > > > about this process and I wonder if the current > > metadata > > > > >> response > > > > >> > > > uses > > > > >> > > > > > the > > > > >> > > > > > > > old leader or null as the leader isn't readily > > available > > > > >> yet. > > > > >> > > > > > > > > > > > >> > > > > > > > Thanks, > > > > >> > > > > > > > P > > > > >> > > > > > > > > > > > >> > > > > > > > On Fri, Jul 14, 2023 at 11:30 AM Kirk True < > > > > >> ki...@kirktrue.pro> > > > > >> > > > > wrote: > > > > >> > > > > > > > > > > > >> > > > > > > > > Hi Mayank, > > > > >> > > > > > > > > > > > > >> > > > > > > > > > On Jul 14, 2023, at 11:25 AM, Mayank Shekhar > > Narula > > > < > > > > >> > > > > > > > > mayanks.nar...@gmail.com> wrote: > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > Kirk > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > >> Is the requested restructuring of the response > > > > >> “simply” to > > > > >> > > > > > preserve > > > > >> > > > > > > > > bytes, > > > > >> > > > > > > > > >> or is it possible that the fetch response > > > > >> could/should/would > > > > >> > > > > > return > > > > >> > > > > > > > > >> leadership changes for partitions that we’re > > > > >> specifically > > > > >> > > > > > requested? > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > Moving endpoints to top-level fields would > > preserve > > > > >> bytes, > > > > >> > > > > > otherwise > > > > >> > > > > > > > the > > > > >> > > > > > > > > > endpoint-information would be duplicated if > > included > > > > >> with the > > > > >> > > > > > > > > > partition-data in the response. Right now, the > > > > top-level > > > > >> > > field > > > > >> > > > > will > > > > >> > > > > > > > only > > > > >> > > > > > > > > be > > > > >> > > > > > > > > > set in case leader changes for any requested > > > > >> partitions. But > > > > >> > > it > > > > >> > > > > can > > > > >> > > > > > > be > > > > >> > > > > > > > > > re-used in the future, for which Jose has a > > use-case > > > > in > > > > >> mind > > > > >> > > > > shared > > > > >> > > > > > > up > > > > >> > > > > > > > in > > > > >> > > > > > > > > > the thread. KIP is now upto date with endpoint > > info > > > > >> being at > > > > >> > > > > > > top-level. > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > I didn’t catch before that there was a separate > > > section > > > > >> for the > > > > >> > > > > full > > > > >> > > > > > > node > > > > >> > > > > > > > > information, not just the ID and epoch. > > > > >> > > > > > > > > > > > > >> > > > > > > > > Thanks! > > > > >> > > > > > > > > > > > > >> > > > > > > > > >>> 3. In the future, I may use this information > in > > > the > > > > >> > > > > > KRaft/Metadata > > > > >> > > > > > > > > >>> implementation of FETCH. In that > implementation > > > not > > > > >> all of > > > > >> > > > the > > > > >> > > > > > > > > >>> replicas are brokers. > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > >> Side point: any references to the change you’re > > > > >> referring > > > > >> > > to? > > > > >> > > > > The > > > > >> > > > > > > idea > > > > >> > > > > > > > > of > > > > >> > > > > > > > > >> non-brokers serving as replicas is blowing my > > mind > > > a > > > > >> bit :) > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > >> > > > > >> > > > > > > > > > Jose, I missed this as well, would love to know > > more > > > > >> about > > > > >> > > > > > non-broker > > > > >> > > > > > > > > > serving as replica! > > > > >> > > > > > > > > > -- > > > > >> > > > > > > > > > Regards, > > > > >> > > > > > > > > > Mayank Shekhar Narula > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > -- > > > > >> > > > > > > Regards, > > > > >> > > > > > > Mayank Shekhar Narula > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > -- > > > > >> > > > Regards, > > > > >> > > > Mayank Shekhar Narula > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > > >> > -- > > > > >> > Regards, > > > > >> > Mayank Shekhar Narula > > > > >> > > > > > > > > > > > > > > > > > > > > >