Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

Guozhang Wang Tue, 24 Mar 2020 09:57:30 -0700

In Kafka clients, there are cases where we log a warning when overriding
some conflicting configs, and other cases where we throw and let the
brokers die during startup --- you can check the
`postProcessParsedConfig` function in Producer/ConsumerConfig for such
logic.


I think for this case, it is sufficient to log a warning if we find the
`max` < `backoff`.
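
As a rough sketch of the idea (not the actual client code): the function
names below are hypothetical, the two arguments stand in for
`retry.backoff.ms` and `retry.backoff.max.ms`, and the backoff computation
follows the formula quoted further down in this thread, with the jitter
factor passed in explicitly so the result is deterministic.

```python
import logging

log = logging.getLogger("backoff_sketch")


def check_backoff_configs(backoff_ms, max_backoff_ms):
    """Hypothetical check: warn, rather than throw, when max < backoff.

    Returns True when the pair is consistent, False when a warning was logged.
    """
    if max_backoff_ms < backoff_ms:
        log.warning(
            "retry.backoff.max.ms (%d) is smaller than retry.backoff.ms (%d); "
            "every retry will wait at most %d ms",
            max_backoff_ms, backoff_ms, max_backoff_ms,
        )
        return False
    return True


def effective_backoff_ms(backoff_ms, max_backoff_ms, failures, jitter=1.0):
    """MIN(max, backoff * 2**(failures - 1) * jitter), for failures >= 1.

    In the thread's formula the jitter is random(0.8, 1.2); it is a plain
    argument here (an assumption made for testability).
    """
    return min(max_backoff_ms, backoff_ms * 2 ** (failures - 1) * jitter)
```

With `retry.backoff.ms` overridden to 2000 and the max left at 1000, the
check only warns and every retry waits 1000 ms, which is the effectively
static behavior discussed in the quoted messages.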


Guozhang

On Mon, Mar 23, 2020 at 9:18 PM Boyang Chen <reluctanthero...@gmail.com>
wrote:

> Got it, although I would still like to be aware of the actual backoff I
> will be using in production, having the app crash seems like an
> over-reaction. I don't think I have further questions :)
>
> On Mon, Mar 23, 2020 at 7:36 PM Sanjana Kaundinya <skaundi...@gmail.com>
> wrote:
>
> > Hey Boyang,
> >
> > If a user provides no config at all, then as you mentioned they will by
> > default be able to make use of the exponential backoff feature introduced
> > by the KIP. If backoff.ms is overridden to 2000 ms, the lesser of the max
> > and the computed backoff will be chosen, so in this case the max will be
> > chosen, as it is 1000 ms. As Guozhang mentioned, if the user configures
> > something like this, they would notice that the behavior is not in line
> > with what they expect, and would see the KIP + release notes and know to
> > configure backoff.ms < max.backoff.ms. I’m not sure it’s a big enough
> > error to reject the configuration if it’s configured like this, as the
> > clients could still run in either case.
> >
> > To answer your second question, we are making the dynamic backoff the
> > default and not allowing for static backoff (unless they set
> > backoff.ms > max.backoff.ms, in which case it would in a sense be
> > static). We will include this information in the release notes to make
> > sure users are aware of this behavior change.
> >
> > Thanks,
> > Sanjana
> >
> > On Mon, Mar 23, 2020 at 6:37 PM Boyang Chen <reluctanthero...@gmail.com>
> > wrote:
> >
> > > Hey Sanjana,
> > >
> > > My understanding with the update is that if a user provides no config
> > > at all, a Producer/Consumer/Admin client user would by default get a
> > > starting backoff.ms of 100 ms and a max.backoff.ms of 1000 ms? If I
> > > already override backoff.ms to 2000 ms, for instance, will I be
> > > choosing the default max.backoff.ms here?
> > >
> > > I guess my question would be whether we should just reject a config with
> > > backoff.ms > max.backoff.ms in the first place, as this looks like a
> > > misconfiguration to me.
> > >
> > > The second question is whether we allow fallback to static backoffs if
> > > the user wants to do so, or whether we should just ship this as an
> > > opt-in feature?
> > >
> > > Let me know your thoughts.
> > >
> > > Boyang
> > >
> > > On Mon, Mar 23, 2020 at 11:38 AM Cheng Tan <c...@confluent.io> wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > > > On Mar 19, 2020, at 7:27 PM, Sanjana Kaundinya <skaundi...@gmail.com>
> > > > > > wrote:
> > > > >
> > > > > Ah yes that makes sense. I’ll update the KIP to reflect this.
> > > > >
> > > > > Thanks,
> > > > > Sanjana
> > > > >
> > > > > > On Thu, Mar 19, 2020 at 5:48 PM Guozhang Wang <wangg...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Following the formula you have in the KIP, if it is simply:
> > > > > >>
> > > > > >> MIN(retry.backoff.max.ms, (retry.backoff.ms * 2**(failures - 1)) * random(0.8, 1.2))
> > > > > >>
> > > > > >> then the behavior would stay consistent at retry.backoff.max.ms.
> > > > >>
> > > > >>
> > > > >> Guozhang
> > > > >>
> > > > > >> On Thu, Mar 19, 2020 at 5:46 PM Sanjana Kaundinya <skaundi...@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >>> If that’s the case, then what should we use as the starting point?
> > > > > >>> Currently in the KIP the starting point is retry.backoff.ms and it
> > > > > >>> goes up exponentially to retry.backoff.max.ms. If
> > > > > >>> retry.backoff.max.ms is smaller than retry.backoff.ms, then that
> > > > > >>> could pose a bit of a problem there, right?
> > > > >>>
> > > > > >>> On Mar 19, 2020, 5:44 PM -0700, Guozhang Wang <wangg...@gmail.com>,
> > > > > >>> wrote:
> > > > > >>>> Thanks Sanjana, I did not capture the part that Jason referred to,
> > > > > >>>> so that's my bad :P
> > > > > >>>>
> > > > > >>>> Regarding your last statement, I actually feel that instead of
> > > > > >>>> taking the larger of the two, we should respect
> > > > > >>>> "retry.backoff.max.ms" even if it is smaller than
> > > > > >>>> "retry.backoff.ms". I do not have a very strong rationale except
> > > > > >>>> that it is logically more aligned with the config names.
> > > > >>>>
> > > > >>>>
> > > > >>>> Guozhang
> > > > >>>>
> > > > >>>>
> > > > > >>>> On Thu, Mar 19, 2020 at 5:39 PM Sanjana Kaundinya <skaundi...@gmail.com>
> > > > > >>>> wrote:
> > > > > >>>>
> > > > > >>>>> Hey Jason and Guozhang,
> > > > > >>>>>
> > > > > >>>>> Jason is right, I took this inspiration from KIP-144 (
> > > > > >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-144%3A+Exponential+backoff+for+broker+reconnect+attempts
> > > > > >>>>> ), which had the same logic in order to preserve the existing
> > > > > >>>>> behavior. In this case, however, if we are thinking of completely
> > > > > >>>>> eliminating the static backoff behavior, we can do that and, as
> > > > > >>>>> Jason mentioned, put it in the release notes and not add any
> > > > > >>>>> special logic. In addition, I agree that we should take the larger
> > > > > >>>>> of the two of `retry.backoff.ms` and `retry.backoff.max.ms`. I'll
> > > > > >>>>> update the KIP to reflect this and make it clear that the old
> > > > > >>>>> static retry backoff is getting replaced by the new dynamic retry
> > > > > >>>>> backoff.
> > > > >>>>>
> > > > >>>>> Thanks,
> > > > >>>>> Sanjana
> > > > > >>>>> On Thu, Mar 19, 2020 at 4:23 PM Jason Gustafson <ja...@confluent.io>
> > > > > >>>>> wrote:
> > > > > >>>>>
> > > > > >>>>>> Hey Guozhang,
> > > > > >>>>>>
> > > > > >>>>>> I was referring to this:
> > > > > >>>>>>
> > > > > >>>>>>> For users who have not set retry.backoff.ms explicitly, the
> > > > > >>>>>>> default behavior will change so that the backoff will grow up to
> > > > > >>>>>>> 1000 ms. For users who have set retry.backoff.ms explicitly, the
> > > > > >>>>>>> behavior will remain the same as they could have specific
> > > > > >>>>>>> requirements.
> > > > > >>>>>>
> > > > > >>>>>> I took this to mean that for users who have overridden
> > > > > >>>>>> `retry.backoff.ms` to 50ms (say), we will change the default
> > > > > >>>>>> `retry.backoff.max.ms` to 50ms as well in order to preserve
> > > > > >>>>>> existing backoff behavior. Is that not right? In any case, I agree
> > > > > >>>>>> that we can use the maximum of the two values as the effective
> > > > > >>>>>> `retry.backoff.max.ms` to handle the case when the configured
> > > > > >>>>>> value of `retry.backoff.ms` is larger than the default of 1s.
> > > > >>>>>>
> > > > >>>>>> -Jason
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > > >>>>>> On Thu, Mar 19, 2020 at 3:29 PM Guozhang Wang <wangg...@gmail.com>
> > > > > >>>>>> wrote:
> > > > > >>>>>>
> > > > > >>>>>>> Hey Jason,
> > > > > >>>>>>>
> > > > > >>>>>>> My understanding is a bit different here: even if the user has
> > > > > >>>>>>> explicitly overridden "retry.backoff.ms", the exponential
> > > > > >>>>>>> mechanism still triggers and the backoff would be increased till
> > > > > >>>>>>> "retry.backoff.max.ms"; and if the specified "retry.backoff.ms"
> > > > > >>>>>>> is already larger than "retry.backoff.max.ms", we would still
> > > > > >>>>>>> take "retry.backoff.max.ms".
> > > > > >>>>>>>
> > > > > >>>>>>> So if the user does override "retry.backoff.ms" to a value larger
> > > > > >>>>>>> than 1s and is not aware of the new config, she would be
> > > > > >>>>>>> surprised to see the specified value seemingly not being
> > > > > >>>>>>> respected, but she could still learn that afterwards by reading
> > > > > >>>>>>> the release notes introducing this KIP anyways.
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> Guozhang
> > > > >>>>>>>
> > > > > >>>>>>> On Thu, Mar 19, 2020 at 3:10 PM Jason Gustafson <ja...@confluent.io>
> > > > > >>>>>>> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> Hi Sanjana,
> > > > > >>>>>>>>
> > > > > >>>>>>>> The KIP looks good to me. I had just one question about the
> > > > > >>>>>>>> default behavior. As I understand, if the user has specified
> > > > > >>>>>>>> `retry.backoff.ms` explicitly, then we will not apply the
> > > > > >>>>>>>> default max backoff. As such, there's no way to get the benefit
> > > > > >>>>>>>> of this feature if you are providing a `retry.backoff.ms` unless
> > > > > >>>>>>>> you also provide `retry.backoff.max.ms`. That makes sense if you
> > > > > >>>>>>>> assume the user is unaware of the new configuration, but it is
> > > > > >>>>>>>> surprising otherwise. Since it's not a semantic change and since
> > > > > >>>>>>>> the default you're proposing of 1s is fairly low already, I
> > > > > >>>>>>>> wonder if it's good enough to mention the new configuration in
> > > > > >>>>>>>> the release notes and not add any special logic. What do you
> > > > > >>>>>>>> think?
> > > > >>>>>>>>
> > > > >>>>>>>> -Jason
> > > > >>>>>>>>
> > > > > >>>>>>>> On Thu, Mar 19, 2020 at 1:56 PM Sanjana Kaundinya <skaundi...@gmail.com>
> > > > > >>>>>>>> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Thank you for the comments Guozhang.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> I’ll leave this KIP out for discussion till the end of the week
> > > > > >>>>>>>>> and then start a vote for this early next week.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Sanjana
> > > > >>>>>>>>>
> > > > > >>>>>>>>> On Mar 18, 2020, 3:38 PM -0700, Guozhang Wang <wangg...@gmail.com>,
> > > > > >>>>>>>>> wrote:
> > > > > >>>>>>>>>> Hello Sanjana,
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Thanks for the proposed KIP, I think that makes a lot of sense.
> > > > > >>>>>>>>>> As you mentioned in the motivation, we've indeed seen many
> > > > > >>>>>>>>>> issues with regard to the frequent retries; with bounded
> > > > > >>>>>>>>>> exponential backoff, in the scenario where there's a long
> > > > > >>>>>>>>>> connectivity issue, we would effectively reduce the request
> > > > > >>>>>>>>>> load by a factor of 10 given the default configs.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> For the higher-level Streams client and Connect frameworks,
> > > > > >>>>>>>>>> today we also have retry logic, but it's used in a slightly
> > > > > >>>>>>>>>> different way. For example, in Streams we tend to handle the
> > > > > >>>>>>>>>> retry logic at the thread level, and hence very likely we'd
> > > > > >>>>>>>>>> like to change that mechanism in KIP-572 anyways. For producer
> > > > > >>>>>>>>>> / consumer / admin clients, I think just applying this
> > > > > >>>>>>>>>> behavioral change across these clients makes a lot of sense.
> > > > > >>>>>>>>>> So I think we can just leave Streams / Connect out of the
> > > > > >>>>>>>>>> scope of this KIP, to be addressed in separate discussions.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> I do not have further comments about this KIP :) LGTM.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Guozhang
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > > >>>>>>>>>> On Wed, Mar 18, 2020 at 12:09 AM Sanjana Kaundinya <skaundi...@gmail.com>
> > > > > >>>>>>>>>> wrote:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>> Thanks for the feedback Boyang.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> If there’s anyone else who has feedback regarding this KIP, I
> > > > > >>>>>>>>>>> would really appreciate hearing it!
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>> Sanjana
> > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> On Tue, Mar 17, 2020 at 11:38 PM Boyang Chen <bche...@outlook.com>
> > > > > >>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> Sounds great!
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> ________________________________
> > > > >>>>>>>>>>>> From: Sanjana Kaundinya <skaundi...@gmail.com>
> > > > >>>>>>>>>>>> Sent: Tuesday, March 17, 2020 5:54:35 PM
> > > > >>>>>>>>>>>> To: dev@kafka.apache.org <dev@kafka.apache.org>
> > > > > >>>>>>>>>>>> Subject: Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients
> > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> Thanks for the explanation Boyang. One of the most common
> > > > > >>>>>>>>>>>> problems that we have in Kafka is with respect to metadata
> > > > > >>>>>>>>>>>> fetches. For example, if there is a broker failure, all
> > > > > >>>>>>>>>>>> clients start to fetch metadata at the same time and it
> > > > > >>>>>>>>>>>> often takes a while for the metadata to converge. In a
> > > > > >>>>>>>>>>>> high-load cluster, there are also issues where the volume of
> > > > > >>>>>>>>>>>> metadata has made convergence of metadata slower.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> For this case, exponential backoff helps as it reduces the
> > > > > >>>>>>>>>>>> retry rate and spaces out how often clients will retry,
> > > > > >>>>>>>>>>>> thereby bringing down the time for convergence. Something
> > > > > >>>>>>>>>>>> Jason mentioned that would be a great addition here is for
> > > > > >>>>>>>>>>>> the backoff to be “jittered”, as it was in KIP-144 with
> > > > > >>>>>>>>>>>> respect to exponential reconnect backoff. This would help
> > > > > >>>>>>>>>>>> prevent the clients from being synchronized on when they
> > > > > >>>>>>>>>>>> retry, thereby spacing out the number of requests being sent
> > > > > >>>>>>>>>>>> to the broker at the same time.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> I’ll add this example to the KIP and flesh out more of the
> > > > > >>>>>>>>>>>> details so it’s clearer.
> > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> On Mar 17, 2020, 1:24 PM -0700, Boyang Chen <reluctanthero...@gmail.com>,
> > > > > >>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>> Thanks for the reply Sanjana. I guess I would like to
> > > > > >>>>>>>>>>>>> rephrase my questions 2 and 3, as my previous response was
> > > > > >>>>>>>>>>>>> a little bit unactionable.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> My specific point is that exponential backoff is not a
> > > > > >>>>>>>>>>>>> silver bullet and we should consider using it to solve
> > > > > >>>>>>>>>>>>> known problems, instead of making holistic changes to all
> > > > > >>>>>>>>>>>>> clients in the Kafka ecosystem. I do like the exponential
> > > > > >>>>>>>>>>>>> backoff idea and believe this would be of great value, but
> > > > > >>>>>>>>>>>>> maybe we should focus on proposing some existing modules
> > > > > >>>>>>>>>>>>> that are suffering from static retry, and only change them
> > > > > >>>>>>>>>>>>> in this first KIP. If in the future some other component
> > > > > >>>>>>>>>>>>> users believe they are also suffering, we could get more
> > > > > >>>>>>>>>>>>> minor KIPs to change the behavior as well.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Boyang
> > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> On Sun, Mar 15, 2020 at 12:07 AM Sanjana Kaundinya <skaundi...@gmail.com>
> > > > > >>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Thanks for the feedback Boyang, I will revise the KIP with
> > > > > >>>>>>>>>>>>>> the mathematical relations as per your suggestion. To
> > > > > >>>>>>>>>>>>>> address your feedback:
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> 1. Currently, with the default of 100 ms per retry
> > > > > >>>>>>>>>>>>>> backoff, in 1 second we would have 10 retries. In the case
> > > > > >>>>>>>>>>>>>> of using an exponential backoff, we would have a total of
> > > > > >>>>>>>>>>>>>> 4 retries in 1 second. Thus we have fewer than half the
> > > > > >>>>>>>>>>>>>> number of retries in the same timeframe and can lessen
> > > > > >>>>>>>>>>>>>> broker pressure. This calculation is done as follows
> > > > > >>>>>>>>>>>>>> (using the formula laid out in the KIP):
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Try 1 at time 0 ms, failures = 0, next retry in 100 ms (default retry ms is initially 100 ms)
> > > > > >>>>>>>>>>>>>> Try 2 at time 100 ms, failures = 1, next retry in 200 ms
> > > > > >>>>>>>>>>>>>> Try 3 at time 300 ms, failures = 2, next retry in 400 ms
> > > > > >>>>>>>>>>>>>> Try 4 at time 700 ms, failures = 3, next retry in 800 ms
> > > > > >>>>>>>>>>>>>> Try 5 at time 1500 ms, failures = 4, next retry in 1000 ms (default max retry ms is 1000 ms)
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> For 2 and 3, could you elaborate more about what you mean
> > > > > >>>>>>>>>>>>>> with respect to client timeouts? I’m not very familiar
> > > > > >>>>>>>>>>>>>> with the Streams framework, so I would love to get more
> > > > > >>>>>>>>>>>>>> insight into how that currently works with respect to
> > > > > >>>>>>>>>>>>>> producer transactions, so I can appropriately update the
> > > > > >>>>>>>>>>>>>> KIP to address these scenarios.
> > > > > >>>>>>>>>>>>>> On Mar 13, 2020, 7:15 PM -0700, Boyang Chen <reluctanthero...@gmail.com>,
> > > > > >>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>> Thanks for the KIP Sanjana. I think the motivation is
> > > > > >>>>>>>>>>>>>>> good, but it lacks more quantitative analysis. For
> > > > > >>>>>>>>>>>>>>> instance:
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> 1. How many retries are we saving by applying the
> > > > > >>>>>>>>>>>>>>> exponential retry vs static retry? There should be some
> > > > > >>>>>>>>>>>>>>> mathematical relation between the static retry ms, the
> > > > > >>>>>>>>>>>>>>> initial exponential retry ms, and the max exponential
> > > > > >>>>>>>>>>>>>>> retry ms in a given time interval.
> > > > > >>>>>>>>>>>>>>> 2. How does this affect the client timeout? With
> > > > > >>>>>>>>>>>>>>> exponential retry, the client becomes more likely to time
> > > > > >>>>>>>>>>>>>>> out at a parent-level caller; for instance, Streams
> > > > > >>>>>>>>>>>>>>> attempts to retry initializing producer transactions
> > > > > >>>>>>>>>>>>>>> within a given 5-minute interval. With exponential retry
> > > > > >>>>>>>>>>>>>>> this mechanism could experience more frequent timeouts,
> > > > > >>>>>>>>>>>>>>> which we should be careful with.
> > > > > >>>>>>>>>>>>>>> 3. With regard to #2, we should have a more detailed
> > > > > >>>>>>>>>>>>>>> checklist of all the existing static retry scenarios, and
> > > > > >>>>>>>>>>>>>>> adjust the initial exponential retry ms to make sure we
> > > > > >>>>>>>>>>>>>>> won't easily time out at a high level due to too few
> > > > > >>>>>>>>>>>>>>> attempts.
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> Boyang
> > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> On Fri, Mar 13, 2020 at 4:38 PM Sanjana Kaundinya <skaundi...@gmail.com>
> > > > > >>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> Hi Everyone,
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> I’ve written a KIP about introducing exponential backoff
> > > > > >>>>>>>>>>>>>>>> for Kafka clients. Would appreciate any feedback on this.
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-580%3A+Exponential+Backoff+for+Kafka+Clients
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>>>>>>> Sanjana
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> --
> > > > >>>>>>>>>> -- Guozhang
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> --
> > > > >>>>>>> -- Guozhang
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> --
> > > > >>>> -- Guozhang
> > > > >>>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> -- Guozhang
> > > > >>
> > > >
> > > >
> > >
> >
>


-- 
-- Guozhang
