Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients

Sanjana Kaundinya Wed, 18 Mar 2020 00:10:40 -0700

Thanks for the feedback Boyang.

If anyone else has feedback regarding this KIP, I would really appreciate
hearing it!


Thanks,
Sanjana

On Tue, Mar 17, 2020 at 11:38 PM Boyang Chen <bche...@outlook.com> wrote:

> Sounds great!
>
> ________________________________
> From: Sanjana Kaundinya <skaundi...@gmail.com>
> Sent: Tuesday, March 17, 2020 5:54:35 PM
> To: dev@kafka.apache.org <dev@kafka.apache.org>
> Subject: Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients
>
> Thanks for the explanation Boyang. One of the most common problems that we
> have in Kafka is with respect to metadata fetches. For example, if there is
> a broker failure, all clients start to fetch metadata at the same time and
> it often takes a while for the metadata to converge. In a high-load
> cluster, the sheer volume of metadata can slow convergence further.
>
> For this case, exponential backoff helps as it reduces the retry rate and
> spaces out how often clients will retry, thereby bringing down the time for
> convergence. A great addition that Jason mentioned would be to “jitter” the
> backoff, as was done in KIP-144 for the exponential reconnect backoff. This
> would help prevent the clients from synchronizing their retries, thereby
> spreading out the requests sent to the broker at any one time.
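
A minimal sketch of what a jittered exponential backoff could look like, added here for illustration. The function name, the doubling multiplier, and the 20% jitter factor are assumptions made for this example; they are not taken from the KIP text or from the Kafka client code.

```python
import random


def jittered_backoff_ms(failures, initial_ms=100, max_ms=1000,
                        multiplier=2, jitter=0.2, rng=random):
    """Hypothetical helper: backoff in ms after `failures` failed attempts.

    The deterministic part grows as initial_ms * multiplier**failures and is
    capped at max_ms. The result is then scaled by a random factor drawn
    uniformly from [1 - jitter, 1 + jitter] (assumed range), so clients that
    failed at the same moment do not all retry at the same moment.
    """
    base = min(initial_ms * multiplier ** failures, max_ms)
    return base * rng.uniform(1 - jitter, 1 + jitter)


if __name__ == "__main__":
    # Three simulated clients that all failed twice pick different delays
    # around the 400 ms base value.
    for client in range(3):
        print(client, round(jittered_backoff_ms(failures=2)))
```

Because the jitter is applied after the cap in this sketch, a delay can exceed `max_ms` by up to the jitter fraction; applying the cap last would be an equally reasonable choice.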
>
> I’ll add this example to the KIP and flesh out more of the details so that
> it’s clearer.
>
> On Mar 17, 2020, 1:24 PM -0700, Boyang Chen <reluctanthero...@gmail.com>,
> wrote:
> > Thanks for the reply Sanjana. I guess I would like to rephrase my
> > questions 2 and 3, as my previous response was a little bit unactionable.
> >
> > My specific point is that exponential backoff is not a silver bullet, and
> > we should consider using it to solve known problems instead of making
> > holistic changes to all clients in the Kafka ecosystem. I do like the
> > exponential backoff idea and believe it would be of great value, but
> > maybe we should focus on identifying existing modules that are suffering
> > from static retry, and only change them in this first KIP. If in the
> > future users of some other component believe they are also suffering, we
> > could have more minor KIPs to change the behavior there as well.
> >
> > Boyang
> >
> > On Sun, Mar 15, 2020 at 12:07 AM Sanjana Kaundinya <skaundi...@gmail.com>
> > wrote:
> >
> > > Thanks for the feedback Boyang, I will revise the KIP with the
> > > mathematical relations as per your suggestion. To address your feedback:
> > >
> > > 1. Currently, with the default of 100 ms per retry backoff, in 1 second
> > > we would have 10 retries. In the case of using an exponential backoff, we
> > > would have a total of 4 retries in 1 second. Thus we have fewer than half
> > > as many retries in the same timeframe and can lessen broker pressure.
> > > This calculation is done as follows (using the formula laid out in the
> > > KIP):
> > >
> > > Try 1 at time 0 ms, failures = 0, next retry in 100 ms (default retry ms
> > > is initially 100 ms)
> > > Try 2 at time 100 ms, failures = 1, next retry in 200 ms
> > > Try 3 at time 300 ms, failures = 2, next retry in 400 ms
> > > Try 4 at time 700 ms, failures = 3, next retry in 800 ms
> > > Try 5 at time 1500 ms, failures = 4, next retry in 1000 ms (default max
> > > retry ms is 1000 ms)
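
The schedule above can be reproduced with a short sketch. It assumes the backoff is min(initial * 2^failures, max), which is inferred from the numbers listed in this message rather than quoted from the KIP; the function and parameter names are hypothetical.

```python
def retry_schedule(initial_ms=100, max_ms=1000, multiplier=2, window_ms=1000):
    """Hypothetical helper: list (attempt_time_ms, next_backoff_ms) pairs for
    every attempt that starts strictly before window_ms.

    Assumed formula: backoff = min(initial_ms * multiplier**failures, max_ms).
    A multiplier of 1 models today's static retry backoff.
    """
    schedule = []
    now, failures = 0, 0
    while now < window_ms:
        backoff = min(initial_ms * multiplier ** failures, max_ms)
        schedule.append((now, backoff))
        now += backoff
        failures += 1
    return schedule


static = retry_schedule(multiplier=1)   # attempts at 0, 100, ..., 900 ms
exponential = retry_schedule()          # attempts at 0, 100, 300, 700 ms

if __name__ == "__main__":
    print("static attempts in 1 s:", len(static))
    print("exponential attempts in 1 s:", len(exponential))
    for time_ms, backoff_ms in retry_schedule(window_ms=1501):
        print(f"try at {time_ms} ms, next retry in {backoff_ms} ms")
```

Widening `window_ms` to 1501 includes the fifth try at 1500 ms, where the uncapped value of 1600 ms is clamped to the 1000 ms maximum.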
> > >
> > > For 2 and 3, could you elaborate more about what you mean with respect to
> > > client timeouts? I’m not very familiar with the Streams framework, so I
> > > would love to get more insight into how that currently works with respect
> > > to producer transactions, so I can appropriately update the KIP to address
> > > these scenarios.
> > > On Mar 13, 2020, 7:15 PM -0700, Boyang Chen <reluctanthero...@gmail.com>,
> > > wrote:
> > > > Thanks for the KIP Sanjana. I think the motivation is good, but it
> > > > lacks quantitative analysis. For instance:
> > > >
> > > > 1. How many retries are we saving by applying exponential retry vs
> > > > static retry? There should be some mathematical relation between the
> > > > static retry ms, the initial exponential retry ms, and the max
> > > > exponential retry ms in a given time interval.
> > > > 2. How does this affect the client timeout? With exponential retry, the
> > > > client is more likely to time out in a parent-level caller; for
> > > > instance, Streams attempts to retry initializing producer transactions
> > > > within a given 5 minute interval. With exponential retry this mechanism
> > > > could experience more frequent timeouts, which we should be careful with.
> > > > 3. With regard to #2, we should have a more detailed checklist of all the
> > > > existing static retry scenarios, and adjust the initial exponential retry
> > > > ms to make sure we won't easily time out at a higher level due to too few
> > > > attempts.
> > > >
> > > > Boyang
> > > >
> > > > On Fri, Mar 13, 2020 at 4:38 PM Sanjana Kaundinya <skaundi...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Everyone,
> > > > >
> > > > > I’ve written a KIP about introducing exponential backoff for Kafka
> > > > > clients. Would appreciate any feedback on this.
> > > > >
> > > > >
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-580%3A+Exponential+Backoff+for+Kafka+Clients
> > > > >
> > > > > Thanks,
> > > > > Sanjana
> > > > >
> > >
>
