Final note: although the goal here is not to resolve contention (as in the AWS article), I think we do still want a relatively smooth rate of reconnects across all clients to avoid storm spikes. Full Jitter does that. I expect that narrower jitter bands will lead to more clumping of reconnects, but that's maybe OK.
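For concreteness, the Full Jitter scheme from the AWS article can be sketched roughly as follows (the function name and default values are illustrative, not Kafka's actual implementation):

```python
import random

def full_jitter_backoff_ms(attempt, base_ms=100, cap_ms=1000):
    # Exponential backoff, capped at cap_ms.
    capped = min(cap_ms, base_ms * (2 ** attempt))
    # Full Jitter: sleep a uniformly random amount in [0, capped],
    # which spreads reconnects evenly across clients and avoids
    # synchronized retry spikes.
    return random.uniform(0, capped)
```

With base 100ms and cap 1000ms, attempt 3 yields a delay anywhere in [0, 800ms], and later attempts are uniformly spread over [0, 1000ms].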
Another idea would be to make jitter configurable: Full Jitter would be 100%, no jitter 0%, Equal Jitter 50%, etc.

On May 8, 2017 5:28 PM, "Dana Powers" <dana.pow...@gmail.com> wrote:

> For some discussion of jitter and exponential backoff, I found this
> article useful:
>
> https://www.awsarchitectureblog.com/2015/03/backoff.html
>
> My initial POC used the "Full Jitter" approach described therein. Equal
> Jitter is good too, and may perform a little better. It is a random
> distribution between 50% and 100% of the calculated backoff.
>
> Dana
>
> On May 4, 2017 8:50 PM, "Ismael Juma" <ism...@juma.me.uk> wrote:
>
>> Thanks for the feedback, Gwen and Colin. I agree that the original
>> formula was not intuitive. I updated it to include a max jitter as was
>> suggested. I also updated the config name to include `ms`:
>>
>> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=69408222&selectedPageVersions=3&selectedPageVersions=1
>>
>> If there are no other concerns, I will start the vote tomorrow.
>>
>> Ismael
>>
>> On Mon, May 1, 2017 at 6:18 PM, Colin McCabe <cmcc...@apache.org> wrote:
>>
>>> Thanks for the KIP, Ismael & Dana! This could be pretty important for
>>> avoiding congestion collapse when there are a lot of clients.
>>>
>>> It seems like a good idea to keep the "ms" suffix, like we have with
>>> "reconnect.backoff.ms". So maybe we should use
>>> "reconnect.backoff.max.ms"? In general, unitless timeouts can be the
>>> source of a lot of confusion (is it seconds, milliseconds, etc.?).
>>>
>>> It's good that the KIP injects random delays (jitter) into the
>>> timeout. As per Gwen's point, does it make sense to put an upper bound
>>> on the jitter, though? If someone sets reconnect.backoff.max to 5
>>> minutes, they would probably be a little surprised to find it doing
>>> three retries 100 ms apart in a row (as it could under the current
>>> scheme). Maybe a maximum jitter configuration would help address that
>>> and make the behavior a little more intuitive.
>>>
>>> best,
>>> Colin
>>>
>>> On Thu, Apr 27, 2017, at 09:39, Gwen Shapira wrote:
>>>> This is a great suggestion. I like how we just do it by default
>>>> instead of making it a choice users need to figure out.
>>>> Avoiding connection storms is great.
>>>>
>>>> One concern: if I understand the formula for the effective maximum
>>>> backoff correctly, then with a default maximum of 1000ms and a default
>>>> backoff of 100ms, the effective maximum backoff will be 450ms rather
>>>> than 1000ms. This isn't exactly intuitive.
>>>> I'm wondering if it makes more sense to allow "one last doubling",
>>>> which may bring us slightly over the maximum but much closer to it.
>>>> I.e., have the effective maximum be in the
>>>> [max.backoff - backoff, max.backoff + backoff] range rather than half
>>>> that. Does that make sense?
>>>>
>>>> Gwen
>>>>
>>>> On Thu, Apr 27, 2017 at 9:06 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Dana Powers posted a PR a while back for exponential backoff for
>>>>> broker reconnect attempts. Because it adds a config, a KIP is
>>>>> required, and Dana seems to be busy, so I posted it:
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-144%3A+Exponential+backoff+for+broker+reconnect+attempts
>>>>>
>>>>> Please take a look. Your feedback is appreciated.
>>>>>
>>>>> Thanks,
>>>>> Ismael
>>>>
>>>> --
>>>> *Gwen Shapira*
>>>> Product Manager | Confluent
>>>> 650.450.2760 | @gwenshap
>>>> Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
>>>> <http://www.confluent.io/blog>
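The configurable-jitter idea could be expressed as a single jitter fraction, where 1.0 reproduces Full Jitter, 0.5 reproduces Equal Jitter (uniform between 50% and 100% of the calculated backoff), and 0.0 disables jitter entirely. A minimal sketch, with hypothetical names and defaults (not an actual Kafka config or implementation):

```python
import random

def jittered_backoff_ms(attempt, base_ms=100, cap_ms=1000, jitter=1.0):
    # Exponential backoff, capped at cap_ms.
    calc = min(cap_ms, base_ms * (2 ** attempt))
    # Sleep uniformly in [(1 - jitter) * calc, calc]:
    #   jitter=1.0 -> [0, calc]        (Full Jitter)
    #   jitter=0.5 -> [calc/2, calc]   (Equal Jitter)
    #   jitter=0.0 -> exactly calc     (no jitter)
    return random.uniform((1.0 - jitter) * calc, calc)
```

One design consideration: lower jitter fractions give more predictable per-client delays but narrower bands, which (per the note above) clump reconnects together; Full Jitter gives the smoothest aggregate reconnect rate at the cost of occasionally very short individual delays.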