Thanks for the feedback, Gwen and Colin. I agree that the original formula
was not intuitive. I updated it to include a max jitter as was suggested. I
also updated the config name to include `ms`:

https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=69408222&selectedPageVersions=3&selectedPageVersions=1
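
For anyone skimming the thread, here is a rough sketch of the general idea (exponential backoff, capped at a maximum, with bounded jitter). It is not the exact formula from the KIP. The function name, the parameter names, and the 20% jitter factor are assumptions made for illustration; the 100 ms and 1000 ms defaults are the ones mentioned in Gwen's message below.

```python
import random


def reconnect_backoff_ms(failures, base_ms=100, max_ms=1000, jitter=0.2, rng=random):
    """Return a delay in milliseconds before the next reconnect attempt.

    failures: number of consecutive failed connection attempts so far.
    base_ms:  stands in for reconnect.backoff.ms (assumed default).
    max_ms:   stands in for reconnect.backoff.max.ms (assumed default).
    jitter:   hypothetical bound on the random spread, as a fraction.
    """
    # Double the base delay for each consecutive failure, capped at max_ms.
    exponent = max(failures - 1, 0)
    capped = min(base_ms * 2 ** exponent, max_ms)
    # Apply a bounded random factor in [1 - jitter, 1 + jitter] so that
    # many clients do not all reconnect at the same instant.
    return capped * rng.uniform(1 - jitter, 1 + jitter)


if __name__ == "__main__":
    for attempt in range(1, 7):
        print(attempt, round(reconnect_backoff_ms(attempt)))
```

Because the jitter is a bounded fraction of the capped delay, a large maximum cannot produce a run of very short retries, which was the concern raised in the thread.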

If there are no other concerns, I will start the vote tomorrow.

Ismael

On Mon, May 1, 2017 at 6:18 PM, Colin McCabe <cmcc...@apache.org> wrote:

> Thanks for the KIP, Ismael & Dana!  This could be pretty important for
> avoiding congestion collapse when there are a lot of clients.
>
> It seems like a good idea to keep the "ms" suffix, like we have with
> "reconnect.backoff.ms".  So maybe we should use
> "reconnect.backoff.max.ms"?  In general unitless timeouts can be the
> source of a lot of confusion (is it seconds, milliseconds, etc.?)
>
> It's good that the KIP injects random delays (jitter) into the timeout.
> As per Gwen's point, does it make sense to put an upper bound on the
> jitter, though?  If someone sets reconnect.backoff.max to 5 minutes,
> they probably would be a little surprised to find it doing three retries
> after 100 ms in a row (as it could under the current scheme.)  Maybe a
> maximum jitter configuration would help address that, and make the
> behavior a little more intuitive.
>
> best,
> Colin
>
>
> On Thu, Apr 27, 2017, at 09:39, Gwen Shapira wrote:
> > This is a great suggestion. I like how we just do it by default instead of
> > making it a choice users need to figure out.
> > Avoiding connection storms is great.
> >
> > One concern. If I understand the formula for effective maximum backoff
> > correctly, then with default maximum of 1000ms and default backoff of
> > 100ms, the effective maximum backoff will be 450ms rather than 1000ms.
> > This isn't exactly intuitive.
> > I'm wondering if it makes more sense to allow "one last doubling" which
> > may bring us slightly over the maximum, but much closer to it. I.e. have the
> > effective maximum be in [max.backoff - backoff, max.backoff + backoff]
> > range rather than half that. Does that make sense?
> >
> > Gwen
> >
> > On Thu, Apr 27, 2017 at 9:06 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > Hi all,
> > >
> > > Dana Powers posted a PR a while back for exponential backoff for broker
> > > reconnect attempts. Because it adds a config, a KIP is required and Dana
> > > seems to be busy so I posted it:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-144%3A+Exponential+backoff+for+broker+reconnect+attempts
> > >
> > > Please take a look. Your feedback is appreciated.
> > >
> > > Thanks,
> > > Ismael
> > >
> >
> >
> >
> > --
> > *Gwen Shapira*
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> > <http://www.confluent.io/blog>
>
