Final note: although the goal here is not to resolve contention (as in the AWS article), I think we do still want a relatively smooth rate of reconnects across all clients to avoid storm spikes. Full Jitter does that. I expect that narrower jitter bands will lead to more clumping of reconnects, but that's maybe OK.
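For concreteness, the Full Jitter scheme from the AWS article can be sketched roughly as follows (the function name and default values are illustrative, not Kafka's actual implementation):

```python
import random

def full_jitter_backoff_ms(attempt, base_ms=100, cap_ms=1000):
    # Exponential backoff, capped at cap_ms.
    capped = min(cap_ms, base_ms * (2 ** attempt))
    # Full Jitter: sleep a uniformly random amount in [0, capped],
    # which spreads reconnects evenly across clients and avoids
    # synchronized retry spikes.
    return random.uniform(0, capped)
```

With base 100ms and cap 1000ms, attempt 3 yields a delay anywhere in [0, 800ms], and later attempts are uniformly spread over [0, 1000ms].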
Another idea would be to make jitter configurable: Full Jitter would be 100%, no jitter 0%, Equal Jitter 50%, etc.

On May 8, 2017 5:28 PM, "Dana Powers" <dana.pow...@gmail.com> wrote:

> For some discussion of jitter and exponential backoff, I found this
> article useful:
>
> https://www.awsarchitectureblog.com/2015/03/backoff.html
>
> My initial POC used the "Full Jitter" approach described therein. Equal
> Jitter is good too, and may perform a little better. It is a random
> distribution between 50% and 100% of the calculated backoff.
>
> Dana
>
> On May 4, 2017 8:50 PM, "Ismael Juma" <ism...@juma.me.uk> wrote:
>
>> Thanks for the feedback, Gwen and Colin. I agree that the original
>> formula was not intuitive. I updated it to include a max jitter as was
>> suggested. I also updated the config name to include `ms`:
>>
>> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=69408222&selectedPageVersions=3&selectedPageVersions=1
>>
>> If there are no other concerns, I will start the vote tomorrow.
>>
>> Ismael
>>
>> On Mon, May 1, 2017 at 6:18 PM, Colin McCabe <cmcc...@apache.org> wrote:
>>
>>> Thanks for the KIP, Ismael & Dana! This could be pretty important for
>>> avoiding congestion collapse when there are a lot of clients.
>>>
>>> It seems like a good idea to keep the "ms" suffix, like we have with
>>> "reconnect.backoff.ms". So maybe we should use
>>> "reconnect.backoff.max.ms"? In general, unitless timeouts can be the
>>> source of a lot of confusion (is it seconds, milliseconds, etc.?).
>>>
>>> It's good that the KIP injects random delays (jitter) into the
>>> timeout. As per Gwen's point, does it make sense to put an upper bound
>>> on the jitter, though? If someone sets reconnect.backoff.max to 5
>>> minutes, they would probably be a little surprised to find it doing
>>> three retries 100 ms apart in a row (as it could under the current
>>> scheme). Maybe a maximum jitter configuration would help address that
>>> and make the behavior a little more intuitive.
>>>
>>> best,
>>> Colin
>>>
>>> On Thu, Apr 27, 2017, at 09:39, Gwen Shapira wrote:
>>>> This is a great suggestion. I like how we just do it by default
>>>> instead of making it a choice users need to figure out.
>>>> Avoiding connection storms is great.
>>>>
>>>> One concern: if I understand the formula for the effective maximum
>>>> backoff correctly, then with a default maximum of 1000ms and a default
>>>> backoff of 100ms, the effective maximum backoff will be 450ms rather
>>>> than 1000ms. This isn't exactly intuitive.
>>>> I'm wondering if it makes more sense to allow "one last doubling",
>>>> which may bring us slightly over the maximum but much closer to it.
>>>> I.e., have the effective maximum be in the
>>>> [max.backoff - backoff, max.backoff + backoff] range rather than half
>>>> that. Does that make sense?
>>>>
>>>> Gwen
>>>>
>>>> On Thu, Apr 27, 2017 at 9:06 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Dana Powers posted a PR a while back for exponential backoff for
>>>>> broker reconnect attempts. Because it adds a config, a KIP is
>>>>> required, and Dana seems to be busy, so I posted it:
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-144%3A+Exponential+backoff+for+broker+reconnect+attempts
>>>>>
>>>>> Please take a look. Your feedback is appreciated.
>>>>>
>>>>> Thanks,
>>>>> Ismael
>>>>
>>>> --
>>>> *Gwen Shapira*
>>>> Product Manager | Confluent
>>>> 650.450.2760 | @gwenshap
>>>> Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
>>>> <http://www.confluent.io/blog>
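The configurable-jitter idea could be expressed as a single jitter fraction, where 1.0 reproduces Full Jitter, 0.5 reproduces Equal Jitter (uniform between 50% and 100% of the calculated backoff), and 0.0 disables jitter entirely. A minimal sketch, with hypothetical names and defaults (not an actual Kafka config or implementation):

```python
import random

def jittered_backoff_ms(attempt, base_ms=100, cap_ms=1000, jitter=1.0):
    # Exponential backoff, capped at cap_ms.
    calc = min(cap_ms, base_ms * (2 ** attempt))
    # Sleep uniformly in [(1 - jitter) * calc, calc]:
    #   jitter=1.0 -> [0, calc]        (Full Jitter)
    #   jitter=0.5 -> [calc/2, calc]   (Equal Jitter)
    #   jitter=0.0 -> exactly calc     (no jitter)
    return random.uniform((1.0 - jitter) * calc, calc)
```

One design consideration: lower jitter fractions give more predictable per-client delays but narrower bands, which (per the note above) clump reconnects together; Full Jitter gives the smoothest aggregate reconnect rate at the cost of occasionally very short individual delays.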