Thanks for the feedback Gwen and Colin. I agree that the original formula was not intuitive. I updated it to include a max jitter as was suggested. I also updated the config name to include `ms`:
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=69408222&selectedPageVersions=3&selectedPageVersions=1

If there are no other concerns, I will start the vote tomorrow.

Ismael

On Mon, May 1, 2017 at 6:18 PM, Colin McCabe <cmcc...@apache.org> wrote:

> Thanks for the KIP, Ismael & Dana! This could be pretty important for
> avoiding congestion collapse when there are a lot of clients.
>
> It seems like a good idea to keep the "ms" suffix, like we have with
> "reconnect.backoff.ms". So maybe we should use
> "reconnect.backoff.max.ms"? In general unitless timeouts can be the
> source of a lot of confusion (is it seconds, milliseconds, etc.?)
>
> It's good that the KIP inject random delays (jitter) into the timeout.
> As per Gwen's point, does it make sense to put an upper bound on the
> jitter, though? If someone sets reconnect.backoff.max to 5 minutes,
> they probably would be a little surprised to find it doing three retries
> after 100 ms in a row (as it could under the current scheme.) Maybe a
> maximum jitter configuration would help address that, and make the
> behavior a little more intuitive.
>
> best,
> Colin
>
>
> On Thu, Apr 27, 2017, at 09:39, Gwen Shapira wrote:
> > This is a great suggestion. I like how we just do it by default instead
> > of making it a choice users need to figure out.
> > Avoiding connection storms is great.
> >
> > One concern. If I understand the formula for effective maximum backoff
> > correctly, then with default maximum of 1000ms and default backoff of
> > 100ms, the effective maximum backoff will be 450ms rather than 1000ms.
> > This isn't exactly intuitive.
> > I'm wondering if it makes more sense to allow "one last doubling" which
> > may bring us slightly over the maximum, but much closer to it. I.e. have
> > the effective maximum be in [max.backoff - backoff, max.backoff + backoff]
> > range rather than half that. Does that make sense?
> >
> > Gwen
> >
> > On Thu, Apr 27, 2017 at 9:06 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > Hi all,
> > >
> > > Dana Powers posted a PR a while back for exponential backoff for broker
> > > reconnect attempts. Because it adds a config, a KIP is required and Dana
> > > seems to be busy so I posted it:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-144%3A+Exponential+backoff+for+broker+reconnect+attempts
> > >
> > > Please take a look. Your feedback is appreciated.
> > >
> > > Thanks,
> > > Ismael
> > >
> >
> > --
> > *Gwen Shapira*
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> > <http://www.confluent.io/blog>
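
For readers following the thread, here is a minimal sketch of the behavior being discussed: a reconnect delay that doubles on each consecutive failure, is capped at a configured maximum, and then has bounded jitter applied. It is not the formula from the KIP page or from Dana's PR. The function name `reconnect_backoff_ms`, its parameters, and the 20% jitter fraction are assumptions chosen for illustration; only the 100 ms and 1000 ms defaults come from the messages above.

```python
import random


def reconnect_backoff_ms(failures, base_ms=100, max_ms=1000, jitter=0.2, rng=random):
    """Delay in ms before the next reconnect attempt (illustrative sketch only).

    failures: number of consecutive failed connection attempts (>= 1).
    base_ms:  stands in for reconnect.backoff.ms (100 ms in the thread).
    max_ms:   stands in for reconnect.backoff.max.ms (1000 ms in the thread).
    jitter:   assumed maximum jitter fraction; 0.2 is a hypothetical value.
    """
    if failures < 1:
        raise ValueError("failures must be >= 1")
    # Double the base delay per consecutive failure; clamp the exponent so
    # very large failure counts do not build enormous integers.
    exponent = min(failures - 1, 32)
    nominal = min(base_ms * (2 ** exponent), max_ms)
    # Bounded jitter: scale by a factor in [1 - jitter, 1 + jitter], so the
    # delay always stays close to the nominal (capped) value.
    return nominal * rng.uniform(1.0 - jitter, 1.0 + jitter)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    for n in range(1, 7):
        print(n, round(reconnect_backoff_ms(n, rng=rng)))
```

Because the jitter is bounded relative to the capped value, a client configured with a large maximum cannot fall back to several near-100 ms retries in a row once it has failed a few times, which is the surprise Colin describes. The effective maximum also lands within the jitter fraction of the configured maximum, which speaks to Gwen's concern.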