Thanks for the KIP, Ismael & Dana! This could be pretty important for avoiding congestion collapse when there are a lot of clients.
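Exponential backoff with jitter spreads out many clients that lose the same broker at the same moment. Without jitter, they would all retry in lockstep. Below is a minimal sketch that assumes the "full jitter" variant: each delay is a uniform draw between 0 and the capped exponential delay. This is not necessarily the formula in the KIP or in Dana's PR, and the function name and parameters are hypothetical.

```python
import random


def full_jitter_backoff_ms(attempt: int, base_ms: int, max_ms: int,
                           rng: random.Random) -> float:
    """Hypothetical helper: delay before reconnect attempt `attempt` (0-based).

    "Full jitter": draw uniformly from [0, min(max_ms, base_ms * 2**attempt)].
    This is an illustrative scheme, not the KIP-144 formula.
    """
    ceiling = min(max_ms, base_ms * (2 ** attempt))
    return rng.uniform(0, ceiling)


if __name__ == "__main__":
    rng = random.Random(42)
    # 1000 hypothetical clients, all on their 4th attempt (attempt index 3).
    # Without jitter every one of them would retry at exactly 800 ms.
    delays = [full_jitter_backoff_ms(3, base_ms=100, max_ms=1000, rng=rng)
              for _ in range(1000)]
    print(f"min={min(delays):.0f} ms, max={max(delays):.0f} ms")
```

Because every draw is independent, several consecutive delays can all land near the bottom of the range. That is the kind of surprise discussed below.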
It seems like a good idea to keep the "ms" suffix, as we do with "reconnect.backoff.ms". So maybe we should use "reconnect.backoff.max.ms"? In general, unitless timeouts can be a source of a lot of confusion (is it seconds? milliseconds?).

It's good that the KIP injects random delays (jitter) into the timeout. As per Gwen's point, though, does it make sense to put an upper bound on the jitter? If someone sets reconnect.backoff.max to 5 minutes, they would probably be a little surprised to see it retry after roughly 100 ms three times in a row (as it could under the current scheme). Maybe a maximum jitter configuration would help address that and make the behavior a little more intuitive.

best,
Colin

On Thu, Apr 27, 2017, at 09:39, Gwen Shapira wrote:
> This is a great suggestion. I like how we just do it by default instead
> of making it a choice users need to figure out.
> Avoiding connection storms is great.
>
> One concern. If I understand the formula for effective maximum backoff
> correctly, then with default maximum of 1000ms and default backoff of
> 100ms, the effective maximum backoff will be 450ms rather than 1000ms.
> This isn't exactly intuitive.
> I'm wondering if it makes more sense to allow "one last doubling" which
> may bring us slightly over the maximum, but much closer to it. I.e. have the
> effective maximum be in [max.backoff - backoff, max.backoff + backoff]
> range rather than half that. Does that make sense?
>
> Gwen
>
> On Thu, Apr 27, 2017 at 9:06 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>
> > Hi all,
> >
> > Dana Powers posted a PR a while back for exponential backoff for broker
> > reconnect attempts. Because it adds a config, a KIP is required and Dana
> > seems to be busy so I posted it:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-144%3A+Exponential+backoff+for+broker+reconnect+attempts
> >
> > Please take a look. Your feedback is appreciated.
> >
> > Thanks,
> > Ismael
>
> --
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> <http://www.confluent.io/blog>
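Colin's suggestion of an upper bound on the jitter could look roughly like the sketch below. It assumes the delay is the usual doubling of the base backoff, capped at the configured maximum, and then shifted by a random amount of at most plus or minus `max_jitter_ms`. Both the function and `max_jitter_ms` are hypothetical names for illustration, not existing Kafka configs or APIs.

```python
import random


def bounded_jitter_backoff_ms(attempt: int, base_ms: int, max_ms: int,
                              max_jitter_ms: int, rng: random.Random) -> float:
    """Hypothetical helper: capped exponential backoff with bounded jitter.

    nominal = min(max_ms, base_ms * 2**attempt), then a uniform offset in
    [-max_jitter_ms, +max_jitter_ms] is added. `max_jitter_ms` is an assumed
    setting used only for this sketch.
    """
    nominal = min(max_ms, base_ms * (2 ** attempt))
    jitter = rng.uniform(-max_jitter_ms, max_jitter_ms)
    # Never return a negative delay, even if the jitter window exceeds nominal.
    return max(0.0, nominal + jitter)


if __name__ == "__main__":
    rng = random.Random(7)
    # 5-minute cap as in Colin's example: early retries stay near
    # 100, 200, 400, 800 ms (each within +/- 50 ms), never ~100 ms three times.
    for attempt in range(4):
        print(round(bounded_jitter_backoff_ms(attempt, 100, 300_000, 50, rng)))
```

Unlike full jitter, the randomness here is a fixed-size window around the nominal delay. As long as `2 * max_jitter_ms <= base_ms`, each attempt waits at least as long as the previous one until the cap is reached.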