Hi,

On Fri, Jun 10, 2016 at 12:55 PM, Gert Doering <g...@greenie.muc.de> wrote:

> On Fri, Jun 10, 2016 at 12:43:20PM -0400, Selva Nair wrote:
> > > @Selva, Arne: can we make the reconnect logic somewhat smarter overall,
> > > like
> > > "if reconnecting to the same host, wait 30 seconds instead of 5"?
> >
> > This is possible, but the case for progressively increasing the restart
> > pause is not very strong. Can we get some feedback from people who serve
> > 1000's of users?
>
> I would generally consider it polite behaviour... (also, it might save
> the client from filled-up disks...) - didn't think of exponential back-off
> yet, but that would certainly be another approach to consider.
>
> I have no idea how complicated it would be to implement, though.

Filling up the client log with repeated retries is a concern, indeed.

Although a proper implementation needs the failed reconnect count per
ip/port combination which we do not currently keep track of, I think a
heuristic count may be good enough. One could use something like this:

    n = c->options.unsuccessful_attempts
    m = c->options.connection_list->len
    rc = n/m  (a rough measure of retries per remote)
    timeout = the_default_timeout << MIN(rc, 10)  (exponential up to ~ 5000 seconds).
    throw a SIGHUP if rc exceeds some value (resets n and starts over)

The retry count will be over-stated for situations like one remote name
that resolves to many IPs, but avoiding that requires more work..

Any thoughts?

Selva
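P.S. For concreteness, here is a rough standalone sketch of the heuristic
in C. It is not existing OpenVPN code: the function names, the max_rc
threshold parameter, the zero-remotes check and the overflow guard are all
assumptions made for illustration. Only the n/m ratio, the shift capped at
10 and the "restart when rc gets too large" idea come from the outline
above.

```c
#include <limits.h>

/* Cap on the shift, taken from MIN(rc, 10) in the outline (assumed name). */
#define BACKOFF_MAX_SHIFT 10

/*
 * Rough retries per remote: n / m, where n is the number of unsuccessful
 * attempts and m the number of entries in the connection list.
 * Returns 0 for an empty list instead of dividing by zero.
 */
int
retries_per_remote(int unsuccessful_attempts, int n_remotes)
{
    if (unsuccessful_attempts <= 0 || n_remotes <= 0)
    {
        return 0;
    }
    return unsuccessful_attempts / n_remotes;
}

/*
 * Restart pause in seconds: base_timeout << MIN(rc, BACKOFF_MAX_SHIFT).
 * With a 5 second base this tops out at 5 << 10 = 5120 seconds.
 */
int
backoff_timeout(int base_timeout, int unsuccessful_attempts, int n_remotes)
{
    int rc = retries_per_remote(unsuccessful_attempts, n_remotes);
    int shift = rc < BACKOFF_MAX_SHIFT ? rc : BACKOFF_MAX_SHIFT;

    if (base_timeout <= 0)
    {
        return 0;
    }
    /* Saturate rather than overflow for very large base values. */
    if (base_timeout > (INT_MAX >> shift))
    {
        return INT_MAX;
    }
    return base_timeout << shift;
}

/*
 * Nonzero when the retry ratio exceeds max_rc, i.e. when the caller would
 * raise SIGHUP and reset the attempt counter in the outline above.
 */
int
backoff_should_restart(int unsuccessful_attempts, int n_remotes, int max_rc)
{
    return retries_per_remote(unsuccessful_attempts, n_remotes) > max_rc;
}
```

A caller would pass c->options.unsuccessful_attempts and
c->options.connection_list->len as the second and third arguments. Since
the ratio uses integer division, the pause only doubles after every remote
in the list has been tried once more, which matches the "retries per
remote" intent but inherits the over-counting for names that resolve to
many IPs.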