Hi,

On Fri, Jun 10, 2016 at 12:55 PM, Gert Doering <g...@greenie.muc.de> wrote:

> On Fri, Jun 10, 2016 at 12:43:20PM -0400, Selva Nair wrote:
> > > @Selva, Arne: can we make the reconnect logic somewhat smarter overall,
> > > like
> > > "if reconnecting to the same host, wait 30 seconds instead of 5"?
> >
> > This is possible, but the case for progressively increasing the restart
> > pause is not very strong. Can we get some feedback from people who serve
> > 1000's of users?
>
> I would generally consider it polite behaviour... (also, it might save
> the client from filled-up disks...) - didn't think of exponential back-off
> yet, but that would certainly be another approach to consider.
>
> I have no idea how complicated it would be to implement, though.


Filling up the client log with repeated retries is a concern, indeed.

A proper implementation would need a failed-reconnect count per IP/port
combination, which we do not currently track, but I think a heuristic
count may be good enough. One could use something like this:

n = c->options.unsuccessful_attempts
m = c->options.connection_list->len
rc = n/m   (integer division; a rough measure of retries per remote)
timeout = the_default_timeout << MIN(rc, 10)
   (exponential, capped at 1024x the default: ~5000 seconds for a
   5 second default)
throw a SIGHUP if rc exceeds some value (resets n and starts over)
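
To make the arithmetic concrete, here is a minimal, self-contained C sketch
of the heuristic above. It is not a patch against the OpenVPN sources: the
function names, the MAX_BACKOFF_SHIFT constant, and the max_rc threshold are
made up for illustration, and n and m are passed in as plain integers
instead of being read from c->options.

```c
/* Illustrative sketch only: all names below are hypothetical. */

#define MAX_BACKOFF_SHIFT 10   /* caps the pause at base_timeout * 1024 */

/* Rough retries per remote: n failed attempts spread over m remotes. */
static unsigned int
retries_per_remote(unsigned int unsuccessful_attempts,
                   unsigned int num_remotes)
{
    if (num_remotes == 0)
    {
        return unsuccessful_attempts;   /* avoid division by zero */
    }
    return unsuccessful_attempts / num_remotes;   /* integer division */
}

/* Restart pause in seconds: doubles with each retry per remote, capped. */
static unsigned int
restart_pause(unsigned int base_timeout,
              unsigned int unsuccessful_attempts,
              unsigned int num_remotes)
{
    unsigned int rc = retries_per_remote(unsuccessful_attempts, num_remotes);
    unsigned int shift = (rc < MAX_BACKOFF_SHIFT) ? rc : MAX_BACKOFF_SHIFT;

    return base_timeout << shift;
}

/* Nonzero when rc exceeds the (assumed) threshold and a SIGHUP is due. */
static int
should_sighup(unsigned int unsuccessful_attempts,
              unsigned int num_remotes,
              unsigned int max_rc)
{
    return retries_per_remote(unsuccessful_attempts, num_remotes) > max_rc;
}
```

With a 5 second default and 3 remotes, 7 failed attempts give rc = 2 and a
20 second pause; once rc reaches 10 the pause stays at 5120 seconds.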

The retry count will be overstated in situations such as one remote name
that resolves to many IPs, but avoiding that requires more work. Any
thoughts?

Selva
