Hi Marc,

On Tue, Jan 23, 2018 at 05:07:26PM +0100, Marc Fournier wrote:
> > Note that I'm bothered with the call to protocol_enable_all() as
> > well in this function since it will start the proxies multiple times
> > in a possibly unsafe mode. That may explain a lot of things suddenly!
> > 
> > I think the attached patch works around it, but I'd like your
> > confirmation before cleaning it up.
> 
> I applied this single patch on top of 1.8.3, and indeed this seems much
> better! The servers are eventually in an UP state after reloading.

Thanks very much for your useful feedback.

> "grep -c EBADF" on yesterday's strace logfile returned around 900 matches.

Definitely not good :-/

> 18 matches with the patchset from your other email, and only 1 with this
> patch:
> 
> write(-1, "1\n", 2) = -1 EBADF (Bad file descriptor)

Seems to me we've left an "fddebug()" call somewhere, though I can't
find it.
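
For reference, any write on fd -1 fails this way whatever the payload is, so
the trace only tells us that some caller passed -1 as the descriptor, not
which one. A minimal sketch in Python (not haproxy code; write_to_fd() is a
name made up for this illustration):

```python
import errno
import os


def write_to_fd(fd, payload):
    """Try to write payload to fd.

    Returns None on success, or the symbolic errno name on failure.
    """
    try:
        os.write(fd, payload)
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. 9 -> "EBADF".
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


# Same arguments as the syscall seen in the strace output.
print(write_to_fd(-1, b"1\n"))
```

The kernel rejects the call before looking at the buffer, which is why the
"1\n" payload is the only hint about where the call comes from.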

> There still are some "Socket error" in the logs though, and I noticed some
> servers (not all) go DOWN for a couple of seconds just after reload because
> of this, before coming up again 4 seconds later (I use "inter 2000 rise 2").
> But at least the system recovers properly from this situation and seems to
> stay stable afterwards.

Then it could be related to something else. I'd prefer that we analyse
issues one at a time.
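
As a side note, the 4 seconds you observe match what "inter 2000 rise 2"
should give, assuming checks run every "inter" milliseconds and a DOWN server
needs "rise" consecutive successful checks to be marked UP again. A rough
sketch of that arithmetic (seconds_until_up() is a hypothetical helper, and it
ignores check duration and any spreading of the checks):

```python
def seconds_until_up(inter_ms, rise):
    """Approximate delay before a DOWN server is marked UP again.

    Assumes one health check every inter_ms milliseconds and that `rise`
    consecutive successful checks are needed.
    """
    return inter_ms * rise / 1000.0


# Values taken from the configuration quoted above.
print(seconds_until_up(2000, 2))
```

So the recovery delay itself looks normal; what remains to be explained is
only why the first checks after the reload fail.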

> I can't see any obvious close() in the strace log which would be causing
> trouble (un)fortunately. I'll send the whole log to you privately, so you
> can have a look.

Thank you, I've received it and will have a look.

I'm going to polish up the patch you tested. Now I see how to reproduce
the random behaviour (it's always easier to try to break a program when
you have the patch to fix it).

Thanks!
Willy
