Hi Marc,

On Tue, Jan 23, 2018 at 05:07:26PM +0100, Marc Fournier wrote:
> > Note that I'm bothered with the call to protocol_enable_all() as
> > well in this function since it will start the proxies multiple times
> > in a possibly unsafe mode. That may explain a lot of things suddenly!
> >
> > I think the attached patch works around it, but I'd like your
> > confirmation before cleaning it up.
>
> I applied this single patch on top of 1.8.3, and indeed this seems much
> better ! The servers are eventually in an UP state after reloading.

Thanks very much for your useful feedback.

> "grep -c EBADF" on yesterday's strace logfile returned around 900 matches.

Definitely not good :-/

> 18 matches with the patchset from your other email, and only 1 with this
> patch:
>
> write(-1, "1\n", 2) = -1 EBADF (Bad file descriptor)

Seems to me we've left an "fddebug()" call somewhere, though I can't find it.

> There still are some "Socket error" in the logs though, and I noticed some
> servers (not all) go DOWN for a couple of seconds just after reload because
> of this, before coming up again 4 seconds later (I use "inter 2000 rise 2").
> But at least the system recovers properly from this situation and seems to
> stay stable afterwards.

Then it could be related to something else; I'd prefer that we analyse
issues one at a time.

> I can't see any obvious close() in the strace log which would be causing
> trouble (un)fortunately. I'll send the whole log to you privately, so you
> can have a look.

Thank you, I've received it and I'll check it.

I'm going to polish up the patch you tested. Now I see how to reproduce
the random behaviour (it's always easier to try to break a program when
you have the patch to fix it).

Thanks!
Willy

