On Thu, 2015-07-09 at 11:59 +0200, Tor Houghton wrote:
> On Wed, Jul 08, 2015 at 10:04:27PM -0500, Theodore Wynnychenko wrote:
> >
> > [snip]
> >
> > server https://server2.tldn.com, client 2067 (63 active), 10.0.28.254:60330
> > ->
> > 10.0.28.130:443, buffer event error
> > [..]
> > server https://server2.tldn.com, client 2068 (63 active), 10.0.28.254:52350
> > ->
> > 10.0.28.130:443, buffer event error
>
> I'm going to "me too" on this one (have not been until now, as I thought
> perhaps it was due to my setup, and therefore off-topic).

Likewise, seeing the same behaviour here on 5.7-stable -- so the problem
is not confined to -current.

Fairly small & simple httpd setup here, httpd configured with 3 server
stanzas: 2 HTTPS-only (both using FastCGI) plus one trivial HTTP-only
(just a block return 303 pointing to one of the HTTPS servers). Quite a
light load too (averaging 178k requests/day -- about 2/sec).

Frequency of problem varies wildly -- sometimes occurs after only an
hour or two since last httpd restart and at other times httpd will last
for up to 4 days before it stops responding to requests. Variation in
volume of requests appears to have no effect on frequency of recurrence
either.

On every occasion, httpd continues to respond correctly to signals
(httpd restarts are always clean), just not to HTTP[S] requests. On at
least one occasion, the http socket continued to respond correctly to
requests, whilst the two https ones stopped responding. On other
occasions, all 3 stopped responding at around the same time. When a
socket stops responding, it still accepts requests but httpd neither
logs (at least, when not in debug mode) nor responds to them (i.e. I can
successfully open a TCP session to the listening socket and send it a
request, but nothing comes back after the initial ACK).

It hasn't happened here in a few days now so I don't have a log extract
on hand to share (but can post one next time it happens).

From memory, in the past we were seeing TLS accept fail errors in the
logs, as reported by the original poster, but not at the time the
sockets stopped responding (only well beforehand), so I'd also assumed
that those were unrelated.

Running tcpdump on both user-facing interfaces (and on pflog0 just to
rule out the possibility of some error in our pf.conf) whilst httpd was
not responding to requests on previous occasions revealed nothing new.

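For anyone wanting to check for the same "accepts the connection but
never answers" state described above, a rough sketch of a probe follows.
It is only an illustration, not what was actually run here: the function
name probe_http and its return labels are made up for this example, and
it speaks plain HTTP only, so it suits the port 80 listener (an HTTPS
listener would additionally need a TLS handshake, which is not shown).

```python
import socket


def probe_http(host, port, timeout=5.0):
    """Classify a plain-HTTP listener.

    Hypothetical helper; returns one of:
      'connect-failed'  TCP connection could not be established
      'silent'          connection accepted, request sent, no reply in time
      'closed'          peer closed or reset without sending any data
      'responded'       at least one byte came back
    """
    request = (
        "HEAD / HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")

    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, unreachable hosts and connect timeouts.
        return "connect-failed"

    with sock:
        try:
            sock.sendall(request)
            data = sock.recv(4096)
        except socket.timeout:
            # Handshake succeeded but nothing came back: the symptom above.
            return "silent"
        except OSError:
            return "closed"

    return "responded" if data else "closed"
```

Called as probe_http("192.168.137.1", 80) from a cron job or loop, a
result of 'silent' would correspond to the hung state, whereas
'connect-failed' would point at something else (pf, the daemon being
down, etc.).
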
Have tried watching debug output a couple of times before, but it
rapidly gets quite unwieldy, even with our modest load (especially over
a remote ssh session -- both uplinks at that site are nearing capacity),
given the length of time it can take for the problem to manifest (on
each occasion I gave up after a few hours without the problem
occurring).

Am now running httpd -dvvv with stdout/err redirected to a temporary log
file (probably should have done that in the first place).

We are already seeing (after less than a minute) entries in the debug
logs similar to those reported by Theodore, for example:

* On an HTTPS server (using FastCGI):

  server portal, client 305 (14 active), 192.168.137.161:52224 -> 192.168.137.1:443, buffer event error

and

* On the trivial HTTP server (using just a block return 303):

  server redir, client 132 (11 active), 192.168.137.100:61081 -> 192.168.137.1, buffer event timeout

However, the original problem (httpd stops responding to requests) is
*not* occurring at present.

Will post debug log extract & httpd.conf next time the problem recurs
(should be within the next few days).
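
Since the debug output gets unwieldy quickly, a small filter that
tallies these entries per server may make a long capture easier to read.
The sketch below is an illustration only: the line format is inferred
purely from the entries quoted in this thread (real output may carry
extra prefixes or differ between releases), and the names LINE_RE and
tally_events are made up for the example.

```python
import re
from collections import Counter

# Pattern inferred from the log entries quoted in this thread; the
# destination may or may not carry a port, so it is matched loosely.
LINE_RE = re.compile(
    r"server (?P<server>\S+), "
    r"client (?P<client>\d+) \((?P<active>\d+) active\), "
    r"(?P<src>\S+) -> (?P<dst>\S+), "
    r"(?P<event>buffer event \w+)"
)


def tally_events(lines):
    """Count (server, event) pairs across an iterable of log lines.

    Lines that do not match the assumed format are skipped silently.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group("server"), match.group("event"))] += 1
    return counts
```

It could be fed the redirected debug log, e.g.
tally_events(open("httpd-debug.log")), where httpd-debug.log is a
placeholder for whatever temporary file the output was sent to.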