Re: httpd stops accepting connections after a few hours on current

Jack Burton Mon, 13 Jul 2015 06:25:24 -0700

On Mon, 2015-07-13 at 11:02 +0200, Tor Houghton wrote: 
> On Sun, Jul 12, 2015 at 07:56:37PM +0930, Jack Burton wrote:
> > 
> > It is possible I simply failed to provision sufficient capacity --
> > which could easily be fixed by adding a login class for www with a
> > higher limit on open fds -- but I fear that might just be hiding the
> > problem rather than addressing it: exhausting a 512 fd limit with a
> > peak load of only 48 req/sec (and average load of 2 req/sec) just
> > doesn't feel right (especially when that peak load is all 303s
> > generated internally by httpd, which each take only a tiny fraction of
> > a second to process).
> 
> I don't pretend to know httpd (at all), but I'm wondering, what should
> fstat(1) say, over time, for the httpd processes?


Thanks Tor -- that was exactly the clue I needed to isolate the
problem.

Wrote a short script to parse the output of fstat -p for each running
httpd process (we're running with prefork 8, so I didn't fancy doing it
by hand). For each client IP with an open socket, it reports the
timestamp of that client's last request in the relevant access log (or
'missing' if there is no entry in the current access log).
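
For anyone wanting to try the same check, here is a minimal sketch of
that kind of script. It is not the script used here. The shape of the
fstat socket lines ("internet stream tcp ... local <-- peer"), the log
layout (a client IPv4 address followed later by a [timestamp]), and all
function names are assumptions made for illustration, so adjust the
patterns to whatever your own fstat and log output look like.

```python
import re

# Assumed shape of an fstat socket line (illustrative, not from the post):
#   www httpd 4242 7* internet stream tcp 0x0 192.0.2.1:443 <-- 198.51.100.7:50123
PEER_RE = re.compile(
    r"internet\s+stream\s+tcp\b.*<--\s+(\d{1,3}(?:\.\d{1,3}){3}):\d+"
)
# Assumed log line: a client IPv4 address, then a [timestamp] later on the line.
LOG_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b.*?\[([^\]]+)\]")


def open_socket_peers(fstat_output):
    """Return the peer IP of each connected TCP socket, one entry per socket."""
    peers = []
    for line in fstat_output.splitlines():
        m = PEER_RE.search(line)
        if m:
            peers.append(m.group(1))
    return peers


def last_request_times(log_lines):
    """Map each client IP to the timestamp of its last logged request."""
    last = {}
    for line in log_lines:
        m = LOG_RE.search(line)
        if m:
            last[m.group(1)] = m.group(2)  # later lines overwrite earlier ones
    return last


def report(fstat_output, log_lines):
    """Pair each open socket's peer IP with its last request, or 'missing'."""
    last = last_request_times(log_lines)
    return [(ip, last.get(ip, "missing")) for ip in open_socket_peers(fstat_output)]
```

Feed report() the concatenated fstat -p output for every httpd pid and
the lines of the current access logs. Any socket reported as 'missing'
long after the last log rotation is worth a closer look.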

Ran it roughly 4 hours after the last log rotation and found only 34
matches out of 73 open sockets. We don't run anything here that would
take anywhere near 4 hours to return a response, so the 39 that didn't
match entries in any of the current access logs were clearly where I
needed to look.
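
The tally itself is simple arithmetic: 73 open sockets minus 34 with a
log entry leaves 39 unaccounted for. A small sketch of that count,
assuming the per-socket results are (ip, timestamp) pairs with the
string 'missing' standing in for "no log entry" (a hypothetical data
layout, not necessarily what the real script produced):

```python
def tally(rows):
    """Count sockets with and without a matching access-log entry.

    rows is a list of (ip, timestamp) pairs; a timestamp of 'missing'
    means no entry was found in the current access logs.
    """
    missing = sum(1 for _ip, stamp in rows if stamp == "missing")
    matched = len(rows) - missing
    return matched, missing
```

With 34 matched rows and 39 'missing' rows, tally() returns (34, 39).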

All 39 related to "admin" -- the one HTTPS server that I hadn't spent
any time looking into (since it accounts for only 0.02% of httpd's load
here, it didn't occur to me that that tiny little thing could be
bringing httpd to its knees ... famous last words).

admin talks to a custom FastCGI daemon, which is most likely the culprit
-- I'll debug it tomorrow.

"portal" (the other HTTPS server) also talks to a (different) custom
FastCGI daemon, but carries orders of magnitude more traffic and didn't
have any stale sockets -- so clearly our problem is at the other end of
admin's FastCGI socket (not with httpd itself). Sorry for the noise.

Ted -- similarly, you may want to look into whatever is at the other end
of your "server1"'s FastCGI socket. If your issue is the same as ours,
that's likely where you'll find the cause.
