On Mon, Aug 22, 2011 at 8:10 PM, Jonathan Swartz <swa...@pobox.com> wrote:

> We use Apache/mod_perl 2 and occasionally get a child httpd process that
> spins out of control, either consuming ever-increasing amounts of memory or
> max cpu. Usually due to an infinite loop or other bug in a specific part of
> the site - this sort of thing happens.
>
> I would like to monitor for such httpd children every second or so, and
> when finding one, send it a USR2 signal so it can dump its current Perl
> stack to our error logs.
>
A few ideas:

   - If your requests are typically short and the memory allocation uses
   enough CPU time, you could set a soft limit for CPU time, then catch
   $SIG{XCPU}.  Because CPU time accumulates over the life of the process,
   you would also need to limit how many requests each child process
   handles.  It worked for me in a quick test.
   - If the memory usage is significant, as a quick check you could look at
   the total free memory available on the system, and only if it falls below a
   threshold do a more complex check with Proc::ProcessTable.
   - If the runaway process causes the load average to go up, you could look
   at the load average, and only if it rises above a threshold do a more
   complex check with Proc::ProcessTable.
   - If your requests are typically short, you could create a small watchdog
   server; a request would register its PID with the watchdog server, then
   unregister when it finishes.  If the watchdog sees a request register that
   does not complete within some time limit, it could send SIGUSR2.  I have
   used a solution like this in the past, and it is effective, if a bit
   cumbersome.
   - Apache::Scoreboard <http://search.cpan.org/~mjh/Apache-Scoreboard-2.09.2/>
   can get you the PIDs of just the Apache processes, and some basic state
   information.  You might be able to use this to make your process table scan
   more efficient.  Maybe you could write a URL handler to do your checking
   and signaling using the scoreboard from within Apache, then load the URL
   periodically to trigger the test.
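
To make the load-average idea concrete, here is a rough, untested sketch of
the two-stage check: a cheap look at the load average, and a full
Proc::ProcessTable scan only when it crosses a threshold.  The thresholds,
the sub names, the /proc/loadavg path (Linux only) and the httpd/apache2
process names are all my assumptions, so adjust them for your system.

```perl
use strict;
use warnings;

# Hypothetical thresholds; tune these for your own site.
my $LOAD_THRESHOLD = 4.0;                  # 1-minute load average that triggers a scan
my $CPU_THRESHOLD  = 80;                   # percent CPU, as Proc::ProcessTable reports it
my $RSS_THRESHOLD  = 500 * 1024 * 1024;    # resident set size (assumed to be in bytes)

# Cheap check: read the 1-minute load average (assumes Linux /proc/loadavg).
sub load_average {
    open my $fh, '<', '/proc/loadavg' or return;
    my $line = <$fh>;
    close $fh;
    return unless defined $line;
    my ($one_minute) = split ' ', $line;
    return $one_minute;
}

# Selection step: given an array ref of { pid, name, pctcpu, rss } hashes,
# return the PIDs of Apache children that exceed either limit.
sub find_runaways {
    my ($procs, $cpu_limit, $rss_limit) = @_;
    return map  { $_->{pid} }
           grep {
               ($_->{name} // '') =~ /^(?:httpd|apache2)/
                   && (   ($_->{pctcpu} // 0) > $cpu_limit
                       || ($_->{rss}    // 0) > $rss_limit )
           } @$procs;
}

# Expensive check: snapshot the whole process table.
# Proc::ProcessTable is loaded lazily so the cheap path never needs it.
sub scan_process_table {
    require Proc::ProcessTable;
    my $table = Proc::ProcessTable->new;
    return [
        map { +{ pid    => $_->pid,
                 name   => $_->fname,
                 pctcpu => $_->pctcpu,
                 rss    => $_->rss } } @{ $table->table }
    ];
}

# One monitoring pass: scan only when the load average is high,
# then send SIGUSR2 to every process that looks like a runaway.
sub check_once {
    my $load = load_average();
    return unless defined $load && $load > $LOAD_THRESHOLD;
    my @pids = find_runaways(scan_process_table(), $CPU_THRESHOLD, $RSS_THRESHOLD);
    kill 'USR2', @pids if @pids;
    return @pids;
}
```

You would call check_once() in a loop with "sleep 1" between passes.  Note
that this only makes sense if your children already install a USR2 handler
that dumps the Perl stack; without one, the default action for SIGUSR2 is to
terminate the process.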

Hope this is helpful,

-----Scott.
