Greetings,

My Apache processes occasionally get stuck on one request and stop doing
anything useful. There's no progress on that request (as judged by
tracing ethernet traffic or system calls), yet Apache's TimeOut directive does not kick in.


This problem is apparently related to PHP, as it occurs only for requests served using mod_php. When serving plain files, the connection is properly aborted at the times prescribed by Apache's TimeOut directive.

I've tested this with two configurations:
1. Debian's apache 1.3.26-0woody3 (including Debian's patches)
   + stock PHP 4.3.0
2. Stock Apache 2.0.47 with preform MPM,
   + stock PHP 4.3.2, configured with "--with-apxs2" and with
     "--with-apxs2filter"
both on Debian GNU/Linux "woody" with Debian's kernel 2.4.18-8.

The hang can be reproduced by making an HTTP request and ceasing to
accept new packets in mid-transfer. For example, run
"wget -O - http://www.haayal.co.il/story_222";
and press Ctrl-S to pause it after a second or two (the full response is 1.3MB).


Here are the syscall traces (obtained via strace) of the Apache process handling the request.

For Apache 1.3.26-0woody3, everything looks normal until:
---------------------------
21:04:29.511392 writev(5, [{"<a name=\"5121\"></a>", 19}, {"<table width=\"610\" border=\"0\" ce"..., 8537}], 2 <unfinished ...>
21:14:47.479291 <... writev resumed> ) = 4650
21:14:47.482244 writev(5, [{"\340 \344\356\354 *\341\345\372\353\372*
\344\371\367\341"..., 3906}], 1) = -1 ECONNRESET (Connection reset by peer)
---------------------------
Note the 10 minutes gap. Indeed, I manually closed the connection on the
client-side after ten minutes.


For Apache 2.0.47, everything's normal until:
---------------------------
21:59:43.515026 writev(9, [{"<a name=\"9542\"></a>", 19}, {"<table
width=\"610\" border=\"0\" ce"..., 2320}, {"<a name=\"5061\"></a>", 19}, {"<table width=\"610\" border=\"0\" ce"..., 3429}, {"<a name=\"5092\"></a>", 19}, {"<table width=\"610\" border=\"0\" ce"..., 4876}], 6) = 9354
21:59:43.515305 poll( <unfinished ...>
22:00:43.524236 <... poll resumed> [{fd=9, events=POLLOUT}], 1, 60000) = 0
22:00:43.524367 gettimeofday({1058468443, 526207}, NULL) = 0
22:00:43.526313 write(6, "[Thu Jul 17 22:00:43 2003] [info] (70007)The timeout specified has expired: core_output_filter: writing data to the network", 124) = 124
[[[ communication with local SQL server snipped ]]]
22:00:43.754018 brk(0x82fe000) = 0x82fe000
22:00:43.755570 writev(9, [{"<a name=\"5094\"></a>", 19}, {"<table width=\"610\" border=\"0\" ce"..., 3685}, {"<a name=\"5120\"></a>", 19}, {"<table width=\"610\" border=\"0\" ce"..., 5699}], 4) = -1 EAGAIN (Resource temporarily unavailable
)
22:00:43.755727 poll( <unfinished ...>
22:01:43.766199 <... poll resumed> [{fd=9, events=POLLOUT}], 1, 60000) = 0
22:01:43.766306 gettimeofday({1058468503, 766323}, NULL) = 0
22:01:43.766389 write(6, "[Thu Jul 17 22:01:43 2003] [info] (70007)The timeout specified has expired: core_output_filter: writing data to the network", 124) = 124
[[[ communication with local SQL server snipped ]]]
22:01:43.859266 writev(9, [{"<a name=\"5121\"></a>", 19}, {"<table width=\"610\" border=\"0\" ce"..., 8537}], 2) = -1 EAGAIN (Resource temporarily unavailable)
22:01:43.859416 poll( <unfinished ...>
22:02:43.868136 <... poll resumed> [{fd=9, events=POLLOUT}], 1, 60000) = 0
---------------------------
and so on, until I closed the connection from the client side after 14 minutes. Note that poll() returns after exactly 60 seconds, which happens to be my TimeOut value.


Tracing the TCP connection using Ethereal shows that once the client stops accepting packets (e.g., due to Ctrl-S on "wget -O -"), a pair of ACK packets is exchanged every exactly 120 seconds. Thus, the above behavior is inconsistent with the description of Apache's TimeOut directive, which is supposed to limit "The amount of time between ACKs on transmissions of TCP packets in responses" to 60 seconds.

The full straces show that in both cases, the parent Apache process never sends the SIGALRM it's supposed to send (also, they show that mod_php does *not* mask SIGALRM or change its handler). By contrast, for an identical test with a plain file rather than a PHP script, TimeOut kicks in, a SIGALRM is sent (by the parent Apache process) and the connection is closed:

---------------------------
21:17:54.797220 write(5, "MMYDATADUMMYDATADUMMYDATADUMMYDA"..., 8192 <unfinished ...>
21:18:55.082141 <... write resumed> ) = 2745
21:18:55.082167 --- SIGALRM (Alarm clock) ---
21:18:55.082480 time(NULL) = 1058465935
21:18:55.082631 write(15, "[Thu Jul 17 21:18:55 2003] [info] [client 192.117.108.16] send body timed out", 78) = 78
---------------------------
(that's with apache 1.3.26-0woody3).



For whatever reason, such hung connections occur rather frequently and sometimes persist for several hours or more. Thus my Apache is bleeding useful processes, and needs to be killed and restarted several times a day -- rather inconvenient. Help would be greatly appreciated.



Regards, Eran



--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Reply via email to