ID:               43610
Comment by:       olafvdspek at gmail dot com
Reported By:      oliver at realtsp dot com
Status:           Open
Bug Type:         CGI related
Operating System: FreeBSD 6.2
PHP Version:      5.2.5
New Comment:

> Could you explain or perhaps review PHP's behaviour under overloaded conditions.

I'm no PHP developer and haven't looked at the code, but my guess:

A PHP process has C children, each able to handle one connection at a time. When that connection is closed, the child does an accept() to handle a new connection. When a web server opens more than C connections, the extra ones will not be accepted until an existing connection is closed, which may take a long time. So a web server should never open more than C connections to one PHP process.


Previous Comments:
------------------------------------------------------------------------

[2007-12-17 13:05:41] oliver at realtsp dot com

Actually...... It turns out that the php parent is not dead at all.

Even with stable 5.2.5 (rather than 5.2-latest), if you set up the fastcgi server to be started separately from lighty, i.e. with a lighty config like this:

fastcgi.server = ( ".php" =>
  ( "localhost" =>
    (
      "socket" => "/tmp/php-fastcgi.sock"
    )
  )
)

and then use spawn_fcgi to start the php fcgi server manually, then all behaves as expected, i.e. you get some (not all!!) 500s while the overload condition exists, and when the load drops away you get all normal 200 responses again, i.e. elastic/tolerant performance as hoped for.

After some investigation into the lighty source it turns out that lighty is confused by the fact that PHP just fails to respond (i.e. times out) rather than returning FCGI_OVERLOADED. Refer to this:

http://bugs.php.net/bug.php?id=39809

where Dmitry said:

"PHP cannot return FCGI_OVERLOADED, because all PHP processes are busy and nobody accepts new connection. The only way to detect this situation - use connection timeout."

lighty however is sticking to the fastcgi spec and expecting the php parent to be in shutdown mode (i.e. its PID to disappear) when it does not respond (after which it would then respawn a new parent).
But because the PHP parent is just busy and not actually shutting down, the PID never disappears and lighty gets stuck in a loop.

I have posted a workaround involving starting PHP separately here:

http://trac.lighttpd.net/trac/ticket/1488

which also proposes a "patch" to deal with PHP's non-standard behaviour regarding FCGI_OVERLOADED.

However, the fundamental problem remains: it is very difficult for a FastCGI client to determine what is going on, and therefore what to do, when PHP just times out on connections rather than returning the correct FCGI_OVERLOADED response.

I did not understand Dmitry's original reason for this:

"PHP cannot return FCGI_OVERLOADED, because all PHP processes are busy and nobody accepts new connection."

Could you explain or perhaps review PHP's behaviour under overloaded conditions.

Thanks

Oliver

------------------------------------------------------------------------

[2007-12-17 10:44:55] oliver at realtsp dot com

We have tried with http://snaps.php.net/php5.2-latest.tar.gz

Result is unchanged.

NOTE that the php worker and parent processes are still showing in ps after the crash (same as before the crash). But lighty cannot get a sensible response from them.

[EMAIL PROTECTED] /usr/ports/lang/php5]# pstree
...
|-+- 25262 www /usr/local/sbin/lighttpd -f /usr/local/etc/lighttpd.conf
| \-+= 25263 www /usr/local/bin/php-cgi
|   |--- 25264 www /usr/local/bin/php-cgi
|   |--- 25265 www /usr/local/bin/php-cgi
|   |--- 25266 www /usr/local/bin/php-cgi
|   |--- 25267 www /usr/local/bin/php-cgi
|   |--- 25268 www /usr/local/bin/php-cgi
|   |--- 25269 www /usr/local/bin/php-cgi
|   |--- 25270 www /usr/local/bin/php-cgi
|   |--- 25271 www /usr/local/bin/php-cgi
|   |--- 25272 www /usr/local/bin/php-cgi
|   |--- 25273 www /usr/local/bin/php-cgi
|   |--- 25274 www /usr/local/bin/php-cgi
|   |--- 25275 www /usr/local/bin/php-cgi
|   |--- 25276 www /usr/local/bin/php-cgi
|   |--- 25277 www /usr/local/bin/php-cgi
|   |--- 25278 www /usr/local/bin/php-cgi
|   \--- 25279 www /usr/local/bin/php-cgi
....

------------------------------------------------------------------------

[2007-12-17 09:17:30] [EMAIL PROTECTED]

Please try using this CVS snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz

For Windows (zip):

  http://snaps.php.net/win32/php5.2-win32-latest.zip

For Windows (installer):

  http://snaps.php.net/win32/php5.2-win32-installer-latest.msi

------------------------------------------------------------------------

[2007-12-16 21:55:00] oliver at realtsp dot com

Description:
------------
Version information below.

When I load the server with siege, once the fastcgi-php parent process reaches a load of ~200 concurrent requests the process appears to crash and refuses to accept further connections, even after the load is removed again. The only way to recover is to restart lighttpd and thereby the fastcgi-php server (and its children).

Clearly a load of 200+ is probably an overload and not sustainable. However, a non-recoverable crash means that even when a temporary load drops away (produced by some aggressive robot on our production setup, for example) the server remains unusable and returns 500 responses.

php version as below, but patched with this:

http://cvs.php.net/viewvc.cgi/php-src/main/SAPI.c?r1=1.202.2.7.2.15&r2=1.202.2.7.2.16&pathrev=PHP_5_2&diff_format=u

because of this bug:

http://bugs.php.net/bug.php?id=43295

That patch removes the errors on /root/php-5.2.5/main/SAPI.c(445) but the "overload crash" remains.

[EMAIL PROTECTED] /usr/ports/lang/php5]# lighttpd -v
lighttpd-1.4.18 (ssl) - a light and fast webserver
Build-Date: Dec 5 2007 18:23:49

fastcgi.server = ( ".php" =>
  ( "localhost" =>
    (
      "socket" => "/var/run/lighttpd/php-fastcgi.socket",
      "bin-path" => "/usr/local/bin/php-cgi",
      "max-procs" => 1,
      "bin-environment" => (
        "PHP_FCGI_CHILDREN" => "16",
        "PHP_FCGI_MAX_REQUESTS" => "500"
      ),
      "broken-scriptfilename" => "enable"
    )
  )
)

[EMAIL PROTECTED] /usr/ports/lang/php5]# php-cgi -v
PHP 5.2.5 (cgi-fcgi) (built: Dec 16 2007 20:47:09) (DEBUG)
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies

[EMAIL PROTECTED] /usr/ports/lang/php5]# php-cgi -m
[PHP Modules]
cgi-fcgi
date
libxml
Reflection
standard

[Zend Modules]

NOTE: no opcode cache or third party extensions
php.ini parsed is "none" (i.e. all defaults)

FreeBSD muriwai.realtsp.com 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan 12 08:43:30 UTC 2007 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP amd64

Reproduce code:
---------------
A trivial script will do:

<?php
sleep(1);
phpinfo();

with .siegerc

#
# Default number of simulated concurrent users
# ex: concurrent = 25
#
concurrent = 250

Expected result:
----------------
The php fastcgi parent process (and children) remaining stable. Maybe an "overloaded" or even a temporary 500-type response would be acceptable, but crashing and then being completely unresponsive is a bit of an issue.

Actual result:
--------------
lighttpd.error.log reports this:

2007-12-16 21:19:22: (mod_fastcgi.c.1731) connect failed: Connection refused on unix:/var/run/lighttpd/php-fastcgi.socket-87058-0
2007-12-16 21:19:22: (mod_fastcgi.c.2885) backend died; we'll disable it for 5 seconds and send the request to another backend instead: reconnects: 0 load: 210
2007-12-16 21:19:22: (mod_fastcgi.c.3496) all handlers for /index.php on .php are down.

I haven't managed a backtrace yet, because this is not that easy with a fastcgi process, but I am working on it.

NOTE: despite what lighty says above, it does not restart the php parent process. I am not sure why, but this is a separate issue, I believe.

------------------------------------------------------------------------
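
The accept() model guessed at in the newest comment (C children, one connection each, extra connections wait until a child is free) can be roughed out numerically. The sketch below is only an illustration under simplifying assumptions, not PHP's or lighttpd's actual logic: all connections arrive at once, every request takes exactly the same time, and the listen backlog is ignored. The function names and the 5-second connect timeout are hypothetical; the 16 children, the 1-second script and the 250 concurrent users come from the report.

```python
def accept_delays(connections, children, service_seconds):
    """Seconds each simultaneous connection waits before a child accepts it.

    Assumed model: every child handles one connection at a time and only
    calls accept() again once it has finished, so connections are served
    in waves of `children`.
    """
    return [(i // children) * service_seconds for i in range(connections)]


def count_timeouts(connections, children, service_seconds, connect_timeout):
    """Number of connections still waiting when the client gives up."""
    delays = accept_delays(connections, children, service_seconds)
    return sum(1 for delay in delays if delay > connect_timeout)


if __name__ == "__main__":
    # 16 = PHP_FCGI_CHILDREN, 1 s = sleep(1), 250 = siege "concurrent".
    # The 5-second timeout is an assumption made for this illustration.
    delays = accept_delays(250, 16, 1)
    print("longest wait before accept():", delays[-1], "seconds")
    print("connections timing out:", count_timeouts(250, 16, 1, 5))
```

Under this model, keeping the number of open connections at or below the number of children gives no waiting at all, which is the point made in the comment: the web server should not open more than C connections to one PHP process.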