ID:               43610
 Comment by:       olafvdspek at gmail dot com
 Reported By:      oliver at realtsp dot com
 Status:           Open
 Bug Type:         CGI related
 Operating System: FreeBSD 6.2
 PHP Version:      5.2.5
 New Comment:

> Could you explain or perhaps review PHP's behaviour under overloaded
conditions.

I'm no PHP developer and haven't looked at the code, but my guess is:
A PHP parent process has C children, each able to handle one connection
at a time. When that connection is closed, the child does an accept() to
handle a new connection.
When a web server opens more than C connections, the extra ones will not
be accepted until an existing connection is closed, which may take a long
time.
So a web server should never open more than C connections to one PHP
process.
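
To put rough numbers on that guess, here is a minimal sketch in Python. It
is a simplified model, not PHP's or lighty's actual code: it assumes all
requests arrive at once, every request takes the same time, and the kernel
queues every extra connection without limit (a real listen backlog is
finite). The figures in the usage note come from the original report
further down (16 children, a 1-second script, 250 concurrent users); the
5-second timeout and the function names are hypothetical.

```python
def completion_times(requests, children, service_time):
    """Finish time of each request if all arrive at t=0.

    Each of the `children` workers handles one connection at a time,
    so requests are served in waves of `children`.
    """
    return [(i // children + 1) * service_time for i in range(requests)]


def timed_out(requests, children, service_time, timeout):
    """Count requests that finish later than a (hypothetical) client timeout."""
    times = completion_times(requests, children, service_time)
    return sum(1 for t in times if t > timeout)
```

With completion_times(250, 16, 1.0) the last request finishes after 16
seconds, and timed_out(250, 16, 1.0, 5.0) gives 170, i.e. under this model
only the first 80 requests would be answered within 5 seconds.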


Previous Comments:
------------------------------------------------------------------------

[2007-12-17 13:05:41] oliver at realtsp dot com

Actually......

It turns out that the php parent is not dead at all. Even with stable
5.2.5 (rather than 5.2-latest), if you set up the fastcgi server to be
started separately from lighty, i.e. with a lighty config like this:

fastcgi.server             = ( ".php" =>
                               ( "localhost" =>
                                 (
                                   "socket" => "/tmp/php-fastcgi.sock"
                                 )
                               )
                            )

and then use spawn_fcgi to start the php fcgi server manually, then all
behaves as expected, i.e. you get some (not all!!) 500s while the overload
condition exists, and when the load drops away you get all normal 200
responses again. In other words, elastic/tolerant performance as hoped for.

After some investigation into the lighty source it turns out that
lighty is confused by the fact that PHP just fails to respond (i.e. times
out) rather than returning FCGI_OVERLOADED. Refer to this:

http://bugs.php.net/bug.php?id=39809

where dmitry said:

"PHP cannot return FCGI_OVERLOADED, because all PHP processes are busy
and nobody accepts new connection. The only way to detect this
situation - use connection timeout."

lighty, however, is sticking to the fastcgi spec and expecting the php
parent to be in shutdown mode (i.e. its PID to disappear) when it does
not respond (after which it would then respawn a new parent). But
because the PHP parent is just busy and not actually shutting down, the
PID never disappears and lighty gets stuck in a loop.

I have posted a workaround involving starting PHP separately here:

http://trac.lighttpd.net/trac/ticket/1488

which also proposes a "patch" to deal with PHP's non-standard behaviour
regarding FCGI_OVERLOADED.

However, the fundamental problem remains: It is very difficult for a
FASTCGI client to determine what is going on and therefore what to do
when php just times out on connections rather than returning the correct
FCGI_OVERLOADED response.
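
For reference, this is roughly what the "correct" response looks like on
the wire. The sketch below builds an FCGI_END_REQUEST record with
protocolStatus set to FCGI_OVERLOADED, following the record layout in the
FastCGI 1.0 spec (8-byte header, 8-byte body). It is only an illustration
in Python, not PHP's code; the function name is made up. Note that the
record carries the request ID from the client's FCGI_BEGIN_REQUEST, so
some process has to accept() the connection and read that record before
it can answer at all.

```python
import struct

FCGI_VERSION_1 = 1
FCGI_END_REQUEST = 3   # record type
FCGI_OVERLOADED = 2    # protocolStatus value


def end_request_overloaded(request_id, app_status=0):
    """Build an FCGI_END_REQUEST record that reports FCGI_OVERLOADED."""
    # Body: appStatus (4 bytes), protocolStatus (1 byte), 3 reserved bytes.
    body = struct.pack("!IB3x", app_status, FCGI_OVERLOADED)
    # Header: version, type, requestId, contentLength, paddingLength, reserved.
    header = struct.pack("!BBHHBx", FCGI_VERSION_1, FCGI_END_REQUEST,
                         request_id, len(body), 0)
    return header + body
```

A server that has accepted a connection and read request ID 1 could reply
with end_request_overloaded(1) and close; a server with no free process to
call accept() never gets that far.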

I did not understand dmitry's original reason for this: "PHP cannot
return FCGI_OVERLOADED, because all PHP processes are busy
and nobody accepts new connection."

Could you explain or perhaps review PHP's behaviour under overloaded
conditions.

Thanks

Oliver

------------------------------------------------------------------------

[2007-12-17 10:44:55] oliver at realtsp dot com

We have tried with  

  http://snaps.php.net/php5.2-latest.tar.gz

Result is unchanged. 

NOTE that the php workers and parent processes are still showing on ps
after the crash (same as before the crash). But lighty cannot get a
sensible response from them.

[EMAIL PROTECTED] /usr/ports/lang/php5]# pstree  
...
 |-+- 25262 www /usr/local/sbin/lighttpd -f /usr/local/etc/lighttpd.conf
 | \-+= 25263 www /usr/local/bin/php-cgi
 |   |--- 25264 www /usr/local/bin/php-cgi
 |   |--- 25265 www /usr/local/bin/php-cgi
 |   |--- 25266 www /usr/local/bin/php-cgi
 |   |--- 25267 www /usr/local/bin/php-cgi
 |   |--- 25268 www /usr/local/bin/php-cgi
 |   |--- 25269 www /usr/local/bin/php-cgi
 |   |--- 25270 www /usr/local/bin/php-cgi
 |   |--- 25271 www /usr/local/bin/php-cgi
 |   |--- 25272 www /usr/local/bin/php-cgi
 |   |--- 25273 www /usr/local/bin/php-cgi
 |   |--- 25274 www /usr/local/bin/php-cgi
 |   |--- 25275 www /usr/local/bin/php-cgi
 |   |--- 25276 www /usr/local/bin/php-cgi
 |   |--- 25277 www /usr/local/bin/php-cgi
 |   |--- 25278 www /usr/local/bin/php-cgi
 |   \--- 25279 www /usr/local/bin/php-cgi
....

------------------------------------------------------------------------

[2007-12-17 09:17:30] [EMAIL PROTECTED]

Please try using this CVS snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz
 
For Windows (zip):
 
  http://snaps.php.net/win32/php5.2-win32-latest.zip

For Windows (installer):

  http://snaps.php.net/win32/php5.2-win32-installer-latest.msi



------------------------------------------------------------------------

[2007-12-16 21:55:00] oliver at realtsp dot com

Description:
------------
Version information below.

When I load the server with siege, once the fastcgi-php parent process
reaches a load of ~200 concurrent requests the process appears to crash
and refuses to accept further connections, even after the load is
removed again. The only way to recover is to restart lighttpd and thereby
the fastcgi-php server (and its children).

Clearly a load of 200+ is probably an overload and not sustainable.
However, a non-recoverable crash means that even when a temporary load
(produced by some aggressive robot on our production setup, for example)
drops away, the server remains unusable and returns 500 responses.

php version as below, but patched with this:

http://cvs.php.net/viewvc.cgi/php-src/main/SAPI.c?r1=1.202.2.7.2.15&r2=1.202.2.7.2.16&pathrev=PHP_5_2&diff_format=u

because of this bug:

http://bugs.php.net/bug.php?id=43295

That patch removes the errors on

/root/php-5.2.5/main/SAPI.c(445)

but the "overload crash" remains.


[EMAIL PROTECTED] /usr/ports/lang/php5]# lighttpd -v
lighttpd-1.4.18 (ssl) - a light and fast webserver
Build-Date: Dec  5 2007 18:23:49

fastcgi.server             = ( ".php" =>
                               ( "localhost" =>
                                 (
                                   "socket" => "/var/run/lighttpd/php-fastcgi.socket",
                                   "bin-path" => "/usr/local/bin/php-cgi",
                                   "max-procs" => 1,
                                   "bin-environment" => (
                                     "PHP_FCGI_CHILDREN" => "16",
                                     "PHP_FCGI_MAX_REQUESTS" => "500"
                                   ),
                                   "broken-scriptfilename" => "enable"
                                 )
                               )
                            )

[EMAIL PROTECTED] /usr/ports/lang/php5]# php-cgi -v
PHP 5.2.5 (cgi-fcgi) (built: Dec 16 2007 20:47:09) (DEBUG)
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies


[EMAIL PROTECTED] /usr/ports/lang/php5]# php-cgi -m
[PHP Modules]
cgi-fcgi
date
libxml
Reflection
standard

[Zend Modules]

NOTE: no opcode cache or third party extensions

php.ini parsed is "none" (ie all defaults)


FreeBSD muriwai.realtsp.com 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan
12 08:43:30 UTC 2007    
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP  amd64



Reproduce code:
---------------
a trivial script will do:

<?php

sleep(1);   // hold the worker busy for one second per request
phpinfo();  // then produce some output


with .siegerc

#
# Default number of simulated  concurrent users
# ex: concurrent = 25
#
concurrent = 250
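
If siege is not at hand, a similar burst can be approximated with a few
lines of Python. This is only a rough stand-in for siege, not a
replacement: it fires one batch of simultaneous requests and tallies the
status codes. The function names and the 30-second timeout are
assumptions made for this sketch.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import urllib.error
import urllib.request


def fetch_status(url, timeout=30.0):
    """Return the HTTP status code, or None if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code   # e.g. the 500s returned while overloaded
    except OSError:
        return None       # connection refused, timeout, DNS failure, ...


def hammer(url, concurrent=250, fetch=fetch_status):
    """Send `concurrent` simultaneous requests and count the results."""
    with ThreadPoolExecutor(max_workers=concurrent) as pool:
        return Counter(pool.map(fetch, [url] * concurrent))
```

Calling hammer() with the URL of the script above returns a Counter keyed
by status code, so a mix of 200s and 500s under load, followed by only
200s once the load is gone, would be the tolerant behaviour hoped for.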



Expected result:
----------------
The php fastcgi parent process (and its children) should remain stable.
An "overloaded" or even a temporary 500-type response would be
acceptable, but crashing and then being completely unresponsive is a bit
of an issue.



Actual result:
--------------
lighttpd.error.log reports this

2007-12-16 21:19:22: (mod_fastcgi.c.1731) connect failed: Connection refused on unix:/var/run/lighttpd/php-fastcgi.socket-87058-0
2007-12-16 21:19:22: (mod_fastcgi.c.2885) backend died; we'll disable it for 5 seconds and send the request to another backend instead: reconnects: 0 load: 210
2007-12-16 21:19:22: (mod_fastcgi.c.3496) all handlers for  /index.php on .php are down.

I haven't managed a backtrace yet, because this is not that easy with a
fastcgi process, but I am working on it.

NOTE: despite what lighty says above, it does not restart the php
parent process. I am not sure why, but I believe this is a separate
issue.


------------------------------------------------------------------------


