Hi James, I do not have the skills to be able to work well in "depth" to the problem as you (rightly) suggest. What I know is that this problem started from the last system update (apache / php / mariadb, ...) from 23 September.
The only log info is: /var/log/httpd/error_log, at "crash time": [Fri Sep 30 19:09:03.897325 2016] [mpm_event:trace4] [pid 30339:tid 139796798162688] event.c(930): socket reached timeout in lingering-close state and I don't know if strace (the results of which I have just quoted) is useful.. Should I wait for further apache / php updates? *Fabio F. Gervasi* Engineering Consultant *Web www.centrometeo.com <http://www.centrometeo.com/>* *g+ plus.google.com/centrometeo <https://plus.google.com/105294368204645075621>* *Twitter twitter.com/centrometeo <http://twitter.com/centrometeo>* *Facebook facebook.com/centrometeo <http://www.facebook.com/centrometeo>* 2016-10-02 10:34 GMT+02:00 Dr James Smith <[email protected]>: > Things you can do are trying to track down if there is any pattern - are > these first request on a child, "nth" request on child etc... we add > additional variables to access logs which include things such as > PID/request no in PID, memory usage before and after etc. This allows us > sometimes to see patterns in errors which aren't normally easy to see.. > Saying that it may not help you here... > > Another thing you can do with ab is check to see if it is OK under lower > load? > > Finally I find "siege" better than "ab" when it comes to testing as it is > more configurable - you can send it a list of URLs - run test for > "n-seconds" rather than "n-requests" which is more useful if you don't know > how long things take to return! > > James > > > On 02/10/2016 09:02, Daniel wrote: > > I would bet on your method of parsing php and the scripts being ran, in > any case to be sure: > > Try the strace approach: > strace -o /tmp/outputfile -s 5000 httpd -X or strace -ff -F -s200 -o > /tmp/strace.out -p PID > > to try to find out what's really going on. > > 2016-09-30 20:19 GMT+02:00 Fabio F.Gervasi <[email protected]>: > >> Hi! >> >> I have additional information: >> >> /var/log/httpd/error_log, at "crash time": [Fri Sep 30 19:09:03.897325 >> 2016] [mpm_event:trace4] [pid 30339:tid 139796798162688] event.c(930): >> socket reached timeout in lingering-close state >> >> What do you think? >> >> >> >> 2016-09-30 14:47 GMT+02:00 Fabio F.Gervasi <[email protected]>: >> >>> Hi! >>> >>> Thank you for your reply. I tried the following tests. >>> >>> *1) Using a little text file:* >>> *# ab -k -c 100 -n 2000 localhost/test.txt* >>> This is ApacheBench, Version 2.3 <$Revision: 1748469 $> >>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ >>> Licensed to The Apache Software Foundation, http://www.apache.org/ >>> >>> Benchmarking localhost (be patient) >>> Completed 200 requests >>> Completed 400 requests >>> Completed 600 requests >>> Completed 800 requests >>> Completed 1000 requests >>> Completed 1200 requests >>> Completed 1400 requests >>> Completed 1600 requests >>> Completed 1800 requests >>> Completed 2000 requests >>> Finished 2000 requests >>> >>> >>> Server Software: Apache/2.4.23 >>> Server Hostname: localhost >>> Server Port: 80 >>> >>> Document Path: /test.txt >>> Document Length: 52 bytes >>> >>> Concurrency Level: 100 >>> Time taken for tests: 2.521 seconds >>> Complete requests: 2000 >>> *Failed requests: 16* >>> (Connect: 0, Receive: 0, Length: 16, Exceptions: 0) >>> Keep-Alive requests: 1984 >>> Total transferred: 950336 bytes >>> HTML transferred: 103168 bytes >>> Requests per second: 793.45 [#/sec] (mean) >>> Time per request: 126.032 [ms] (mean) >>> Time per request: 1.260 [ms] (mean, across all concurrent requests) >>> Transfer rate: 368.18 [Kbytes/sec] received >>> >>> Connection Times (ms) >>> min mean[+/-sd] median max >>> Connect: 0 1 3.8 0 21 >>> Processing: 1 123 126.5 83 772 >>> Waiting: 0 123 126.7 83 772 >>> Total: 1 124 126.6 85 772 >>> >>> Percentage of the requests served within a certain time (ms) >>> 50% 85 >>> 66% 154 >>> 75% 201 >>> 80% 229 >>> 90% 303 >>> 95% 373 >>> 98% 452 >>> 99% 547 >>> 100% 772 (longest request) >>> >>> *I obtain 16 failed requests, but if I run more again I obtain a >>> different number.* >>> >>> *2) Using a big gif file:* >>> *# ab -k -c 100 -n 2000 localhost/it-vis-animation.gif* >>> This is ApacheBench, Version 2.3 <$Revision: 1748469 $> >>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ >>> Licensed to The Apache Software Foundation, http://www.apache.org/ >>> >>> Benchmarking localhost (be patient) >>> Completed 200 requests >>> Completed 400 requests >>> Completed 600 requests >>> Completed 800 requests >>> Completed 1000 requests >>> Completed 1200 requests >>> Completed 1400 requests >>> Completed 1600 requests >>> Completed 1800 requests >>> Completed 2000 requests >>> Finished 2000 requests >>> >>> >>> Server Software: Apache/2.4.23 >>> Server Hostname: localhost >>> Server Port: 80 >>> >>> Document Path: /it-vis-animation.gif >>> Document Length: 8105309 bytes >>> >>> Concurrency Level: 100 >>> Time taken for tests: 26.294 seconds >>> Complete requests: 2000 >>> *Failed requests: 0* >>> Keep-Alive requests: 2000 >>> Total transferred: 16211484000 bytes >>> HTML transferred: 16210618000 bytes >>> Requests per second: 76.06 [#/sec] (mean) >>> Time per request: 1314.691 [ms] (mean) >>> Time per request: 13.147 [ms] (mean, across all concurrent >>> requests) >>> Transfer rate: 602100.83 [Kbytes/sec] received >>> >>> Connection Times (ms) >>> min mean[+/-sd] median max >>> Connect: 0 1 4.0 0 22 >>> Processing: 404 1309 80.0 1298 1635 >>> Waiting: 5 22 32.4 10 207 >>> Total: 420 1310 80.6 1298 1635 >>> >>> Percentage of the requests served within a certain time (ms) >>> 50% 1298 >>> 66% 1315 >>> 75% 1342 >>> 80% 1360 >>> 90% 1397 >>> 95% 1440 >>> 98% 1513 >>> 99% 1530 >>> 100% 1635 (longest request) >>> >>> *In this case I obtain always 0 failed!* >>> >>> *3) Using a little png file:* >>> *# ab -k -c 100 -n 2000 localhost/apple-touch-icon.png* >>> This is ApacheBench, Version 2.3 <$Revision: 1748469 $> >>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ >>> Licensed to The Apache Software Foundation, http://www.apache.org/ >>> >>> Benchmarking localhost (be patient) >>> Completed 200 requests >>> Completed 400 requests >>> Completed 600 requests >>> Completed 800 requests >>> Completed 1000 requests >>> Completed 1200 requests >>> Completed 1400 requests >>> Completed 1600 requests >>> Completed 1800 requests >>> Completed 2000 requests >>> Finished 2000 requests >>> >>> >>> Server Software: Apache/2.4.23 >>> Server Hostname: localhost >>> Server Port: 80 >>> >>> Document Path: /apple-touch-icon.png >>> Document Length: 12364 bytes >>> >>> Concurrency Level: 100 >>> Time taken for tests: 2.576 seconds >>> Complete requests: 2000 >>> *Failed requests: 22* >>> (Connect: 0, Receive: 0, Length: 22, Exceptions: 0) >>> Keep-Alive requests: 1978 >>> Total transferred: 25304554 bytes >>> HTML transferred: 24455992 bytes >>> Requests per second: 776.52 [#/sec] (mean) >>> Time per request: 128.779 [ms] (mean) >>> Time per request: 1.288 [ms] (mean, across all concurrent requests) >>> Transfer rate: 9594.53 [Kbytes/sec] received >>> >>> Connection Times (ms) >>> min mean[+/-sd] median max >>> Connect: 0 1 3.9 0 20 >>> Processing: 0 125 112.0 101 563 >>> Waiting: 0 125 112.1 101 563 >>> Total: 0 126 112.6 102 563 >>> >>> Percentage of the requests served within a certain time (ms) >>> 50% 102 >>> 66% 159 >>> 75% 196 >>> 80% 220 >>> 90% 282 >>> 95% 354 >>> 98% 401 >>> 99% 457 >>> 100% 563 (longest request) >>> >>> *I get 22 failed requests, but if I run more again I obtain a different >>> number.* >>> >>> What do you thing? >>> >>> Thank you! >>> >>> >>> >>> 2016-09-30 13:50 GMT+02:00 Daniel <[email protected]>: >>> >>>> hello, >>>> >>>> the fact that all those 500 error reponses go with "X-Powered-By: >>>> PHP/5.6.26" tell me it is php giving those errors. >>>> >>>> If you are still unsure, you could use the ab tool from Apache HTTPD, >>>> and request a specific static file (text, gif) hundreds of times to see if >>>> you get a 500 error from HTTPD. >>>> >>>> 2016-09-30 8:59 GMT+02:00 Fabio F.Gervasi <[email protected]>: >>>> >>>>> Hi, >>>>> >>>>> I'm not 100% sure that the real source of the problem is apache (could >>>>> be php or other), but I'd like start from here as the only logs that >>>>> provides me with information (with trace4 level) is apache log. No >>>>> information appears from other logs. >>>>> >>>>> I have: >>>>> - Linux 4.7.4-100.fc23.x86_64 >>>>> - Server version: Apache/2.4.23 (Fedora) >>>>> - mysql Ver 15.1 Distrib 10.0.26-MariaDB, for Linux (x86_64) using >>>>> readline 5.1 >>>>> - PHP Version => 5.6.25 >>>>> >>>>> The problem: random internal server error 500. Random because, for >>>>> example, if a page produces the error and do "reload" then it works. The >>>>> same page may not work as well as work. Sometimes you can use the site for >>>>> several minutes with no problems, then suddenly there is the error. >>>>> >>>>> When error occurs, this is all that I get from the log file (with >>>>> loglevel = trace4): >>>>> >>>>> [...] >>>>> [Fri Sep 30 07:41:39.151052 2016] [rewrite:trace3] [pid 17957:tid >>>>> 140241771534080] mod_rewrite.c(477): [client 151.25.206.200:38368] >>>>> 151.25.206.200 - - [www.centrometeo.com/sid#55c3e >>>>> 3673400][rid#7f8c8000e9d0/subreq >>>>> <http://www.centrometeo.com/sid#55c3e3673400][rid%237f8c8000e9d0/subreq>] >>>>> [perdir /home/web/centrometeo.com/] applying pattern '.*' to uri >>>>> 'index.php', referer: http://www.centrometeo.com/mod >>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>> [Fri Sep 30 07:41:39.151092 2016] [rewrite:trace1] [pid 17957:tid >>>>> 140241771534080] mod_rewrite.c(477): [client 151.25.206.200:38368] >>>>> 151.25.206.200 - - [www.centrometeo.com/sid#55c3e >>>>> 3673400][rid#7f8c8000e9d0/subreq >>>>> <http://www.centrometeo.com/sid#55c3e3673400][rid%237f8c8000e9d0/subreq>] >>>>> [perdir /home/web/centrometeo.com/] pass through /home/web/ >>>>> centrometeo.com/index.php, referer: http://www.centrometeo.com/mod >>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>> [Fri Sep 30 07:41:39.913003 2016] [core:trace4] [pid 17957:tid >>>>> 140241771534080] util_script.c(567): [client 151.25.206.200:38368] >>>>> Headers from script 'php.fcgi':, referer: >>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>> 559-wrf-mslp-6hprec-ita >>>>> [Fri Sep 30 07:41:39.913097 2016] [core:trace4] [pid 17957:tid >>>>> 140241771534080] util_script.c(568): [client 151.25.206.200:38368] >>>>> *Status: >>>>> 500 Internal Server Error*, referer: http://www.centrometeo.com/mod >>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>> [Fri Sep 30 07:41:39.913130 2016] [core:trace1] [pid 17957:tid >>>>> 140241771534080] util_script.c(649): [client 151.25.206.200:38368] *Status >>>>> line from script 'php.fcgi': 500 Internal Server Error*, referer: >>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>> 559-wrf-mslp-6hprec-ita >>>>> [Fri Sep 30 07:41:39.913162 2016] [core:trace4] [pid 17957:tid >>>>> 140241771534080] util_script.c(568): [client 151.25.206.200:38368] >>>>> X-Powered-By: PHP/5.6.26, referer: http://www.centrometeo.com/mod >>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>> [Fri Sep 30 07:41:39.913191 2016] [core:trace4] [pid 17957:tid >>>>> 140241771534080] util_script.c(568): [client 151.25.206.200:38368] >>>>> P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM", referer: >>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>> 559-wrf-mslp-6hprec-ita >>>>> [Fri Sep 30 07:41:39.913218 2016] [core:trace4] [pid 17957:tid >>>>> 140241771534080] util_script.c(568): [client 151.25.206.200:38368] >>>>> Content-Encoding: gzip, referer: http://www.centrometeo.com/mod >>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>> [Fri Sep 30 07:41:39.913246 2016] [core:trace4] [pid 17957:tid >>>>> 140241771534080] util_script.c(568): [client 151.25.206.200:38368] >>>>> Vary: Accept-Encoding, referer: http://www.centrometeo.com/mod >>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>> [Fri Sep 30 07:41:39.913275 2016] [core:trace4] [pid 17957:tid >>>>> 140241771534080] util_script.c(568): [client 151.25.206.200:38368] >>>>> Content-type: text/html; charset=UTF-8, referer: >>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>> 559-wrf-mslp-6hprec-ita >>>>> [Fri Sep 30 07:41:39.913395 2016] [http:trace3] [pid 17957:tid >>>>> 140241771534080] http_filters.c(1006): [client 151.25.206.200:38368] >>>>> *Response >>>>> sent with status 500, headers:*, referer: >>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>> 559-wrf-mslp-6hprec-ita >>>>> [Fri Sep 30 07:41:39.913432 2016] [http:trace4] [pid 17957:tid >>>>> 140241771534080] http_filters.c(835): [client 151.25.206.200:38368] >>>>> X-Powered-By: PHP/5.6.26, referer: http://www.centrometeo.com/mod >>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>> [Fri Sep 30 07:41:39.913458 2016] [http:trace4] [pid 17957:tid >>>>> 140241771534080] http_filters.c(835): [client 151.25.206.200:38368] >>>>> P3P: CP=\\"NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM\\", referer: >>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>> 559-wrf-mslp-6hprec-ita >>>>> [Fri Sep 30 07:41:39.913481 2016] [http:trace4] [pid 17957:tid >>>>> 140241771534080] http_filters.c(835): [client 151.25.206.200:38368] >>>>> Content-Encoding: gzip, referer: http://www.centrometeo.com/mod >>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>> [...] >>>>> >>>>> Note the use of fcgi. Apache is indeed configured with >>>>> event/php-fpm/FastCGI, but the same thing happens with "normal" prefork / >>>>> php! >>>>> Until a few days ago it did not happen, it started after updating the >>>>> system (kernel, apache, php, mariadb). >>>>> I do not really know what to do. >>>>> >>>>> Thank you very much. >>>>> Fabio. >>>>> >>>> >>>> >>>> >>>> -- >>>> *Daniel Ferradal* >>>> IT Specialist >>>> >>>> email dferradal at gmail.com >>>> linkedin es.linkedin.com/in/danielferradal >>>> >>> >>> >> > > > -- > *Daniel Ferradal* > IT Specialist > > email dferradal at gmail.com > linkedin es.linkedin.com/in/danielferradal > > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a company > registered in England with number 2742969, whose registered office is 215 > Euston Road, London, NW1 2BE. >
