I really see nothing in that strace output that tells anything is crashing.
2016-10-02 10:44 GMT+02:00 Fabio F.Gervasi <[email protected]>: > Hi James, > > I do not have the skills to be able to work well in "depth" to the problem > as you (rightly) suggest. What I know is that this problem started from the > last system update (apache / php / mariadb, ...) from 23 September. > > The only log info is: /var/log/httpd/error_log, at "crash time": [Fri Sep > 30 19:09:03.897325 2016] [mpm_event:trace4] [pid 30339:tid 139796798162688] > event.c(930): socket reached timeout in lingering-close state > > and I don't know if strace (the results of which I have just quoted) is > useful.. > > Should I wait for further apache / php updates? > > > > *Fabio F. Gervasi* > Engineering Consultant > > > *Web www.centrometeo.com <http://www.centrometeo.com/>* > *g+ plus.google.com/centrometeo > <https://plus.google.com/105294368204645075621>* > > *Twitter twitter.com/centrometeo <http://twitter.com/centrometeo>* > *Facebook facebook.com/centrometeo > <http://www.facebook.com/centrometeo>* > > > > > > > > > > 2016-10-02 10:34 GMT+02:00 Dr James Smith <[email protected]>: > >> Things you can do are trying to track down if there is any pattern - are >> these first request on a child, "nth" request on child etc... we add >> additional variables to access logs which include things such as >> PID/request no in PID, memory usage before and after etc. This allows us >> sometimes to see patterns in errors which aren't normally easy to see.. >> Saying that it may not help you here... >> >> Another thing you can do with ab is check to see if it is OK under lower >> load? >> >> Finally I find "siege" better than "ab" when it comes to testing as it is >> more configurable - you can send it a list of URLs - run test for >> "n-seconds" rather than "n-requests" which is more useful if you don't know >> how long things take to return! >> >> James >> >> >> On 02/10/2016 09:02, Daniel wrote: >> >> I would bet on your method of parsing php and the scripts being ran, in >> any case to be sure: >> >> Try the strace approach: >> strace -o /tmp/outputfile -s 5000 httpd -X or strace -ff -F -s200 -o >> /tmp/strace.out -p PID >> >> to try to find out what's really going on. >> >> 2016-09-30 20:19 GMT+02:00 Fabio F.Gervasi <[email protected]>: >> >>> Hi! >>> >>> I have additional information: >>> >>> /var/log/httpd/error_log, at "crash time": [Fri Sep 30 19:09:03.897325 >>> 2016] [mpm_event:trace4] [pid 30339:tid 139796798162688] event.c(930): >>> socket reached timeout in lingering-close state >>> >>> What do you think? >>> >>> >>> >>> 2016-09-30 14:47 GMT+02:00 Fabio F.Gervasi <[email protected]>: >>> >>>> Hi! >>>> >>>> Thank you for your reply. I tried the following tests. >>>> >>>> *1) Using a little text file:* >>>> *# ab -k -c 100 -n 2000 localhost/test.txt* >>>> This is ApacheBench, Version 2.3 <$Revision: 1748469 $> >>>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, >>>> http://www.zeustech.net/ >>>> Licensed to The Apache Software Foundation, http://www.apache.org/ >>>> >>>> Benchmarking localhost (be patient) >>>> Completed 200 requests >>>> Completed 400 requests >>>> Completed 600 requests >>>> Completed 800 requests >>>> Completed 1000 requests >>>> Completed 1200 requests >>>> Completed 1400 requests >>>> Completed 1600 requests >>>> Completed 1800 requests >>>> Completed 2000 requests >>>> Finished 2000 requests >>>> >>>> >>>> Server Software: Apache/2.4.23 >>>> Server Hostname: localhost >>>> Server Port: 80 >>>> >>>> Document Path: /test.txt >>>> Document Length: 52 bytes >>>> >>>> Concurrency Level: 100 >>>> Time taken for tests: 2.521 seconds >>>> Complete requests: 2000 >>>> *Failed requests: 16* >>>> (Connect: 0, Receive: 0, Length: 16, Exceptions: 0) >>>> Keep-Alive requests: 1984 >>>> Total transferred: 950336 bytes >>>> HTML transferred: 103168 bytes >>>> Requests per second: 793.45 [#/sec] (mean) >>>> Time per request: 126.032 [ms] (mean) >>>> Time per request: 1.260 [ms] (mean, across all concurrent >>>> requests) >>>> Transfer rate: 368.18 [Kbytes/sec] received >>>> >>>> Connection Times (ms) >>>> min mean[+/-sd] median max >>>> Connect: 0 1 3.8 0 21 >>>> Processing: 1 123 126.5 83 772 >>>> Waiting: 0 123 126.7 83 772 >>>> Total: 1 124 126.6 85 772 >>>> >>>> Percentage of the requests served within a certain time (ms) >>>> 50% 85 >>>> 66% 154 >>>> 75% 201 >>>> 80% 229 >>>> 90% 303 >>>> 95% 373 >>>> 98% 452 >>>> 99% 547 >>>> 100% 772 (longest request) >>>> >>>> *I obtain 16 failed requests, but if I run more again I obtain a >>>> different number.* >>>> >>>> *2) Using a big gif file:* >>>> *# ab -k -c 100 -n 2000 localhost/it-vis-animation.gif* >>>> This is ApacheBench, Version 2.3 <$Revision: 1748469 $> >>>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, >>>> http://www.zeustech.net/ >>>> Licensed to The Apache Software Foundation, http://www.apache.org/ >>>> >>>> Benchmarking localhost (be patient) >>>> Completed 200 requests >>>> Completed 400 requests >>>> Completed 600 requests >>>> Completed 800 requests >>>> Completed 1000 requests >>>> Completed 1200 requests >>>> Completed 1400 requests >>>> Completed 1600 requests >>>> Completed 1800 requests >>>> Completed 2000 requests >>>> Finished 2000 requests >>>> >>>> >>>> Server Software: Apache/2.4.23 >>>> Server Hostname: localhost >>>> Server Port: 80 >>>> >>>> Document Path: /it-vis-animation.gif >>>> Document Length: 8105309 bytes >>>> >>>> Concurrency Level: 100 >>>> Time taken for tests: 26.294 seconds >>>> Complete requests: 2000 >>>> *Failed requests: 0* >>>> Keep-Alive requests: 2000 >>>> Total transferred: 16211484000 bytes >>>> HTML transferred: 16210618000 bytes >>>> Requests per second: 76.06 [#/sec] (mean) >>>> Time per request: 1314.691 [ms] (mean) >>>> Time per request: 13.147 [ms] (mean, across all concurrent >>>> requests) >>>> Transfer rate: 602100.83 [Kbytes/sec] received >>>> >>>> Connection Times (ms) >>>> min mean[+/-sd] median max >>>> Connect: 0 1 4.0 0 22 >>>> Processing: 404 1309 80.0 1298 1635 >>>> Waiting: 5 22 32.4 10 207 >>>> Total: 420 1310 80.6 1298 1635 >>>> >>>> Percentage of the requests served within a certain time (ms) >>>> 50% 1298 >>>> 66% 1315 >>>> 75% 1342 >>>> 80% 1360 >>>> 90% 1397 >>>> 95% 1440 >>>> 98% 1513 >>>> 99% 1530 >>>> 100% 1635 (longest request) >>>> >>>> *In this case I obtain always 0 failed!* >>>> >>>> *3) Using a little png file:* >>>> *# ab -k -c 100 -n 2000 localhost/apple-touch-icon.png* >>>> This is ApacheBench, Version 2.3 <$Revision: 1748469 $> >>>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, >>>> http://www.zeustech.net/ >>>> Licensed to The Apache Software Foundation, http://www.apache.org/ >>>> >>>> Benchmarking localhost (be patient) >>>> Completed 200 requests >>>> Completed 400 requests >>>> Completed 600 requests >>>> Completed 800 requests >>>> Completed 1000 requests >>>> Completed 1200 requests >>>> Completed 1400 requests >>>> Completed 1600 requests >>>> Completed 1800 requests >>>> Completed 2000 requests >>>> Finished 2000 requests >>>> >>>> >>>> Server Software: Apache/2.4.23 >>>> Server Hostname: localhost >>>> Server Port: 80 >>>> >>>> Document Path: /apple-touch-icon.png >>>> Document Length: 12364 bytes >>>> >>>> Concurrency Level: 100 >>>> Time taken for tests: 2.576 seconds >>>> Complete requests: 2000 >>>> *Failed requests: 22* >>>> (Connect: 0, Receive: 0, Length: 22, Exceptions: 0) >>>> Keep-Alive requests: 1978 >>>> Total transferred: 25304554 bytes >>>> HTML transferred: 24455992 bytes >>>> Requests per second: 776.52 [#/sec] (mean) >>>> Time per request: 128.779 [ms] (mean) >>>> Time per request: 1.288 [ms] (mean, across all concurrent >>>> requests) >>>> Transfer rate: 9594.53 [Kbytes/sec] received >>>> >>>> Connection Times (ms) >>>> min mean[+/-sd] median max >>>> Connect: 0 1 3.9 0 20 >>>> Processing: 0 125 112.0 101 563 >>>> Waiting: 0 125 112.1 101 563 >>>> Total: 0 126 112.6 102 563 >>>> >>>> Percentage of the requests served within a certain time (ms) >>>> 50% 102 >>>> 66% 159 >>>> 75% 196 >>>> 80% 220 >>>> 90% 282 >>>> 95% 354 >>>> 98% 401 >>>> 99% 457 >>>> 100% 563 (longest request) >>>> >>>> *I get 22 failed requests, but if I run more again I obtain a different >>>> number.* >>>> >>>> What do you thing? >>>> >>>> Thank you! >>>> >>>> >>>> >>>> 2016-09-30 13:50 GMT+02:00 Daniel <[email protected]>: >>>> >>>>> hello, >>>>> >>>>> the fact that all those 500 error reponses go with "X-Powered-By: >>>>> PHP/5.6.26" tell me it is php giving those errors. >>>>> >>>>> If you are still unsure, you could use the ab tool from Apache HTTPD, >>>>> and request a specific static file (text, gif) hundreds of times to see if >>>>> you get a 500 error from HTTPD. >>>>> >>>>> 2016-09-30 8:59 GMT+02:00 Fabio F.Gervasi <[email protected]>: >>>>> >>>>>> Hi, >>>>>> >>>>>> I'm not 100% sure that the real source of the problem is apache >>>>>> (could be php or other), but I'd like start from here as the only logs >>>>>> that >>>>>> provides me with information (with trace4 level) is apache log. No >>>>>> information appears from other logs. >>>>>> >>>>>> I have: >>>>>> - Linux 4.7.4-100.fc23.x86_64 >>>>>> - Server version: Apache/2.4.23 (Fedora) >>>>>> - mysql Ver 15.1 Distrib 10.0.26-MariaDB, for Linux (x86_64) using >>>>>> readline 5.1 >>>>>> - PHP Version => 5.6.25 >>>>>> >>>>>> The problem: random internal server error 500. Random because, for >>>>>> example, if a page produces the error and do "reload" then it works. The >>>>>> same page may not work as well as work. Sometimes you can use the site >>>>>> for >>>>>> several minutes with no problems, then suddenly there is the error. >>>>>> >>>>>> When error occurs, this is all that I get from the log file (with >>>>>> loglevel = trace4): >>>>>> >>>>>> [...] >>>>>> [Fri Sep 30 07:41:39.151052 2016] [rewrite:trace3] [pid 17957:tid >>>>>> 140241771534080] mod_rewrite.c(477): [client 151.25.206.200:38368] >>>>>> 151.25.206.200 - - [www.centrometeo.com/sid#55c3e >>>>>> 3673400][rid#7f8c8000e9d0/subreq >>>>>> <http://www.centrometeo.com/sid#55c3e3673400][rid%237f8c8000e9d0/subreq>] >>>>>> [perdir /home/web/centrometeo.com/] applying pattern '.*' to uri >>>>>> 'index.php', referer: http://www.centrometeo.com/mod >>>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>>> [Fri Sep 30 07:41:39.151092 2016] [rewrite:trace1] [pid 17957:tid >>>>>> 140241771534080] mod_rewrite.c(477): [client 151.25.206.200:38368] >>>>>> 151.25.206.200 - - [www.centrometeo.com/sid#55c3e >>>>>> 3673400][rid#7f8c8000e9d0/subreq >>>>>> <http://www.centrometeo.com/sid#55c3e3673400][rid%237f8c8000e9d0/subreq>] >>>>>> [perdir /home/web/centrometeo.com/] pass through /home/web/ >>>>>> centrometeo.com/index.php, referer: http://www.centrometeo.com/mod >>>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>>> [Fri Sep 30 07:41:39.913003 2016] [core:trace4] [pid 17957:tid >>>>>> 140241771534080] util_script.c(567): [client 151.25.206.200:38368] >>>>>> Headers from script 'php.fcgi':, referer: >>>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>>> 559-wrf-mslp-6hprec-ita >>>>>> [Fri Sep 30 07:41:39.913097 2016] [core:trace4] [pid 17957:tid >>>>>> 140241771534080] util_script.c(568): [client 151.25.206.200:38368] >>>>>> *Status: >>>>>> 500 Internal Server Error*, referer: http://www.centrometeo.com/mod >>>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>>> [Fri Sep 30 07:41:39.913130 2016] [core:trace1] [pid 17957:tid >>>>>> 140241771534080] util_script.c(649): [client 151.25.206.200:38368] >>>>>> *Status >>>>>> line from script 'php.fcgi': 500 Internal Server Error*, referer: >>>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>>> 559-wrf-mslp-6hprec-ita >>>>>> [Fri Sep 30 07:41:39.913162 2016] [core:trace4] [pid 17957:tid >>>>>> 140241771534080] util_script.c(568): [client 151.25.206.200:38368] >>>>>> X-Powered-By: PHP/5.6.26, referer: http://www.centrometeo.com/mod >>>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>>> [Fri Sep 30 07:41:39.913191 2016] [core:trace4] [pid 17957:tid >>>>>> 140241771534080] util_script.c(568): [client 151.25.206.200:38368] >>>>>> P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM", referer: >>>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>>> 559-wrf-mslp-6hprec-ita >>>>>> [Fri Sep 30 07:41:39.913218 2016] [core:trace4] [pid 17957:tid >>>>>> 140241771534080] util_script.c(568): [client 151.25.206.200:38368] >>>>>> Content-Encoding: gzip, referer: http://www.centrometeo.com/mod >>>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>>> [Fri Sep 30 07:41:39.913246 2016] [core:trace4] [pid 17957:tid >>>>>> 140241771534080] util_script.c(568): [client 151.25.206.200:38368] >>>>>> Vary: Accept-Encoding, referer: http://www.centrometeo.com/mod >>>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>>> [Fri Sep 30 07:41:39.913275 2016] [core:trace4] [pid 17957:tid >>>>>> 140241771534080] util_script.c(568): [client 151.25.206.200:38368] >>>>>> Content-type: text/html; charset=UTF-8, referer: >>>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>>> 559-wrf-mslp-6hprec-ita >>>>>> [Fri Sep 30 07:41:39.913395 2016] [http:trace3] [pid 17957:tid >>>>>> 140241771534080] http_filters.c(1006): [client 151.25.206.200:38368] >>>>>> *Response >>>>>> sent with status 500, headers:*, referer: >>>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>>> 559-wrf-mslp-6hprec-ita >>>>>> [Fri Sep 30 07:41:39.913432 2016] [http:trace4] [pid 17957:tid >>>>>> 140241771534080] http_filters.c(835): [client 151.25.206.200:38368] >>>>>> X-Powered-By: PHP/5.6.26, referer: http://www.centrometeo.com/mod >>>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>>> [Fri Sep 30 07:41:39.913458 2016] [http:trace4] [pid 17957:tid >>>>>> 140241771534080] http_filters.c(835): [client 151.25.206.200:38368] >>>>>> P3P: CP=\\"NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM\\", referer: >>>>>> http://www.centrometeo.com/modelli-numerici/modello-wrf-nmm/ >>>>>> 559-wrf-mslp-6hprec-ita >>>>>> [Fri Sep 30 07:41:39.913481 2016] [http:trace4] [pid 17957:tid >>>>>> 140241771534080] http_filters.c(835): [client 151.25.206.200:38368] >>>>>> Content-Encoding: gzip, referer: http://www.centrometeo.com/mod >>>>>> elli-numerici/modello-wrf-nmm/559-wrf-mslp-6hprec-ita >>>>>> [...] >>>>>> >>>>>> Note the use of fcgi. Apache is indeed configured with >>>>>> event/php-fpm/FastCGI, but the same thing happens with "normal" prefork / >>>>>> php! >>>>>> Until a few days ago it did not happen, it started after updating the >>>>>> system (kernel, apache, php, mariadb). >>>>>> I do not really know what to do. >>>>>> >>>>>> Thank you very much. >>>>>> Fabio. >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> *Daniel Ferradal* >>>>> IT Specialist >>>>> >>>>> email dferradal at gmail.com >>>>> linkedin es.linkedin.com/in/danielferradal >>>>> >>>> >>>> >>> >> >> >> -- >> *Daniel Ferradal* >> IT Specialist >> >> email dferradal at gmail.com >> linkedin es.linkedin.com/in/danielferradal >> >> >> >> -- The Wellcome Trust Sanger Institute is operated by Genome Research >> Limited, a charity registered in England with number 1021457 and a company >> registered in England with number 2742969, whose registered office is 215 >> Euston Road, London, NW1 2BE. >> > > -- *Daniel Ferradal* IT Specialist email dferradal at gmail.com linkedin es.linkedin.com/in/danielferradal
