We will then have to look into what is happening in this step (probably by adding debugging code):
Warm-up cache process kicks off with 2 long-running requests (45 seconds
each). This is a PHP application running under mod_php - each process grows
up to 700 MB, so the application kills the httpd child process at the end to
release the memory, using posix_kill(PID, 28).

Deepak
"The greatness of a nation can be judged by the way its animals are treated - Mahatma Gandhi"

+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

Make In India : http://www.makeinindia.com/home


On Fri, Oct 22, 2021 at 3:07 PM Patrick Verdon <patrick.ver...@youreko.com> wrote:

> Correct.
>
>
> On Fri, 22 Oct 2021 at 10:35, Deepak Goel <deic...@gmail.com> wrote:
>
>> I guess what you are saying is that the following error happens during
>> startup and not during normal operation:
>>
>> [Sun Oct 17 15:53:49.244527 2021] [mpm_prefork:error] [pid 3581] AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
>>
>>
>> Deepak
>>
>>
>> On Fri, Oct 22, 2021 at 2:23 PM Patrick Verdon <patrick.ver...@youreko.com> wrote:
>>
>>> Hi Yann,
>>>
>>> Quick update - we've enabled core dumps but haven't been able to
>>> reproduce the issue. After removing mod_http2 we were initially able to
>>> trigger the crash after 14 attempts, but we've since tried over 100 times
>>> with no luck. We'll keep trying, as there's nothing worse than knowing
>>> there's a bug lurking that can cause a crash.
>>>
>>> @Deepak - thanks for your suggestion, but hitting MaxRequestWorkers is a
>>> quirk of our installation: we start the maximum number of workers at
>>> startup so that the PHP application is primed and ready, rather than
>>> having Apache spawn lots of heavy processes. This is the same
>>> configuration we've had for years without ever experiencing Apache
>>> hanging until the upgrade to 2.4.48.
>>>
>>> Thanks.
>>>
>>> Patrick
>>>
>>> *--*
>>>
>>> *Patrick Verdon | Founder*
>>> Web: www.youreko.com
>>> Mobile: +44 (0)7809 296438
>>> Skype: patrick_verdon
>>>
>>> This entire communication is sent on behalf of
>>> Youreko Ltd and is strictly confidential to and
>>> for the sole use of the intended addressee.
>>>
>>> Registered in England - 7448349
>>>
>>>
>>>
>>> On Tue, 19 Oct 2021 at 11:00, Deepak Goel <deic...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> It looks like step 2 in your process is not working in the upgraded
>>>> version of Apache.
>>>>
>>>> Therefore it is logging the following:
>>>> server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
>>>>
>>>> Deepak
>>>>
>>>>
>>>> On Mon, Oct 18, 2021 at 2:57 PM Patrick Verdon <patrick.ver...@youreko.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I'd appreciate some feedback on an issue I'm experiencing. I've spent
>>>>> quite some time researching the problem, as it causes a serious outage
>>>>> in our application. I've searched the Web, Stack Overflow, this list's
>>>>> mail archives, the latest Apache bugs, and more, but have not been able
>>>>> to find any reports of a similar issue.
>>>>>
>>>>> Background.
>>>>> I'm running the latest Apache 2.4.51 on Amazon Linux with mod_proxy,
>>>>> mod_php and mod_ssl, with Varnish in front. Some requests to our
>>>>> application take about 45 seconds to complete, so there is a warm-up
>>>>> cache procedure at regular intervals during the day which primes the
>>>>> Varnish cache. The following steps reliably cause Apache to hang,
>>>>> requiring a manual restart:
>>>>>
>>>>>    1. Varnish cache is cleared, causing a spike in load on httpd.
>>>>>    2. Warm-up cache process kicks off with 2 long-running requests
>>>>>       (45 seconds each). This is a PHP application running under
>>>>>       mod_php - each process grows up to 700 MB, so the application
>>>>>       kills the httpd child process at the end to release the memory,
>>>>>       using posix_kill(PID, 28).
>>>>>    3. Apache hangs and does not recover. Varnish serves 503s.
>>>>>    4. Manual restart required: service httpd restart
>>>>>    5. Errors in the log show that 2 children had segmentation faults,
>>>>>       presumably the 2 handling the long-running requests.
>>>>>
>>>>>
>>>>> Albeit ugly, this process has been running for a year and a half
>>>>> without any issues. We traced the start of the crashes to the date
>>>>> Apache was upgraded from version 2.4.46 to 2.4.48, and as you can see
>>>>> it's still an issue in 2.4.51.
>>>>>
>>>>> See the error_log and installation details below.
>>>>>
>>>>> Any feedback on where to report this issue would be much appreciated.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Patrick
>>>>>
>>>>> --
>>>>>
>>>>> # cat /var/log/httpd/error_log
>>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal == 0' failed.
>>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal == 0' failed.
>>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal == 0' failed.
>>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal == 0' failed.
>>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. >>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. >>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. >>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. >>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. >>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. >>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. >>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. >>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. >>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. >>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. >>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. >>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. >>>>> *** Error in `/usr/sbin/httpd': corrupted size vs. prev_size: >>>>> 0x0000557f94567e4f *** >>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. >>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. >>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. >>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. >>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. >>>>> httpd: misc/apr_reslist.c:161: reslist_cleanup: Assertion `rl->ntotal >>>>> == 0' failed. 
>>>>> [Sun Oct 17 15:53:47.990497 2021] [core:notice] [pid 2620] AH00052: child pid 3166 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990531 2021] [core:notice] [pid 2620] AH00052: child pid 3483 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990545 2021] [core:notice] [pid 2620] AH00052: child pid 2657 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990557 2021] [core:notice] [pid 2620] AH00052: child pid 2660 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990568 2021] [core:notice] [pid 2620] AH00052: child pid 2661 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990579 2021] [core:notice] [pid 2620] AH00052: child pid 3172 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990592 2021] [core:notice] [pid 2620] AH00052: child pid 2681 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990603 2021] [core:notice] [pid 2620] AH00052: child pid 3254 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990615 2021] [core:notice] [pid 2620] AH00052: child pid 2685 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990627 2021] [core:notice] [pid 2620] AH00052: child pid 2688 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990639 2021] [core:notice] [pid 2620] AH00052: child pid 3015 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990652 2021] [core:notice] [pid 2620] AH00052: child pid 2696 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990664 2021] [core:notice] [pid 2620] AH00052: child pid 2699 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990680 2021] [core:notice] [pid 2620] AH00052: child pid 2710 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990692 2021] [core:notice] [pid 2620] AH00052: child pid 2713 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990703 2021] [core:notice] [pid 2620] AH00052: child pid 3250 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990716 2021] [core:notice] [pid 2620] AH00052: child pid 2721 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990726 2021] [core:notice] [pid 2620] AH00052: child pid 2724 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990739 2021] [core:notice] [pid 2620] AH00052: child pid 2734 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990750 2021] [core:notice] [pid 2620] AH00052: child pid 3471 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990769 2021] [core:notice] [pid 2620] AH00052: child pid 3109 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:47.990781 2021] [core:notice] [pid 2620] AH00052: child pid 2741 exit signal Segmentation fault (11)
>>>>> *** Error in `/usr/sbin/httpd': corrupted size vs. prev_size: 0x0000557f94567e4f ***
>>>>> [Sun Oct 17 15:53:48.056539 2021] [core:notice] [pid 2620] AH00052: child pid 3019 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:48.056584 2021] [core:notice] [pid 2620] AH00052: child pid 2707 exit signal Segmentation fault (11)
>>>>> [Sun Oct 17 15:53:48.056599 2021] [core:notice] [pid 2620] AH00052: child pid 2727 exit signal Aborted (6)
>>>>> [Sun Oct 17 15:53:48.056667 2021] [mpm_prefork:notice] [pid 2620] AH00169: caught SIGTERM, shutting down
>>>>> [Sun Oct 17 15:53:48.151770 2021] [suexec:notice] [pid 3575] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
>>>>> [Sun Oct 17 15:53:48.180621 2021] [http2:warn] [pid 3581] AH10034: The mpm module (prefork.c) is not supported by mod_http2. The mpm determines how things are processed in your server. HTTP/2 has more demands in this regard and the currently selected mpm will just not do. This is an advisory warning. Your server will continue to work, but the HTTP/2 protocol will be inactive.
>>>>> [Sun Oct 17 15:53:48.181146 2021] [lbmethod_heartbeat:notice] [pid 3581] AH02282: No slotmem from mod_heartmonitor
>>>>> [Sun Oct 17 15:53:48.243891 2021] [mpm_prefork:notice] [pid 3581] AH00163: Apache/2.4.51 (Amazon) OpenSSL/1.0.2k-fips configured -- resuming normal operations
>>>>> [Sun Oct 17 15:53:48.243923 2021] [core:notice] [pid 3581] AH00094: Command line: '/usr/sbin/httpd'
>>>>> [Sun Oct 17 15:53:49.244527 2021] [mpm_prefork:error] [pid 3581] AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
>>>>>
>>>>> # httpd -V
>>>>> Server version: Apache/2.4.51 (Amazon)
>>>>> Server built: Oct 8 2021 19:30:47
>>>>> Server's Module Magic Number: 20120211:118
>>>>> Server loaded: APR 1.6.3, APR-UTIL 1.5.4
>>>>> Compiled using: APR 1.6.3, APR-UTIL 1.5.4
>>>>> Architecture: 64-bit
>>>>> Server MPM: prefork
>>>>>   threaded: no
>>>>>     forked: yes (variable process count)
>>>>> Server compiled with....
>>>>>  -D APR_HAS_SENDFILE
>>>>>  -D APR_HAS_MMAP
>>>>>  -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
>>>>>  -D APR_USE_SYSVSEM_SERIALIZE
>>>>>  -D APR_USE_PTHREAD_SERIALIZE
>>>>>  -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
>>>>>  -D APR_HAS_OTHER_CHILD
>>>>>  -D AP_HAVE_RELIABLE_PIPED_LOGS
>>>>>  -D DYNAMIC_MODULE_LIMIT=256
>>>>>  -D HTTPD_ROOT="/etc/httpd"
>>>>>  -D SUEXEC_BIN="/usr/sbin/suexec"
>>>>>  -D DEFAULT_PIDLOG="/var/run/httpd/httpd.pid"
>>>>>  -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
>>>>>  -D DEFAULT_ERRORLOG="logs/error_log"
>>>>>  -D AP_TYPES_CONFIG_FILE="conf/mime.types"
>>>>>  -D SERVER_CONFIG_FILE="conf/httpd.conf"
>>>>>
>>>>> # httpd -M
>>>>> Loaded Modules:
>>>>> core_module (static)
>>>>> so_module (static)
>>>>> http_module (static)
>>>>> access_compat_module (shared)
>>>>> actions_module (shared)
>>>>> alias_module (shared)
>>>>> allowmethods_module (shared)
>>>>> auth_basic_module (shared)
>>>>> auth_digest_module (shared)
>>>>> authn_anon_module (shared)
>>>>> authn_core_module (shared)
>>>>> authn_dbd_module (shared)
>>>>> authn_dbm_module (shared)
>>>>> authn_file_module (shared)
>>>>> authn_socache_module (shared)
>>>>> authz_core_module (shared)
>>>>> authz_dbd_module (shared)
>>>>> authz_dbm_module (shared)
>>>>> authz_groupfile_module (shared)
>>>>> authz_host_module (shared)
>>>>> authz_owner_module (shared)
>>>>> authz_user_module (shared)
>>>>> autoindex_module (shared)
>>>>> cache_module (shared)
>>>>> cache_disk_module (shared)
>>>>> cache_socache_module (shared)
>>>>> data_module (shared)
>>>>> dbd_module (shared)
>>>>> deflate_module (shared)
>>>>> dir_module (shared)
>>>>> dumpio_module (shared)
>>>>> echo_module (shared)
>>>>> env_module (shared)
>>>>> expires_module (shared)
>>>>> ext_filter_module (shared)
>>>>> filter_module (shared)
>>>>> headers_module (shared)
>>>>> http2_module (shared)
>>>>> include_module (shared)
>>>>> info_module (shared)
>>>>> log_config_module (shared)
>>>>> logio_module (shared)
>>>>> macro_module (shared)
>>>>> mime_magic_module (shared)
>>>>> mime_module (shared)
>>>>> negotiation_module (shared)
>>>>> remoteip_module (shared)
>>>>> reqtimeout_module (shared)
>>>>> request_module (shared)
>>>>> rewrite_module (shared)
>>>>> setenvif_module (shared)
>>>>> slotmem_plain_module (shared)
>>>>> slotmem_shm_module (shared)
>>>>> socache_dbm_module (shared)
>>>>> socache_memcache_module (shared)
>>>>> socache_shmcb_module (shared)
>>>>> status_module (shared)
>>>>> substitute_module (shared)
>>>>> suexec_module (shared)
>>>>> unixd_module (shared)
>>>>> userdir_module (shared)
>>>>> version_module (shared)
>>>>> vhost_alias_module (shared)
>>>>> watchdog_module (shared)
>>>>> dav_module (shared)
>>>>> dav_fs_module (shared)
>>>>> dav_lock_module (shared)
>>>>> lua_module (shared)
>>>>> mpm_prefork_module (shared)
>>>>> proxy_module (shared)
>>>>> lbmethod_bybusyness_module (shared)
>>>>> lbmethod_byrequests_module (shared)
>>>>> lbmethod_bytraffic_module (shared)
>>>>> lbmethod_heartbeat_module (shared)
>>>>> proxy_ajp_module (shared)
>>>>> proxy_balancer_module (shared)
>>>>> proxy_connect_module (shared)
>>>>> proxy_express_module (shared)
>>>>> proxy_fcgi_module (shared)
>>>>> proxy_fdpass_module (shared)
>>>>> proxy_ftp_module (shared)
>>>>> proxy_http_module (shared)
>>>>> proxy_hcheck_module (shared)
>>>>> proxy_scgi_module (shared)
>>>>> proxy_uwsgi_module (shared)
>>>>> proxy_wstunnel_module (shared)
>>>>> ssl_module (shared)
>>>>> cgi_module (shared)
>>>>> php7_module (shared)
>>>>>
>>>>> # yum list | grep mod_
>>>>> lighttpd-mod_authn_gssapi.x86_64 1.4.53-1.36.amzn1 amzn-updates
>>>>> lighttpd-mod_authn_mysql.x86_64 1.4.53-1.36.amzn1 amzn-updates
>>>>> lighttpd-mod_authn_pam.x86_64 1.4.53-1.36.amzn1 amzn-updates
>>>>> lighttpd-mod_geoip.x86_64 1.4.53-1.36.amzn1 amzn-updates
>>>>> lighttpd-mod_mysql_vhost.x86_64 1.4.53-1.36.amzn1 amzn-updates
>>>>> mod_auth_kerb.x86_64 5.4-10.9.amzn1 amzn-main
>>>>> mod_auth_mellon.x86_64 0.13.1-1.6.amzn1 amzn-updates
>>>>> mod_auth_mysql.x86_64 1:3.0.0-18.10.amzn1 amzn-main
>>>>> mod_auth_pgsql.x86_64 2.0.3-10.1.5.amzn1 amzn-main
>>>>> mod_authz_ldap.x86_64 0.26-16.8.amzn1 amzn-main
>>>>> mod_dav_svn.x86_64 1.9.7-1.54.amzn1 amzn-main
>>>>> mod_fcgid.x86_64 2.3.9-1.6.amzn1 amzn-main
>>>>> mod_geoip.x86_64 1.2.7-1.2.amzn1 amzn-main
>>>>> mod_nss.x86_64 1.0.10-1.13.amzn1 amzn-main
>>>>> mod_perl.x86_64 2.0.7-7.28.amzn1 amzn-updates
>>>>> mod_perl-devel.x86_64 2.0.7-7.28.amzn1 amzn-updates
>>>>> mod_proxy_html.x86_64 3.1.2-7.3.amzn1 amzn-main
>>>>> mod_python26.x86_64 3.3.1-17.20.amzn1 amzn-main
>>>>> mod_python27.x86_64 3.3.1-17.20.amzn1 amzn-main
>>>>> mod_security.x86_64 2.8.0-5.27.amzn1 amzn-main
>>>>> mod_security_crs.noarch 2.2.8-2.5.amzn1 amzn-main
>>>>> mod_security_crs-extras.noarch 2.2.8-2.5.amzn1 amzn-main
>>>>> mod_ssl.x86_64 1:2.2.34-1.16.amzn1 amzn-main
>>>>> mod_wsgi-python26.x86_64 3.2-6.12.amzn1 amzn-updates
>>>>> mod_wsgi-python27.x86_64 3.2-6.12.amzn1 amzn-updates
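Step 2 passes a bare signal number to posix_kill(). The following is a quick way to see which signal that is. It maps the signal numbers that appear in this thread to their names using Python's standard signal module. The sketch assumes Linux signal numbering, as on Amazon Linux; numbering can differ on other platforms.

On Linux, the mapping is as follows:

- 6 and 11 are SIGABRT and SIGSEGV. These match the "Aborted (6)" and "Segmentation fault (11)" child exits in the log.
- 28 is SIGWINCH. The httpd documentation lists this as the graceful-stop signal when it is sent to the parent process.

THREAD_SIGNALS and signal_name are illustrative names only.

```python
import signal

# Signal numbers mentioned in this thread: 6 and 11 from the AH00052 child
# exit notices, 28 from the posix_kill(PID, 28) call in step 2.
# ASSUMPTION: Linux numbering (Amazon Linux); other platforms may differ.
THREAD_SIGNALS = (6, 11, 28)


def signal_name(num: int) -> str:
    """Return the symbolic name (e.g. "SIGSEGV") of a signal number on this OS."""
    return signal.Signals(num).name


if __name__ == "__main__":
    for num in THREAD_SIGNALS:
        print(f"{num:>2} -> {signal_name(num)}")
```

If the PHP pcntl extension is loaded, the application could pass its named constant instead of the bare number, for example posix_kill($pid, SIGWINCH). This would make the intent of step 2 explicit in the code.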
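Step 5, and the suggestion to add debugging code, both come down to reading the AH00052 lines in error_log. Below is a minimal sketch in Python, using only the standard library. It groups child exits by signal and lists the PIDs, so the segmentation faults stand out among the aborts after each reproduction attempt.

Some caveats about the sketch:

- EXIT_RE, SAMPLE_LINES and tally_child_exits are hypothetical names.
- The regular expression is inferred from the log lines quoted in this thread, not taken from any httpd specification.
- The sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches httpd's AH00052 child-exit notices, for example:
#   "... AH00052: child pid 2741 exit signal Segmentation fault (11)"
# ASSUMPTION: pattern inferred from the log lines in this thread.
EXIT_RE = re.compile(r"AH00052: child pid (\d+) exit signal (.+?) \((\d+)\)")

# A few lines copied verbatim from the error_log in this thread.
SAMPLE_LINES = [
    "[Sun Oct 17 15:53:47.990781 2021] [core:notice] [pid 2620] AH00052: "
    "child pid 2741 exit signal Segmentation fault (11)",
    "[Sun Oct 17 15:53:48.056539 2021] [core:notice] [pid 2620] AH00052: "
    "child pid 3019 exit signal Aborted (6)",
    "[Sun Oct 17 15:53:48.056584 2021] [core:notice] [pid 2620] AH00052: "
    "child pid 2707 exit signal Segmentation fault (11)",
]


def tally_child_exits(lines):
    """Count child exits per (signal name, number) and record their PIDs in order."""
    counts = Counter()
    pids = {}
    for line in lines:
        match = EXIT_RE.search(line)
        if match is None:
            continue  # not an AH00052 exit notice (assertions, startup messages, ...)
        pid, name, num = int(match.group(1)), match.group(2), int(match.group(3))
        counts[(name, num)] += 1
        pids.setdefault((name, num), []).append(pid)
    return counts, pids


if __name__ == "__main__":
    # On a live server, pass open("/var/log/httpd/error_log") instead.
    counts, pids = tally_child_exits(SAMPLE_LINES)
    for (name, num), n in counts.most_common():
        print(f"{name} ({num}): {n} child(ren), pids {pids[(name, num)]}")
```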