Hi Luca,

Thanks for the details.

1. Our server's ulimit values are:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63714
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
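
With prefork, each concurrent connection occupies one child process, so the limit that probably matters most here is "max user processes" (-u). Below is a minimal sketch in Python of how the values above could be checked against a target. It assumes the httpd children run under the same limits as this shell, which should be verified separately (for example in /proc/<pid>/limits of a running child). The function names and the other_procs allowance are hypothetical, chosen only for illustration.

```python
import re

# Subset of the `ulimit -a` output quoted above.
ULIMIT_OUTPUT = """\
open files                      (-n) 1024
stack size              (kbytes, -s) 10240
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
"""


def parse_ulimit(text):
    """Map each ulimit flag letter to an int, or None for 'unlimited'."""
    limits = {}
    for line in text.splitlines():
        match = re.search(r"-(\w)\)\s+(\S+)\s*$", line)
        if match:
            flag, value = match.groups()
            limits[flag] = None if value == "unlimited" else int(value)
    return limits


def nproc_allows(limits, wanted_children, other_procs=25):
    """True if the process limit leaves room for the wanted prefork children.

    other_procs is an assumed allowance for the parent and any other
    processes owned by the same user; adjust it for the real system.
    """
    nproc = limits.get("u")
    return nproc is None or nproc >= wanted_children + other_procs


limits = parse_ulimit(ULIMIT_OUTPUT)
print("500 children fit: ", nproc_allows(limits, 500))
print("3500 children fit:", nproc_allows(limits, 3500))
```

Under these assumptions, 500 children fit within a limit of 1024, while the configured MaxClients of 3500 does not. That would be consistent with the setuid errors appearing once the total children count approaches 999.
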

Please let me know whether these values are sufficient to allow at least 500 concurrent connections.

2. Yes, I checked the mod_jk log when the hang happens, and I am getting the errors below continuously:

[Wed Apr 19 02:00:38 2017]loadbalancer www.cmsp1.com 24.843284
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info] ajp_process_callback::jk_ajp_common.c (1788): Writing to client aborted or client network problems
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info] ajp_service::jk_ajp_common.c (2447): (qu_prod_live_svr1) sending request to tomcat failed (unrecoverable), because of client write error (attempt=1)
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info] service::jk_lb_worker.c (1384): service failed, worker qu_prod_live_svr1 is in local error state
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info] service::jk_lb_worker.c (1403): unrecoverable error 200, request failed. Client failed in the middle of request, we can't recover to another instance.
[Wed Apr 19 02:00:38 2017]loadbalancer www.cmsp1.com 19.170901
[Wed Apr 19 02:00:38 2017][16313:3878614784] [info] jk_handler::mod_jk.c (2608): Aborting connection for worker=loadbalancer
[Wed Apr 19 02:00:39 2017][16261:3878614784] [warn] map_uri_to_worker_ext::jk_uri_worker_map.c (962): Uri * is invalid. Uri must start with /
[Wed Apr 19 02:00:40 2017][16308:3878614784] [warn] map_uri_to_worker_ext::jk_uri_worker_map.c (962): Uri * is invalid. Uri must start with /

3. We will upgrade to 2.4.25. Could you please share an optimal mpm-event configuration to allow more concurrent users?

Thanks,
Jay

On Tue, Apr 18, 2017 at 10:03 AM, Luca Toscano <toscano.l...@gmail.com> wrote:

> Hi,
>
> Some suggestions:
>
> 1) Check the RHEL ulimits applied to httpd. The error message "Resource
> temporarily unavailable: setuid: unable to change to uid" could mean that
> the maximum number of processes (allowed by the OS) has been reached.
> Raising that limit should allow you to spawn more httpd processes.
>
> 2) Have you checked when the "hang" happens?
> If you have long-lived connections and your httpd server reloads (for
> example, for log rotation), then it might hang for a while waiting for
> the remaining connections to drain.
>
> 3) If possible, I'd consider upgrading httpd to >= 2.4.25 and using
> mpm-event (rather than prefork).
>
> Hope that helps!
>
> Luca
>
>
> 2017-04-16 13:18 GMT+02:00 Jayaram Ponnusamy <jayaram.ponnus...@gmail.com>:
>
>> Dear All,
>>
>> We were previously running our site on a PHP-based CMS tool, and normally
>> 20-30K users access our sites daily. In the new system with Tomcat, we
>> frequently face performance and availability issues: when I access the
>> Tomcat URL directly, the page loads within 3 seconds, but through the
>> web server URL it takes more than 9 seconds.
>>
>> Also, each day I am seeing more and more of these messages in my
>> error_log, and when the total children value reaches 999, Apache stops
>> responding and only a server reboot brings the site back. We face this
>> issue at least 4-5 times every day (we are using mod_jk to connect to
>> Tomcat).
>>
>> Kindly help with this.
>>
>> Usually I am seeing this in my error_log:
>> [Sat Apr 15 20:49:33 2017] [info] server seems busy, (you may need to
>> increase StartServers, or Min/MaxSpareServers), spawning 8 children, there
>> are 4 idle, and 31 total children
>> [Sat Apr 15 20:51:14 2017] [info] server seems busy, (you may need to
>> increase StartServers, or Min/MaxSpareServers), spawning 8 children, there
>> are 0 idle, and 20 total children
>> [Sat Apr 15 20:51:15 2017] [info] server seems busy, (you may need to
>> increase StartServers, or Min/MaxSpareServers), spawning 16 children, there
>> are 0 idle, and 28 total children
>> [Sat Apr 15 20:51:16 2017] [info] server seems busy, (you may need to
>> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
>> are 0 idle, and 44 total children
>>
>> We are using two Apache nodes connected to two Tomcat servers (with
>> application-level clustering).
>>
>> Apache Servers:
>> 4 Core 64-bit, RHEL system running on 16GB RAM (Both Servers)
>> Server version: Apache/2.2.21 (Unix)
>>
>> *httpd.conf*
>> KeepAlive On
>> Timeout 300
>> MaxKeepAliveRequests 100
>> KeepAliveTimeout 15
>> <IfModule prefork.c>
>> StartServers 80
>> ServerLimit 3500
>> MaxClients 3500
>> MaxRequestsPerChild 0
>> </IfModule>
>>
>> *workers.properties*
>> worker.list=loadbalancer,status
>> worker.qu_prod_live_svr.type=ajp13
>> worker.qu_prod_live_svr.host=cmsp1
>> worker.qu_prod_live_svr.port=8009
>> worker.qu_prod_live_svr.socket_keepalive=1
>> worker.qu_prod_live_svr.socket_timeout=300
>> worker.qu_prod_live_svr1.type=ajp13
>> worker.qu_prod_live_svr1.host=cmsp2
>> worker.qu_prod_live_svr1.port=8009
>> worker.qu_prod_live_svr1.socket_keepalive=1
>> worker.qu_prod_live_svr1.socket_timeout=300
>> worker.qu_prod_live_svr.lbfactor=1
>> worker.qu_prod_live_svr1.lbfactor=1
>> worker.loadbalancer.type=lb
>> worker.loadbalancer.balance_workers=qu_prod_live_svr,qu_prod_live_svr1
>> worker.status.type=status
>>
>> *Tomcat Servers:*
>> 4 Core 64-bit, RHEL system running
>> on 16GB RAM (Both Servers)
>> Server version: Apache Tomcat/7.0.42
>> <Connector port="9090" protocol="HTTP/1.1" redirectPort="8443"
>> URIEncoding="UTF-8" emptySessionPath="true" maxThreads="500"
>> minSpareThreads="10" connectionTimeout="-1" />
>> <Connector port="8009" protocol="AJP/1.3" redirectPort="8443"
>> URIEncoding="UTF-8" />
>>
>> *error_log:*
>> [Sat Apr 15 21:52:36 2017] [info] server seems busy, (you may need to
>> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
>> are 0 idle, and 839 total children
>> [Sat Apr 15 21:52:37 2017] [info] server seems busy, (you may need to
>> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
>> are 0 idle, and 871 total children
>> [Sat Apr 15 21:52:38 2017] [info] server seems busy, (you may need to
>> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
>> are 0 idle, and 903 total children
>> [Sat Apr 15 21:52:39 2017] [info] server seems busy, (you may need to
>> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
>> are 0 idle, and 935 total children
>> [Sat Apr 15 21:52:40 2017] [info] server seems busy, (you may need to
>> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
>> are 0 idle, and 967 total children
>> [Sat Apr 15 21:52:41 2017] [info] server seems busy, (you may need to
>> increase StartServers, or Min/MaxSpareServers), spawning 32 children, there
>> are 0 idle, and 999 total children
>> [Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable:
>> setuid: unable to change to uid: 2
>> [Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable:
>> setuid: unable to change to uid: 2
>> [Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable:
>> setuid: unable to change to uid: 2
>> [Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable:
>> setuid: unable to change to uid: 2
>> [Sat Apr 15 21:52:41 2017] [alert] Child
>> 9351 returned a Fatal error...
>> Apache is exiting!
>> [Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable:
>> setuid: unable to change to uid: 2
>> [Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable:
>> setuid: unable to change to uid: 2
>> [Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable:
>> setuid: unable to change to uid: 2
>> [Sat Apr 15 21:53:06 2017] [error] (22)Invalid argument:
>> apr_global_mutex_lock(jk_log_lock) failed
>> [Sat Apr 15 21:53:06 2017] [error] mod_jk: jk_log_to_file
>> [Sat Apr 15 21:53:06 2017][8752:4177577728] [info]
>> ajp_connection_tcp_get_message::jk_ajp_common.c (1150):
>> (qu_prod_live_svr1) can't receive the response header message from tomcat,
>> network problems or tomcat (10.11.11.32:8009) is down (errno=104)\n
>> failed: Broken pipe
>> [Sat Apr 15 21:53:06 2017] [error] (22)Invalid argument:
>> apr_global_mutex_unlock(jk_log_lock) failed
>> [Sat Apr 15 21:53:06 2017] [error] (22)Invalid argument:
>> apr_global_mutex_lock(jk_log_lock) failed
>> [Sat Apr 15 21:53:06 2017] [error] mod_jk: jk_log_to_file [Sat Apr 15
>> 21:53:06 2017][8752:4177577728] [error] ajp_get_reply::jk_ajp_common.c
>> (1962): (qu_prod_live_svr1) Tomcat is down or refused connection. No
>> response has been sent to the client (yet)\n failed: Broken pipe
>> [Sat Apr 15 21:53:06 2017] [error] (22)Invalid argument:
>> apr_global_mutex_unlock(jk_log_lock) failed
>>
>>
>> *Thanks & Regards,*
>> *Jay*
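
The growth in child processes visible in the error_log excerpts above can be pulled out with a short script. This is a minimal sketch in Python. It assumes each log entry sits on one physical line in the real file (the quoted lines above are wrapped by the mail client). The function name and the sample layout are for illustration only.

```python
import re

# Two "server seems busy" entries and one alert from the thread, unwrapped.
SAMPLE_LOG = """\
[Sat Apr 15 21:52:40 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 967 total children
[Sat Apr 15 21:52:41 2017] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 0 idle, and 999 total children
[Sat Apr 15 21:52:41 2017] [alert] (11)Resource temporarily unavailable: setuid: unable to change to uid: 2
"""

# Matches only the prefork "server seems busy" info entries.
BUSY_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[info\] server seems busy.*"
    r"there are (?P<idle>\d+) idle, and (?P<total>\d+) total children",
    re.MULTILINE,
)


def child_growth(log_text):
    """Return (timestamp, idle, total) for every 'server seems busy' entry."""
    return [
        (m.group("ts"), int(m.group("idle")), int(m.group("total")))
        for m in BUSY_RE.finditer(log_text)
    ]


for ts, idle, total in child_growth(SAMPLE_LOG):
    print(f"{ts}  idle={idle:<3} total={total}")
```

Comparing the last total before the first [alert] entry with the process limit of the httpd user may help confirm whether that limit is what stops new children from starting.
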