I could try not using APR, and I'd be fine with that if I knew its performance impact; I could load test it in dev to be sure with minimal trouble. Tomcat handles all the dynamic work for me: a few servlets, some fast and some slow in their processing, JDBC database access, no static content, and no session clustering yet.
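If I do test without APR, my understanding is that it's mostly a server.xml change: pin the AJP connector to the plain Java (blocking) implementation instead of letting protocol="AJP/1.3" auto-select the APR one when the native library is loaded. A rough sketch for Tomcat 6, not my actual config (only the protocol value matters here; the other attributes are illustrative):

    <!-- Sketch only: non-APR AJP connector (Java blocking implementation).
         Attributes other than protocol are illustrative values. -->
    <Connector port="8009"
               protocol="org.apache.coyote.ajp.AjpProtocol"
               maxThreads="400" />

Commenting out the AprLifecycleListener in server.xml should have the same effect of keeping everything off the native library, if I understand it right.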
I was surprised at the ping timeout too, because I don't consider my tomcats to be dogging it performance-wise or starved for resources. The specific messages are:

[Fri Feb 20 08:57:15 2009] [5723:1106225472] [info] ajp_connection_tcp_get_message::jk_ajp_common.c (1104): (tr2) can't receive the response message from tomcat, tomcat (192.168.x.x:8009) has forced a connection close for socket 58
[Fri Feb 20 08:57:15 2009] [5723:1106225472] [info] ajp_handle_cping_cpong::jk_ajp_common.c (876): awaited reply cpong, not received
[Fri Feb 20 08:57:15 2009] [5723:1106225472] [info] ajp_maintain::jk_ajp_common.c (3046): (tr2) failed sending request, socket -1 keepalive cping/cpong failure (errno=0)

It's possible tomcat is closing the connection between the ping and the pong. I rarely get degraded workers, though. jkstatus shows all good: no degraded/bad/stopped workers. The individual worker stats there show nothing in the error ("Er") column, a few in "CE" (which is to be expected), and no "Re"s. This is over a period of about 50K accesses per worker. (I do about 60 reqs/s at peak at each tomcat, spread over 4 apache and 4 tomcat servers, so I'm pushing about 15/sec through jk to each worker.) My threads-busy in tomcat runs at 1-2, occasionally 3 at peak.

Looking at my tomcat free-memory graphs, it looks like the maximum amount of memory gets reclaimed every 60-90 mins at peak, every 180 mins off-peak, and every 480 mins off-hours. Would those be the times of the full GCs? I just turned verbose GC logging on, so I don't have a whole lot of stats yet. The incremental GCs look like they're 5-20 secs apart. A sampling:

442.614: [GC [PSYoungGen: 152990K->5858K(162432K)] 169094K->21970K(512000K), 0.0076170 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
454.837: [GC [PSYoungGen: 156834K->253K(163392K)] 172946K->16437K(512960K), 0.0022680 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
468.128: [GC [PSYoungGen: 151933K->9463K(161152K)] 168117K->25711K(510720K), 0.0121500 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
480.744: [GC [PSYoungGen: 161143K->1700K(162432K)] 177391K->17996K(512000K), 0.0027160 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
488.831: [GC [PSYoungGen: 151140K->386K(161728K)] 167436K->16745K(511296K), 0.0021470 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
500.843: [GC [PSYoungGen: 149826K->447K(163264K)] 166185K->16854K(512832K), 0.0013350 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
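For reference, I believe the GC output above corresponds to logging flags along these lines on Sun JDK 6 (the log path and option placement are illustrative, not my exact setup). Full collections get tagged "Full GC" in the same log with the same timestamps, so this should also tell me whether those big memory-reclaim points really are full GCs:

    # Illustrative JAVA_OPTS additions (e.g. in bin/setenv.sh);
    # the log path is an example only.
    JAVA_OPTS="$JAVA_OPTS \
        -verbose:gc \
        -XX:+PrintGCDetails \
        -XX:+PrintGCTimeStamps \
        -Xloggc:/var/log/tomcat/gc.log"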
I appreciate your help.

-Tony
---------------------------
Manager, IT Operations
Format Dynamics, Inc.
303-573-1800x27
abia...@formatdynamics.com
http://www.formatdynamics.com

-----Original Message-----
From: Rainer Jung [mailto:rainer.j...@kippdata.de]
Sent: Friday, February 20, 2009 3:03 AM
To: Tomcat Users List
Subject: Re: mod_jk pool/thread/configure questions

On 19.02.2009 19:17, Anthony J. Biacco wrote:
>>> the max of 400 and stay there until tomcat is restarted. Is there a way
>>> to resolve this? And more importantly, should I resolve it? Are there any
>>> major memory/CPU implications to it keeping its threads at the max?
>>
>> Do a thread dump ("kill -QUIT"). It goes to catalina.out and will tell
>> you what all those 400 threads are doing. Maybe they are stuck working
>> on old requests nobody is waiting for.
>
> All the idle threads look like this:
>
> "ajp-8009-63" daemon prio=10 tid=0x000000001b52f000 nid=0x52ec in
> Object.wait() [0x000000004610c000..0x000000004610cd90]
>    java.lang.Thread.State: WAITING (on object monitor)
>        at java.lang.Object.wait(Native Method)
>        - waiting on <0x00002b3aebf5f840> (a
> org.apache.tomcat.util.net.AprEndpoint$Worker)
>        at java.lang.Object.wait(Object.java:485)
>        at org.apache.tomcat.util.net.AprEndpoint$Worker.await(AprEndpoint.java:1465)
>        - locked <0x00002b3aebf5f840> (a
> org.apache.tomcat.util.net.AprEndpoint$Worker)
>        at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1490)
>        at java.lang.Thread.run(Thread.java:619)

We can see that you are using the APR connector implementation. In 6.0
that one doesn't shrink its thread pool; all of the threads above are
sitting idle in the pool. If you really need the pool to shrink, you
could try the traditional connector (i.e. without APR).

>>> worker.template.reply_timeout=20000
>>
>> When using such an ambitious reply_timeout, also use
>> max_reply_timeouts.
>
> I'm under the understanding this is the timeout between packet responses
> from tomcat. I don't think they should be any longer than this.

Often, once there is a performance issue, response times go up a lot.
When a reply timeout is detected, the worker is put into error mode and
the JK load balancer will send all requests to some other node. Users
will lose their sessions as a consequence. You don't want that to happen
only because of a very short-lived problem. So either increase the
timeout, or use max_reply_timeouts, which will tolerate a couple of
timeouts before putting the worker into error state.

>> worker.template.socket_connect_timeout=5000
>> worker.template.ping_mode=A
>> worker.template.ping_timeout=25000
>
> Yeah, I tried 5 and 10, but jk was reporting it not getting cpongs back
> from tomcat.

That's strange! Cping/Cpong should be very fast. If you run into a
Cping/Cpong timeout with 10 seconds (=10000), then there's something
wrong either with the network, or your Tomcat is blocked by too many
parallel requests (or an OutOfMemoryError, or ...).

Regards,

Rainer
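Pulling those suggestions together, the workers.properties combination described above might look roughly like the following, reusing the worker.template naming from earlier in the thread. The values are illustrative only, not a tested recommendation, and max_reply_timeouts requires a reasonably recent mod_jk 1.2.x:

    # Sketch only: illustrative values, reusing the "template" worker
    # name from this thread.
    worker.template.socket_connect_timeout=5000
    worker.template.ping_mode=A
    # cping/cpong probe timeout, in milliseconds
    worker.template.ping_timeout=10000
    # maximum wait between response packets from tomcat, in milliseconds
    worker.template.reply_timeout=20000
    # tolerate a few isolated reply timeouts before the load balancer
    # puts this worker into the error state
    worker.template.max_reply_timeouts=5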