Hello,
We are experiencing some weird problems with Linux + Tomcat + Apache +
mod_jk + SSL.
We have a load-balancing setup which works for a couple of hours (about 5)
and then just stops responding. After playing with the worker properties a
bit (adding socket_timeout, cache_timeout, recycle_timeout), the application
now stops for a bit and starts working again after a few minutes.

I was thinking it could be because of some socket limit which gets
exhausted and causes the application to freeze for a while.
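
One rough way to check the socket theory would be to count connections on
the AJP port by TCP state while the application is frozen. The sketch below
is only an illustration, not something we have run on the affected boxes. It
assumes Linux's /proc/net/tcp format (IPv4 only) and the AJP port 8009 from
our workers.properties; the function name count_states is made up for this
example.

```python
from collections import Counter

# TCP state codes as they appear (in hex) in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text, port):
    """Count sockets per TCP state where either end uses the given port.

    proc_net_tcp_text is the full content of /proc/net/tcp (hypothetical
    helper; addresses in that file are hex "IP:PORT" pairs).
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        if port in (local_port, remote_port):
            state = fields[3].upper()
            counts[TCP_STATES.get(state, state)] += 1
    return counts

# Usage on the Apache or Tomcat host (Linux only):
#   print(count_states(open("/proc/net/tcp").read(), 8009))
```

If the theory is right, I would expect a large or growing number of sockets
stuck in one state (for example CLOSE_WAIT or TIME_WAIT) around the time of
the freeze, but that is a guess.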

It's difficult to collect any useful information, as this happens about
once, at most twice, a day.
There are about 10 users testing it.
Versions: Tomcat 5.5.12, Apache 2.0.52, mod_jk 1.2.15.

Before I set up the timeouts, this is what was in mod_jk.log; since then
the log has been empty.

[Mon Nov 28 10:29:53 2005] [error] ajp_service::jk_ajp_common.c (1758): Error connecting to tomcat. Tomcat is probably not started or is listening on the wrong port. worker=ajp132 failed
[Tue Nov 29 07:05:26 2005] [error] ajp_get_reply::jk_ajp_common.c (1503): Tomcat is down or refused connection. No response has been sent to the client (yet)
[Tue Nov 29 07:05:26 2005] [error] ajp_get_reply::jk_ajp_common.c (1503): Tomcat is down or refused connection. No response has been sent to the client (yet)
[Tue Nov 29 07:05:26 2005] [error] ajp_service::jk_ajp_common.c (1715): receiving reply from tomcat failed without recovery in send loop 0
[Tue Nov 29 07:05:26 2005] [error] ajp_service::jk_ajp_common.c (1715): receiving reply from tomcat failed without recovery in send loop 0
[Tue Nov 29 07:05:26 2005] [error] service::jk_lb_worker.c (687): unrecoverable error 502, request failed. Tomcat failed in the middle of request, we can't recover to another instance.

And here is my workers.properties:
worker.list=loadbalancer
#, ajp131, ajp132

worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=ajp131, ajp132
worker.loadbalancer.sticky_session=True

# Workers
worker.ajp131.port=8009
worker.ajp131.host=127.0.0.1
worker.ajp131.type=ajp13
worker.ajp131.lbfactor=1
#worker use up to 1 sockets, which will stay no more than 10mn in cache
worker.ajp131.cachesize=1
worker.ajp131.cache_timeout=600
worker.ajp131.socket_timeout=180
#worker ask operating system to send KEEP-ALIVE signal on the connection
worker.ajp131.socket_keepalive=1
#worker want ajp13 connection to be dropped after 5mn (recycle)
worker.ajp131.recycle_timeout=300

worker.ajp132.port=8009
worker.ajp132.host=10.241.154.124
worker.ajp132.type=ajp13
worker.ajp132.lbfactor=2
#worker use up to 1 sockets, which will stay no more than 10mn in cache
worker.ajp132.cachesize=1
worker.ajp132.cache_timeout=600
worker.ajp132.socket_timeout=180
#worker ask operating system to send KEEP-ALIVE signal on the connection
worker.ajp132.socket_keepalive=1
#worker want ajp13 connection to be dropped after 5mn (recycle)
worker.ajp132.recycle_timeout=300
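
Since the log complains about "Error connecting to tomcat" for ajp132, one
thing that could be tried during a freeze is a plain TCP connect to each
worker's host and port, taken straight from the file above. The following is
a minimal sketch under that assumption; the helper names (parse_workers,
ajp_endpoints, can_connect) are invented for the example. A successful
connect only shows that the port accepts connections, not that Tomcat is
actually answering AJP requests.

```python
import socket


def parse_workers(text):
    """Parse workers.properties text into {worker_name: {property: value}}."""
    workers = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        parts = key.strip().split(".")
        # Keep only worker.<name>.<property>; worker.list has two parts.
        if len(parts) != 3 or parts[0] != "worker":
            continue
        workers.setdefault(parts[1], {})[parts[2]] = value.strip()
    return workers


def ajp_endpoints(workers):
    """Return {worker_name: (host, port)} for workers of type ajp13."""
    return {
        name: (props["host"], int(props["port"]))
        for name, props in workers.items()
        if props.get("type") == "ajp13" and "host" in props and "port" in props
    }


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Usage (the path is whatever your Apache configuration points to):
#   eps = ajp_endpoints(parse_workers(open("workers.properties").read()))
#   for name, (host, port) in eps.items():
#       print(name, host, port, can_connect(host, port))
```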

Any ideas for solutions, or pointers on how to debug this whole damn thing,
are very welcome.
Thanks a lot.
