On 07.09.2010 12:40, Rainer Jung wrote:
On 07.09.2010 12:32, John Baker wrote:
Rainer,
On Tuesday 07 September 2010 11:09:46 you wrote:
I don't like "socket_timeout" ...
worker.basic.socket_timeout=90
but I would like socket_connect_timeout.
The next two are possibly a bit short: if the backend does e.g. a Java
garbage collection that takes longer than 1 second, those timeouts will
fire and take the node out of load balancing.
worker.basic.connect_timeout=1000
worker.basic.prepost_timeout=1000
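Spelled out, that suggestion would look roughly like this in
workers.properties (the numbers are only illustrative and "basic" is the
worker name from the snippet above; all three values are in milliseconds):

# illustrative values only -- tune to the longest GC pause you expect
worker.basic.socket_connect_timeout=10000
worker.basic.connect_timeout=10000
worker.basic.prepost_timeout=10000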
I've changed socket_timeout=90 to socket_connect_timeout=9000 and
connect_timeout/prepost_timeout to 5000, but the problem persists:
[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug]
jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 152
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug]
jk_shutdown_socket::jk_connect.c (813): Shutdown socket 152 and read 0
lingering bytes in 2 sec.
[Tue Sep 07 11:29:24.868 2010] [364:47921619424608] [debug]
wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration
OK, so it would be helpful to see a more complete log, e.g.
grep 364:47921619424608 LOGFILE | egrep -e '11:29:2[012345]'
If there's data in it which is not meant for public posting, you can
also send it to me via direct mail.
Thanks for the log snippets. I can now see:
- the socket close *does* happen during maintain
- this wasn't obvious from the previous log snippet, because more
"Maintain" lines for other workers were left out
Some relevant lines are:
[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug]
wc_maintain::jk_worker.c (339): Maintaining worker lb-...
...
[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug]
ajp_reset_endpoint::jk_ajp_common.c (757): (worker1) resetting endpoint
with sd = 152 (socket shutdown)
...
[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug]
jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 152
...
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug]
jk_shutdown_socket::jk_connect.c (813): Shutdown socket 152 and read 0
lingering bytes in 2 sec.
...
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug]
ajp_maintain::jk_ajp_common.c (3190): cleaning pool slot=0 elapsed 185 in 2
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug]
ajp_maintain::jk_ajp_common.c (3197): reached pool min size 1 from 1
cache slots
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug]
ajp_maintain::jk_ajp_common.c (3237): recycled 1 sockets in 2 seconds
from 1 pool slots
...
So now we can discuss how to fix this:
- Can we detect more reliably whether we want to drain before closing
or not?
- As a workaround: using a JkWatchdog moves the maintain into a separate
thread. But a lock is held while the socket is being closed, which blocks
other threads from accessing the same worker.
We would need to rewrite the logic in ajp_maintain() so that the lock
(critical section) is only held while pulling sockets out of the ep_cache
or putting them back, not while actually using the sockets.
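To make the idea concrete, here is a minimal, self-contained sketch of that
locking change. It is not mod_jk code -- the names, types and the plain
pthread mutex are invented for the example, and the real code in
jk_ajp_common.c would also have to drain lingering bytes the way
jk_shutdown_socket() does:

/* Hold the mutex only while detaching idle sockets from the shared
 * cache; do the potentially slow shutdown/close outside the lock. */
#include <pthread.h>
#include <time.h>
#include <unistd.h>
#include <sys/socket.h>

#define CACHE_SIZE 8

struct ep_cache {
    pthread_mutex_t lock;
    int sd[CACHE_SIZE];            /* socket descriptors, -1 = empty slot */
    time_t last_used[CACHE_SIZE];
};

void maintain(struct ep_cache *c, time_t now, time_t idle_limit)
{
    int victims[CACHE_SIZE];
    int n = 0;

    /* Critical section: just pull expired sockets out of the cache. */
    pthread_mutex_lock(&c->lock);
    for (int i = 0; i < CACHE_SIZE; i++) {
        if (c->sd[i] >= 0 && now - c->last_used[i] > idle_limit) {
            victims[n++] = c->sd[i];
            c->sd[i] = -1;
        }
    }
    pthread_mutex_unlock(&c->lock);

    /* Slow part outside the lock: a blocking close no longer stalls
     * request threads that need to check endpoints out of the cache. */
    for (int i = 0; i < n; i++) {
        shutdown(victims[i], SHUT_WR);
        close(victims[i]);
    }
}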
I still don't know *why* the socket close blocks, i.e. why the backend
doesn't respond when we want to close the socket. A network sniff, as
suggested by Mladen, would help here.
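For the sniff, something along these lines should be enough (assuming the
AJP connector listens on the default port 8009; adjust interface and port
to your setup):

tcpdump -i any -s 0 -w ajp-shutdown.pcap port 8009

Checking whether the backend answers the shutdown/FIN sent at around
11:29:22 should show whether the two seconds are really spent waiting on
the peer.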
Regards,
Rainer