On 07.09.2010 12:40, Rainer Jung wrote:
On 07.09.2010 12:32, John Baker wrote:
Rainer,

On Tuesday 07 September 2010 11:09:46 you wrote:
I don't like "socket_timeout" ...

worker.basic.socket_timeout=90

but I would like socket_connect_timeout.

The next two are possibly a bit short, because if the backend e.g. does
a Java Garbage Collection which might take longer than 1 second, those
timeouts will fire and take the node out of load balancing.

worker.basic.connect_timeout=1000
worker.basic.prepost_timeout=1000
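
For example (the numbers are only illustrative, not a tested recommendation), raising them into this range would give the backend headroom for longer pauses:

worker.basic.connect_timeout=10000
worker.basic.prepost_timeout=10000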

I've changed socket_timeout=90 to socket_connect_timeout=9000, and
connect_timeout/prepost_timeout to 5000, but the problem persists:

[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 152
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] jk_shutdown_socket::jk_connect.c (813): Shutdown socket 152 and read 0 lingering bytes in 2 sec.
[Tue Sep 07 11:29:24.868 2010] [364:47921619424608] [debug] wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration

OK, so it would be helpful to see a more complete log, e.g.

grep 364:47921619424608 LOGFILE | egrep -e '11:29:2[012345]'

If there's data in it which is not meant for public posting, you can
also send it to me via direct mail.

Thanks for the log snippets. I can now see:

- the socket close *does* happen during maintain
- this wasn't obvious from the previous log snippet, because additional "Maintaining" lines for other workers had been left out

Some relevant lines are:

[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] wc_maintain::jk_worker.c (339): Maintaining worker lb-...
...
[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] ajp_reset_endpoint::jk_ajp_common.c (757): (worker1) resetting endpoint with sd = 152 (socket shutdown)
...
[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 152
...
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] jk_shutdown_socket::jk_connect.c (813): Shutdown socket 152 and read 0 lingering bytes in 2 sec.
...
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] ajp_maintain::jk_ajp_common.c (3190): cleaning pool slot=0 elapsed 185 in 2
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] ajp_maintain::jk_ajp_common.c (3197): reached pool min size 1 from 1 cache slots
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] ajp_maintain::jk_ajp_common.c (3237): recycled 1 sockets in 2 seconds from 1 pool slots
...

So now we can discuss how to fix this.

- Can we detect more reliably whether we want to drain before closing or not?

- As a workaround: using a JkWatchdog moves the maintain into a separate thread. But a lock is held while the socket is being closed, which blocks other threads from accessing the same worker.

We would need to rewrite the logic in ajp_maintain() so that the lock (critical section) is held only while pulling sockets out of the ep_cache or putting them back, not while actually working on the sockets.
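
Roughly, the idea would look like this. This is only a self-contained sketch with made-up names (endpoint_slot, pool, pool_lock), not the actual mod_jk code:

#include <pthread.h>
#include <unistd.h>

#define POOL_SIZE 8

typedef struct {
    int sd;        /* socket descriptor, -1 marks a free slot */
    int expired;   /* marked for recycling by the maintain logic */
} endpoint_slot;

/* Assume the pool has been populated elsewhere. */
static endpoint_slot pool[POOL_SIZE];
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Placeholder for the (potentially slow) drain-and-close. */
static void shutdown_socket(int sd)
{
    close(sd);
}

static void pool_maintain(void)
{
    int to_close[POOL_SIZE];
    int n = 0, i;

    /* Critical section: only pull expired sockets out of the pool. */
    pthread_mutex_lock(&pool_lock);
    for (i = 0; i < POOL_SIZE; i++) {
        if (pool[i].sd >= 0 && pool[i].expired) {
            to_close[n++] = pool[i].sd;
            pool[i].sd = -1;   /* slot is free again immediately */
            pool[i].expired = 0;
        }
    }
    pthread_mutex_unlock(&pool_lock);

    /* The slow part - shutting down sockets, which may block for
     * seconds while draining - now runs without holding the lock. */
    for (i = 0; i < n; i++)
        shutdown_socket(to_close[i]);
}

The lock then only protects the bookkeeping on the slot array; the potentially slow shutdown runs on sockets that are no longer reachable from the pool, so request threads can keep using the worker in the meantime.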

I still don't know *why* the socket close blocks, i.e. why the backend doesn't respond when we want to close the socket. Here a network sniff, as Mladen suggested, would help.
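
For reference, the pattern behind the "Shutdown socket ... and read N lingering bytes in N sec." messages - half-close, then drain with a bounded wait, which is also the "drain before closing" question above - looks roughly like this. Again only a standalone illustration, not the actual jk_shutdown_socket():

#include <poll.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Half-close the connection, then read (and discard) whatever the
 * peer still sends, for at most timeout_ms. If the peer never sends
 * its FIN, we give up when the deadline expires. */
static int shutdown_and_drain(int sd, int timeout_ms)
{
    char buf[512];
    struct pollfd pfd = { .fd = sd, .events = POLLIN };
    int drained = 0;

    shutdown(sd, SHUT_WR);            /* send our FIN, keep reading */

    while (timeout_ms > 0) {
        struct timespec t0, t1;
        ssize_t r;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (poll(&pfd, 1, timeout_ms) <= 0)
            break;                    /* timeout or error: stop waiting */
        r = recv(sd, buf, sizeof(buf), 0);
        if (r <= 0)
            break;                    /* peer closed (FIN) or error */
        drained += (int)r;            /* discard lingering bytes */

        clock_gettime(CLOCK_MONOTONIC, &t1);
        timeout_ms -= (int)((t1.tv_sec - t0.tv_sec) * 1000
                            + (t1.tv_nsec - t0.tv_nsec) / 1000000);
    }
    close(sd);
    return drained;
}

If the backend never sends its FIN within the timeout, the wait runs to the deadline for each socket, which would match the 2 seconds per socket in your log; a capture would show whether the FIN is never sent at all or gets lost somewhere in between.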

Regards,

Rainer
