On 07.09.2010 12:40, Rainer Jung wrote:
On 07.09.2010 12:32, John Baker wrote:
Rainer,

On Tuesday 07 September 2010 11:09:46 you wrote:
I don't like "socket_timeout" ...

worker.basic.socket_timeout=90

but I would like socket_connect_timeout.

The next two are possibly a bit short, because if the backend e.g. does
a Java Garbage Collection which might take longer than 1 second, those
timeouts will fire and take the node out of load balancing.

worker.basic.connect_timeout=1000
worker.basic.prepost_timeout=1000
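
For example (the numbers are only illustrative, not a tested recommendation), raising them into this range would give the backend headroom for longer pauses:

worker.basic.connect_timeout=10000
worker.basic.prepost_timeout=10000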

I've changed socket_timeout=90 to socket_connect_timeout=9000, and
connect_timeout/prepost_timeout to 5000, but the problem persists:

[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 152
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] jk_shutdown_socket::jk_connect.c (813): Shutdown socket 152 and read 0 lingering bytes in 2 sec.
[Tue Sep 07 11:29:24.868 2010] [364:47921619424608] [debug] wc_maintain::jk_worker.c (339): Maintaining worker lb-jboss51-integration

OK, so it would be helpful to see a more complete log, e.g.

grep 364:47921619424608 LOGFILE | egrep -e '11:29:2[012345]'

If there's data in it which is not meant for public posting, you can
also send it to me via direct mail.

Thanks for the log snippets. I can now see:

- the socket close *does* happen during maintain
- this wasn't obvious from the previous log snippet, because additional "Maintaining" lines for other workers had been left out

Some relevant lines are:

[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] wc_maintain::jk_worker.c (339): Maintaining worker lb-...
...
[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] ajp_reset_endpoint::jk_ajp_common.c (757): (worker1) resetting endpoint with sd = 152 (socket shutdown)
...
[Tue Sep 07 11:29:22.863 2010] [364:47921619424608] [debug] jk_shutdown_socket::jk_connect.c (722): About to shutdown socket 152
...
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] jk_shutdown_socket::jk_connect.c (813): Shutdown socket 152 and read 0 lingering bytes in 2 sec.
...
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] ajp_maintain::jk_ajp_common.c (3190): cleaning pool slot=0 elapsed 185 in 2
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] ajp_maintain::jk_ajp_common.c (3197): reached pool min size 1 from 1 cache slots
[Tue Sep 07 11:29:24.863 2010] [364:47921619424608] [debug] ajp_maintain::jk_ajp_common.c (3237): recycled 1 sockets in 2 seconds from 1 pool slots
...

So now we can discuss how to fix this.

- Can we detect more reliably whether we want to drain before closing or not?

- As a workaround: using a JkWatchdog moves the maintain into a separate thread. But a lock is held while the socket is being closed, which blocks other threads from accessing the same worker.

We would need to rewrite the logic in ajp_maintain() so that the lock (critical section) is held only while pulling sockets out of the ep_cache or putting them back, not while actually working on the sockets.
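
Roughly, the idea would look like this. This is only a self-contained sketch with made-up names (endpoint_slot, pool, pool_lock), not the actual mod_jk code:

#include <pthread.h>
#include <unistd.h>

#define POOL_SIZE 8

typedef struct {
    int sd;        /* socket descriptor, -1 marks a free slot */
    int expired;   /* marked for recycling by the maintain logic */
} endpoint_slot;

/* Assume the pool has been populated elsewhere. */
static endpoint_slot pool[POOL_SIZE];
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Placeholder for the (potentially slow) drain-and-close. */
static void shutdown_socket(int sd)
{
    close(sd);
}

static void pool_maintain(void)
{
    int to_close[POOL_SIZE];
    int n = 0, i;

    /* Critical section: only pull expired sockets out of the pool. */
    pthread_mutex_lock(&pool_lock);
    for (i = 0; i < POOL_SIZE; i++) {
        if (pool[i].sd >= 0 && pool[i].expired) {
            to_close[n++] = pool[i].sd;
            pool[i].sd = -1;   /* slot is free again immediately */
            pool[i].expired = 0;
        }
    }
    pthread_mutex_unlock(&pool_lock);

    /* The slow part - shutting down sockets, which may block for
     * seconds while draining - now runs without holding the lock. */
    for (i = 0; i < n; i++)
        shutdown_socket(to_close[i]);
}

The lock then only protects the bookkeeping on the slot array; the potentially slow shutdown runs on sockets that are no longer reachable from the pool, so request threads can keep using the worker in the meantime.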

I still don't know *why* the socket close blocks, i.e. why the backend doesn't respond when we want to close the socket. Here a network sniff, as Mladen suggested, would help.
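
For reference, the pattern behind the "Shutdown socket ... and read N lingering bytes in N sec." messages - half-close, then drain with a bounded wait, which is also the "drain before closing" question above - looks roughly like this. Again only a standalone illustration, not the actual jk_shutdown_socket():

#include <poll.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Half-close the connection, then read (and discard) whatever the
 * peer still sends, for at most timeout_ms. If the peer never sends
 * its FIN, we give up when the deadline expires. */
static int shutdown_and_drain(int sd, int timeout_ms)
{
    char buf[512];
    struct pollfd pfd = { .fd = sd, .events = POLLIN };
    int drained = 0;

    shutdown(sd, SHUT_WR);            /* send our FIN, keep reading */

    while (timeout_ms > 0) {
        struct timespec t0, t1;
        ssize_t r;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (poll(&pfd, 1, timeout_ms) <= 0)
            break;                    /* timeout or error: stop waiting */
        r = recv(sd, buf, sizeof(buf), 0);
        if (r <= 0)
            break;                    /* peer closed (FIN) or error */
        drained += (int)r;            /* discard lingering bytes */

        clock_gettime(CLOCK_MONOTONIC, &t1);
        timeout_ms -= (int)((t1.tv_sec - t0.tv_sec) * 1000
                            + (t1.tv_nsec - t0.tv_nsec) / 1000000);
    }
    close(sd);
    return drained;
}

If the backend never sends its FIN within the timeout, the wait runs to the deadline for each socket, which would match the 2 seconds per socket in your log; a capture would show whether the FIN is never sent at all or gets lost somewhere in between.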

Regards,

Rainer
