Rainer,
On 10/31/25 1:08 PM, Rainer Jung wrote:
On 31.10.25 at 18:02, Rainer Jung wrote:
On 31.10.25 at 09:04, Mark Thomas wrote:
On 30/10/2025 14:06, Christopher Schultz wrote:
<snip/>
That symptom, plus the fact that this is the only server using a NAT
gateway, would surely point to one place: the NAT gateway is killing
idle connections and surprising both stunnel and mod_jk. I can also
see a graph of non-zero numbers of "Idle Timeouts" on the NAT gateway.
It doesn't give me more detail about those timeouts, but they are
almost certainly outgoing AJP/stunnel connections.
Your reasoning above looks sound to me.
But.
Here is my mod_jk workers configuration:
# Template worker
worker.template.type=ajp13
worker.template.host=localhost
worker.template.connection_pool_timeout=60
worker.template.socket_timeout=300
worker.template.max_packet_size=65536
worker.node1.reference=worker.template
worker.node1.port=7015
worker.node1.route=node1
My expectation is that a connection_pool_timeout of 60 (seconds) will
close connections that have been idle for 60 seconds. If mod_jk
closes a connection, stunnel will also close that connection. (Note:
I have no explicit connectionTimeout or keepAliveTimeout on the
Tomcat side, but this doesn't seem to be a problem for the other
two web servers.)
Checking my configuration for the NAT gateway, it has a fixed idle
timeout of 350 seconds, which is much longer than the 60 seconds I
(believe I) have set for idle AJP connections.
I do not use servlet async or Websocket for anything in my
application, so I do not expect long-lasting connections between
client and server.
Is there anything I haven't checked at this point?
You might want to check how connection_pool_timeout interacts with
connection_pool_size and connection_pool_minsize.
I am wondering: if the current pool size is at minsize, will an idle
connection be closed once it exceeds the timeout?
I think the code that handles this is here:
https://github.com/apache/tomcat-connectors/blob/main/native/common/jk_ajp_common.c#L3510
and I think it only closes idle connections until the pool reaches
minsize.
Which isn't what the docs suggest. Which in turn might explain why
you are seeing connections open longer than the NAT gateway timeout.
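
For reference, here is a minimal sketch of how those three directives sit together in workers.properties (values are illustrative only, not taken from the configuration above):

# Illustrative values only:
# connection_pool_size    - maximum pool size per httpd child process
# connection_pool_minsize - maintenance will not shrink the pool below this
# connection_pool_timeout - idle seconds before a connection becomes a closing candidate
worker.template.connection_pool_size=25
worker.template.connection_pool_minsize=12
worker.template.connection_pool_timeout=60

If maintenance really only closes idle connections down to connection_pool_minsize, the remaining 12 connections in this sketch could sit idle well past the NAT gateway's 350-second timeout.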
Rainer understands this code far better than I do. It would be good
to get his view on this. If I am right, I think we either need to
update the docs or we need to fix the code so idle connections below
minsize are closed and then re-opened to refresh them.
Mark
It has been quite a long time since I touched this code. But Mark is
right: idle connections will not get closed below the min pool size.
Most of the configuration default values in the code never kept up
with how the code evolved; they were kept as they are to preserve the
old behavior. I typically suggest starting with the config from here:
https://github.com/apache/tomcat-connectors/blob/main/conf/workers.properties
That should have better default params (like min pool size 0) and
comments. See also the other files in that directory.
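
Applied to the template worker shown above, that suggestion might look roughly like this (a sketch only, not a copy of the shipped file; everything except connection_pool_minsize=0 is carried over or illustrative):

worker.template.type=ajp13
worker.template.host=localhost
# allow maintenance to drain the pool completely when idle
worker.template.connection_pool_minsize=0
worker.template.connection_pool_timeout=60

With minsize at 0, an idle maintenance pass can close every cached connection instead of stopping at a minimum, so nothing lingers past the NAT gateway's idle timeout.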
I like most of our jk timeouts and the cping/cpong, but I generally do
not like socket_timeout, because it is a low-level timeout that is
complex to understand and can kick in in any communication context.
In addition, when doing observation/monitoring, keep in mind that the
connection pools are per child process, not per Apache httpd instance.
One other thing: the idle timeout is not tracked continuously but only
checked during the maintenance run. Generally that happens once per
minute, but it is hooked into request processing (no requests, no
maintenance runs). Nowadays you can (but have to) configure the Apache
httpd watchdog thread to run the maintenance even when no requests are
handled.
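
If I understand that correctly, enabling the watchdog-driven maintenance happens on the httpd side, along these lines (the interval is illustrative, and the directive requires a mod_jk/httpd combination that supports the watchdog):

# httpd.conf: run mod_jk maintenance every 60 seconds, even with no traffic
JkWatchdogInterval 60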
It looks like ping_mode=I plus the watchdog can help significantly.
What would happen if I set connection_pool_minsize to something very
low, like ... zero? This would allow a watchdog wakeup to completely
remove all connections that have been invalidated.
And, I believe, connections removed this way won't trip the
failure-detector which sets the worker into the ERR state, right?
If I set ping_mode=I and there are zero requests handled by the server
for, say, 10 minutes, will any connection from the pool be tested as if
ping_mode=P, since the idle timeout would have passed? When we have a
lot of requests, testing the viability of a connection for every
request seems wasteful.
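
For what it's worth, my reading of the idle-probing setup in workers.properties is roughly this (values are illustrative, and I am assuming connection_ping_interval is the knob that decides how long a connection may sit idle before it gets a CPing probe):

# probe idle connections instead of probing on every request
worker.template.ping_mode=I
# assumed: seconds of idleness before an idle connection is CPing-probed
worker.template.connection_ping_interval=60
# milliseconds to wait for the CPong reply
worker.template.ping_timeout=10000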
When cping/cpong fails, does it have any effect on the status of the
worker? In a terrible case, I might expect that all ~125 connections
have gone stale. When a new request comes in, it will fail cping and
then ... what happens? Do we just keep trying connections in the pool
until we get one that works? And if the pool is exhausted, do we create
new connections? If so, that's great, and the behavior I'd like.
What I'm really trying to avoid is a situation where I have a connection
pool with a bunch of stale connections but new connections would
succeed, and servicing a request hits these stale connections, killing
the whole worker. Since I'm using sticky sessions, this basically
logs out 1/6 of my users (3 web servers, 2 back-end workers) when it
happens *to a single request*.
How can I make this setup bullet-proof? I don't need to be able to
handle a million requests per second, and I don't need to worry about
complicated fail-over situations, etc. I just want to clean out these
stale connections so they don't cause this kind of error.
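
For the record, here is a combined sketch of the knobs discussed in this thread, as I currently understand them (illustrative values, not a tested or endorsed recommendation):

In workers.properties:

worker.template.connection_pool_minsize=0
worker.template.connection_pool_timeout=60
worker.template.ping_mode=I
worker.template.connection_ping_interval=60
worker.template.ping_timeout=10000

And in httpd.conf, so maintenance also runs while the server is idle:

JkWatchdogInterval 60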
-chris