Rainer,

On 10/31/25 1:08 PM, Rainer Jung wrote:
On 31.10.25 at 18:02, Rainer Jung wrote:
On 31.10.25 at 09:04, Mark Thomas wrote:
On 30/10/2025 14:06, Christopher Schultz wrote:

<snip/>

That symptom, plus the "this is the only server using a NAT gateway" would surely point to one place: the NAT gateway is killing connections that are idle and surprising both stunnel and mod_jk. I can also see a graph of non-zero numbers of "Idle Timeouts" on the NAT gateway. It doesn't tell me more details about those timeouts, but they are almost certainly outgoing AJP/stunnel connections.

Your reasoning above looks sound to me.

But.

Here is my mod_jk workers configuration:

# Template worker
worker.template.type=ajp13
worker.template.host=localhost
worker.template.connection_pool_timeout=60
worker.template.socket_timeout=300
worker.template.max_packet_size=65536

worker.node1.reference=worker.template
worker.node1.port=7015
worker.node1.route=node1

My expectation is that a connection_pool_timeout of 60 (seconds) will close connections that have been idle for 60 seconds. If mod_jk closes a connection, stunnel will also close that connection. (Note: I have no explicit connectionTimeout or keepAliveTimeout on the Tomcat side, but this doesn't seem to be a problem for the other two web servers.)

Checking my configuration for the NAT gateway, it has a fixed idle timeout of 350 seconds, which is much longer than the 60 seconds I (believe I) have set for idle AJP connections.

I do not use servlet async or Websocket for anything in my application, so I do not expect long-lasting connections between client and server.

Is there anything I haven't checked at this point?

You might want to check how connection_pool_timeout interacts with connection_pool_size and connection_pool_minsize.

I am wondering: if the current size is at minsize, will an idle connection be closed if it exceeds the timeout?

I think the code that handles this is here:

https://github.com/apache/tomcat-connectors/blob/main/native/common/jk_ajp_common.c#L3510

and I think it only closes idle connections until the pool reaches minsize.

Which isn't what the docs suggest. Which in turn might explain why you are seeing connections open longer than the NAT gateway timeout.
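To make that concrete (the numbers here are my assumption about the defaults, not something I have checked against the code): with no explicit connection_pool_size or connection_pool_minsize, I believe mod_jk sizes the pool from ThreadsPerChild and sets minsize to roughly half of that, so your template would effectively behave something like:

# Hypothetical effective values with ThreadsPerChild=256
# (pool size assumed to follow ThreadsPerChild, minsize assumed to default to (size+1)/2)
worker.template.connection_pool_size=256
worker.template.connection_pool_minsize=128
# if the code behaves as above, this only shrinks the pool down to minsize
worker.template.connection_pool_timeout=60

If that is right, up to 128 connections per child can sit idle well past the NAT gateway's 350 second limit.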

Rainer understands this code far better than I do. It would be good to get his view on this. If I am right, I think we either need to update the docs or we need to fix the code so idle connections below minsize are closed and then re-opened to refresh them.

Mark

It has been quite a long time since I last touched this code. But Mark is right: idle connections will not get closed below the minimum pool size.

Most of the configuration defaults in the code never kept up with later code changes and were left in place to preserve old behavior. I typically suggest starting with the config from here:

https://github.com/apache/tomcat-connectors/blob/main/conf/workers.properties

That should have better default params (like min pool size 0) and comments. See also the other files in that directory.
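For illustration only (this is not a verbatim copy of that file), the key change relative to the template quoted above would be along these lines:

# With minsize 0 the maintenance run may close every idle connection,
# not just the ones above the minimum
worker.template.connection_pool_minsize=0
worker.template.connection_pool_timeout=60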

I like most of our jk timeouts and the cping/cpong, but I generally do not like socket_timeout, because it is a hard-to-understand low-level timeout that can kick in in any communication context.

In addition, when doing observation/monitoring, keep in mind that the connection pools are per child process, not per Apache httpd instance.

One other thing: the idle timeout is not tracked continuously but only checked during the maintenance run. Generally that happens once per minute, but it is hooked into request processing (no requests, no maintenance runs). Nowadays you can configure mod_jk to use the Apache httpd watchdog thread to run the maintenance even when no requests are handled, but you have to do so explicitly.
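If I remember the directive correctly, that is a one-liner in httpd.conf (the interval here is only an example):

# Run the mod_jk maintenance every 60 seconds via the httpd watchdog,
# even when no requests are being handled
JkWatchdogInterval 60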

It looks like ping_mode=I plus the watchdog can help significantly.

What would happen if I set the connection_pool_minsize to something very low like ... zero? This would allow a watchdog wakeup to completely remove all connections which had been invalidated.

And, I believe, connections removed this way won't trip the failure-detector which sets the worker into the ERR state, right?

If I set ping_mode=I and there are zero requests handled by the server for, say, 10 minutes, will any connection from the pool be tested, as it would be with ping_mode=P, since the idle timeout would have passed? When we have a lot of requests, testing the viability of a connection for every request seems wasteful.
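To make the question concrete, this is the sort of thing I have in mind; the connection_ping_interval value and my reading of the I semantics are assumptions on my part:

# Probe idle connections with cping/cpong during the maintenance/watchdog run
worker.template.ping_mode=I
# milliseconds to wait for the cpong reply (the default, I believe)
worker.template.ping_timeout=10000
# seconds a connection must have been idle before it gets probed (assumed units)
worker.template.connection_ping_interval=300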

When cping/cpong fails, does it have any effect on the status of the worker? In the worst case, I might expect that all ~125 connections have gone stale. When a new request comes in, it will fail cping and then ... what happens? Do we just keep trying connections from the pool until we get one that works? And if the pool is exhausted, do we create new connections? If so, that's great, and that's the behavior I'd like.

What I'm really trying to avoid is a situation where I have a connection pool with a bunch of stale connections (but new connections would succeed), and servicing a request hits those stale connections, killing the whole worker. Since I'm using sticky sessions, this basically logs out 1/6 of my users (3 web servers, 2 back-end workers) when it happens *to a single request*.

How can I make this setup bullet-proof? I don't need to be able to handle a million requests per second, and I don't need to worry about complicated fail-over situations, etc. I just want to clean out these stale connections so they don't cause this kind of error.
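Putting all of the above together, here is the kind of worker config I am considering. This is only a sketch based on my reading of this thread and the docs, not something I have tested yet:

# Template worker (candidate replacement for the config quoted above)
worker.template.type=ajp13
worker.template.host=localhost
worker.template.max_packet_size=65536
# close connections idle for 60 seconds, well under the NAT gateway's 350
worker.template.connection_pool_timeout=60
# allow maintenance to close *all* idle connections
worker.template.connection_pool_minsize=0
# probe idle connections during maintenance/watchdog runs
worker.template.ping_mode=I
worker.template.ping_timeout=10000
# socket_timeout dropped per Rainer's advice

worker.node1.reference=worker.template
worker.node1.port=7015
worker.node1.route=node1

plus JkWatchdogInterval in httpd.conf so the maintenance run also happens during quiet periods. Does that look sane?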

-chris


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
