Rainer,

On 10/31/25 1:08 PM, Rainer Jung wrote:
On 31.10.25 at 18:02, Rainer Jung wrote:
On 31.10.25 at 09:04, Mark Thomas wrote:
On 30/10/2025 14:06, Christopher Schultz wrote:

<snip/>

That symptom, plus the "this is the only server using a NAT gateway" would surely point to one place: the NAT gateway is killing connections that are idle and surprising both stunnel and mod_jk. I can also see a graph of non-zero numbers of "Idle Timeouts" on the NAT gateway. It doesn't tell me more details about those timeouts, but they are almost certainly outgoing AJP/stunnel connections.

Your reasoning above looks sound to me.

But.

Here is my mod_jk workers configuration:

# Template worker
worker.template.type=ajp13
worker.template.host=localhost
worker.template.connection_pool_timeout=60
worker.template.socket_timeout=300
worker.template.max_packet_size=65536

worker.node1.reference=worker.template
worker.node1.port=7015
worker.node1.route=node1

My expectation is that a connection_pool_timeout of 60 (seconds) will close connections that have been idle for 60 seconds. If mod_jk closes a connection, stunnel will also close that connection. (Note: I have no explicit connectionTimeout or keepAliveTimeout on the Tomcat side, but this doesn't seem to be a problem for the other two web servers.)

Checking my configuration for the NAT gateway, it has a fixed idle timeout of 350 seconds, which is much longer than the 60 seconds I (believe I) have set for idle AJP connections.

I do not use servlet async or Websocket for anything in my application, so I do not expect long-lasting connections between client and server.

Is there anything I haven't checked at this point?

You might want to check how connection_pool_timeout interacts with connection_pool_size and connection_pool_minsize.

I am wondering: if the current size is at minsize, will an idle connection be closed if it exceeds the timeout?

I think the code that handles this is here:

https://github.com/apache/tomcat-connectors/blob/main/native/common/jk_ajp_common.c#L3510

and I think it only closes idle connections until the pool reaches minsize.

Which isn't what the docs suggest. Which in turn might explain why you are seeing connections open longer than the NAT gateway timeout.
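To make that concrete (the numbers here are my assumption about the defaults, not something I have checked against the code): with no explicit connection_pool_size or connection_pool_minsize, I believe mod_jk sizes the pool from ThreadsPerChild and sets minsize to roughly half of that, so your template would effectively behave something like:

# Hypothetical effective values with ThreadsPerChild=256
# (pool size assumed to follow ThreadsPerChild, minsize assumed to default to (size+1)/2)
worker.template.connection_pool_size=256
worker.template.connection_pool_minsize=128
# if the code behaves as above, this only shrinks the pool down to minsize
worker.template.connection_pool_timeout=60

If that is right, up to 128 connections per child can sit idle well past the NAT gateway's 350 second limit.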

Rainer understands this code far better than I do. It would be good to get his view on this. If I am right, I think we either need to update the docs or we need to fix the code so idle connections below minsize are closed and then re-opened to refresh them.

Mark

It has been quite a long time since I last touched this code. But Mark is right: idle connections will not get closed below the minimum pool size.

Most of the configuration defaults in the code never kept up with later code changes and were left in place to preserve old behavior. I typically suggest starting with the config from here:

https://github.com/apache/tomcat-connectors/blob/main/conf/workers.properties

That should have better default params (like min pool size 0) and comments. See also the other files in that directory.
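For illustration only (this is not a verbatim copy of that file), the key change relative to the template quoted above would be along these lines:

# With minsize 0 the maintenance run may close every idle connection,
# not just the ones above the minimum
worker.template.connection_pool_minsize=0
worker.template.connection_pool_timeout=60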

I like most of our jk timeouts and the cping/cpong, but I generally do not like socket_timeout, because it is a hard-to-understand low-level timeout that can kick in in any communication context.

In addition, when doing observation/monitoring, keep in mind that the connection pools are per child process, not per Apache httpd instance.

One other thing: the idle timeout is not tracked continuously but only checked during the maintenance run. Generally that happens once per minute, but it is hooked into request processing (no requests, no maintenance runs). Nowadays you can configure mod_jk to use the Apache httpd watchdog thread to run the maintenance even when no requests are handled, but you have to do so explicitly.
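If I remember the directive correctly, that is a one-liner in httpd.conf (the interval here is only an example):

# Run the mod_jk maintenance every 60 seconds via the httpd watchdog,
# even when no requests are being handled
JkWatchdogInterval 60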

It looks like ping_mode=I plus the watchdog can help significantly.

What would happen if I set the connection_pool_minsize to something very low like ... zero? This would allow a watchdog wakeup to completely remove all connections which had been invalidated.

And, I believe, connections removed this way won't trip the failure-detector which sets the worker into the ERR state, right?

If I set ping_mode=I and there are zero requests handled by the server for, say, 10 minutes, will any connection from the pool be tested, as it would be with ping_mode=P, since the idle timeout would have passed? When we have a lot of requests, testing the viability of a connection for every request seems wasteful.
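To make the question concrete, this is the sort of thing I have in mind; the connection_ping_interval value and my reading of the I semantics are assumptions on my part:

# Probe idle connections with cping/cpong during the maintenance/watchdog run
worker.template.ping_mode=I
# milliseconds to wait for the cpong reply (the default, I believe)
worker.template.ping_timeout=10000
# seconds a connection must have been idle before it gets probed (assumed units)
worker.template.connection_ping_interval=300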

When cping/cpong fails, does it have any effect on the status of the worker? In the worst case, I might expect that all ~125 connections have gone stale. When a new request comes in, it will fail cping and then ... what happens? Do we just keep trying connections from the pool until we get one that works? And if the pool is exhausted, do we create new connections? If so, that's great, and that's the behavior I'd like.

What I'm really trying to avoid is a situation where I have a connection pool with a bunch of stale connections (but new connections would succeed), and servicing a request hits those stale connections, killing the whole worker. Since I'm using sticky sessions, this basically logs out 1/6 of my users (3 web servers, 2 back-end workers) when it happens *to a single request*.

How can I make this setup bullet-proof? I don't need to be able to handle a million requests per second, and I don't need to worry about complicated fail-over situations, etc. I just want to clean out these stale connections so they don't cause this kind of error.
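Putting all of the above together, here is the kind of worker config I am considering. This is only a sketch based on my reading of this thread and the docs, not something I have tested yet:

# Template worker (candidate replacement for the config quoted above)
worker.template.type=ajp13
worker.template.host=localhost
worker.template.max_packet_size=65536
# close connections idle for 60 seconds, well under the NAT gateway's 350
worker.template.connection_pool_timeout=60
# allow maintenance to close *all* idle connections
worker.template.connection_pool_minsize=0
# probe idle connections during maintenance/watchdog runs
worker.template.ping_mode=I
worker.template.ping_timeout=10000
# socket_timeout dropped per Rainer's advice

worker.node1.reference=worker.template
worker.node1.port=7015
worker.node1.route=node1

plus JkWatchdogInterval in httpd.conf so the maintenance run also happens during quiet periods. Does that look sane?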

-chris


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
