Mark,

On 10/31/25 4:04 AM, Mark Thomas wrote:
On 30/10/2025 14:06, Christopher Schultz wrote:

<snip/>

That symptom, plus the "this is the only server using a NAT gateway" would surely point to one place: the NAT gateway is killing connections that are idle and surprising both stunnel and mod_jk. I can also see a graph of non-zero numbers of "Idle Timeouts" on the NAT gateway. It doesn't tell me more details about those timeouts, but they are almost certainly outgoing AJP/stunnel connections.

Your reasoning above looks sound to me.

Thanks for the second brain's opinion on this.

But.

Here is my mod_jk workers configuration:

# Template worker
worker.template.type=ajp13
worker.template.host=localhost
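# Note: connection_pool_timeout and socket_timeout are in seconds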
worker.template.connection_pool_timeout=60
worker.template.socket_timeout=300
worker.template.max_packet_size=65536

worker.node1.reference=worker.template
worker.node1.port=7015
worker.node1.route=node1

My expectation is that a connection_pool_timeout of 60 (seconds) will close connections which have been idle for more than 60 seconds. If mod_jk closes a connection, stunnel will also close that connection. (Note: I have no explicit connectionTimeout or keepAliveTimeout on the Tomcat side, but this doesn't seem to be a problem for the other two web servers.)
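
(For reference, I believe the matching setting on the Tomcat side would look something like the following on the AJP <Connector>; note that keepAliveTimeout is in milliseconds while connection_pool_timeout is in seconds. The port and address here are placeholders, not my real values.)

<Connector protocol="AJP/1.3" port="8009" address="127.0.0.1"
           keepAliveTimeout="60000" />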

Checking my configuration for the NAT gateway, it has a fixed idle timeout of 350 seconds, which is much longer than the 60 seconds I (believe I) have set for idle AJP connections.

I do not use servlet async or Websocket for anything in my application, so I do not expect long-lasting connections between client and server.

Is there anything I haven't checked at this point?

You might want to check how connection_pool_timeout interacts with connection_pool_size and connection_pool_minsize.

My current connection_pool_size and connection_pool_minsize are the defaults, so I suspect they will be:

connection_pool_size = ThreadsPerChild = 25

connection_pool_minsize = (25 + 1) / 2 = 13

So, basically, default everything.
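
In workers.properties terms, I believe the effective configuration is equivalent to having written the following (these lines are illustrative; they are not actually in my file):

worker.template.connection_pool_size=25
worker.template.connection_pool_minsize=13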

Looking at ps, I see that there are 6 httpd processes running as the apache user, and one running as root (the control process).

The mod_status page says I have 124 idle workers and 1 in-flight request. So I believe I do in fact have the default configuration of 25 threads per process; with 6 processes I have 125 total threads.

netstat tells me that I currently have 112 ESTABLISHED connections owned by stunnel.

I am wondering: if the current pool size is at minsize, will an idle connection still be closed once it exceeds the timeout?

I think the code that handles this is here:

https://github.com/apache/tomcat-connectors/blob/main/native/common/jk_ajp_common.c#L3510

and I think it only closes idle connections until the pool reaches minsize.

I'll have to do a lot of reading to determine how that's going to behave. So many configuration properties have changed names over time that it's not 100% clear how the code will behave, but ...

https://github.com/apache/tomcat-connectors/blob/main/native/common/jk_ajp_common.c#L3567

This certainly seems like it will only "clean" connections until it gets to the min size.

Which isn't what the docs suggest. Which in turn might explain why you are seeing connections open longer than the NAT gateway timeout.

Rainer understands this code far better than I do. It would be good to get his view on this. If I am right, I think we either need to update the docs or we need to fix the code so idle connections below minsize are closed and then re-opened to refresh them.

So it looks like I have a few options for immediate relief here, though if there is a bug (or missing feature) in mod_jk, then fixing that is the best long-term solution for me: I'd like AJP connections that have been open for "too long" to simply be recycled.

1. Set connection_pool_minsize=0. This seems ... potentially problematic?

2. Set socket_keepalive on the worker. The documentation is not encouraging, as it suggests that the keepalives may be sent on the order of hours rather than minutes.

3. Use stunnel's TIMEOUTidle=300 (or similar) to get stunnel to kill idle connections after too much time has passed (but less than the NAT idle timeout). I think this just moves the problem from the NAT router surprising stunnel and mod_jk to stunnel surprising mod_jk.

4. Use stunnel's keepalive capabilities.

socket = l:SO_KEEPALIVE=1
socket = r:SO_KEEPALIVE=1

I believe this is similar to mod_jk's keepalives, which rely on the OS. Modifying the global OS settings for how keepalives behave doesn't feel right, so I'm going to drop these two options (2 and 4) from my list.

5. Set Tomcat's keepAliveTimeout instead of leaving it at the default. I think Tomcat will then close idle connections, but mod_jk won't know about it until it attempts another write. (Right?)

6. Use ping_mode=P or ping_mode=I with an appropriate connection_ping_interval and ping_timeout. This seems the most promising, because it will probe the connection before using it. (A configuration sketch follows this list.)

This kind of happens already: when mod_jk attempts to use a connection and discovers it's unusable, it will roll over to another connection. But I think it will only do that 3 times before it (1) gives up and (2) marks the worker as ERR.

If I use ping_mode=P or I, will this change the above behavior at all? I would like mod_jk in this case to ping each connection, discover it's unusable, and recycle it but not mark the worker as being in the ERR state, even if mod_jk needs to recycle a whole bunch of connections in order to find one that works.
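
For concreteness, here is roughly what I'd add to the template worker to try option 6. The values are guesses on my part and I still need to confirm the units; I believe ping_timeout is in milliseconds and connection_ping_interval is in seconds:

worker.template.ping_mode=I
worker.template.ping_timeout=10000
worker.template.connection_ping_interval=60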

Are there any other options I haven't identified above?

I have other nuclear options, including allocating a public IP address for this instance and using that directly. My goal was actually to push everything behind this NAT router for a couple of reasons, so I would like to figure out how to get it working.

As the subject says, I do have 2 other web servers, so I can experiment with this one, including putting it in debug/trace log mode, removing it from the hw load-balancer, poking it with traffic only generated by me, etc.

-chris


