All,

I've got 3 web servers in AWS EC2 connected to a pair of back-end Tomcat instances using mod_jk. I'm also using stunnel to wrap the AJP connections in TLS. So the connections look like this:

ALB -> web server -> [AJP over stunnel] -> Tomcat AjpNioProtocol

One of these web servers is newly built and seems to be suffering from connection errors. This is a load-balancer connection using sticky sessions to the back-end nodes. The jk-status page shows a small number of errors, e.g.:

Name    Act     State   Err
node1   ACT     OK      5
node2   ACT     OK      6

Note: these are NOT "client errors". I pretty much always ignore those.

This number is small because I reset the balancer member stats this morning to get a better handle on how often the errors occur. It's not all that often, but the other two web servers are pretty much never registering ANY errors. So this is definitely a problem I'd like to solve.

Some notable differences between the existing web servers and the new one:

1. Old web servers are x86-64 based, the new one is aarch64
2. New web server goes through an AWS NAT gateway for IPv4

The new web server has only a public IPv6 address and uses a NAT gateway to reach the internet over IPv4. The other two web servers have public IPv4 addresses. Neither back-end server has IPv6, so all AJP/stunnel communication is over IPv4.

The mod_jk log contains entries like these:

[Thu Oct 30 13:46:15.893 2025] [aQNsJ08YUCb0jK7JrtZe3gAAAAA] [1447:281472310505664] [info] ajp_connection_tcp_get_message::jk_ajp_common.c (1376): (node1) can't receive the response header message from tomcat, network problems or tomcat (127.0.0.1:7015) is down (errno=104)

[Thu Oct 30 13:46:15.893 2025] [aQNsJ08YUCb0jK7JrtZe3gAAAAA] [1447:281472310505664] [error] ajp_get_reply::jk_ajp_common.c (2346): (node1) Tomcat is down or refused connection. No response has been sent to the client (yet)

[Thu Oct 30 13:46:15.893 2025] [aQNsJ08YUCb0jK7JrtZe3gAAAAA] [1447:281472310505664] [info] ajp_service::jk_ajp_common.c (2892): (node1) sending request to tomcat failed (recoverable), (attempt=1)

[Thu Oct 30 13:46:15.995 2025] [aQNsJ08YUCb0jK7JrtZe3gAAAAA] [1447:281472310505664] [info] ajp_connection_tcp_get_message::jk_ajp_common.c (1376): (node1) can't receive the response header message from tomcat, network problems or tomcat (127.0.0.1:7015) is down (errno=104)

[Thu Oct 30 13:46:15.995 2025] [aQNsJ08YUCb0jK7JrtZe3gAAAAA] [1447:281472310505664] [error] ajp_get_reply::jk_ajp_common.c (2346): (node1) Tomcat is down or refused connection. No response has been sent to the client (yet)

[Thu Oct 30 13:46:15.995 2025] [aQNsJ08YUCb0jK7JrtZe3gAAAAA] [1447:281472310505664] [info] ajp_service::jk_ajp_common.c (2892): (node1) sending request to tomcat failed (recoverable), (attempt=2)

[Thu Oct 30 13:46:15.995 2025] [aQNsJ08YUCb0jK7JrtZe3gAAAAA] [1447:281472310505664] [error] ajp_service::jk_ajp_common.c (2913): (node1) connecting to tomcat failed (rc=0, errors=6, client_errors=2).

[Thu Oct 30 13:46:15.997 2025] [aQNsJ08YUCb0jK7JrtZe3gAAAAA] [1447:281472310505664] [info] service::jk_lb_worker.c (1602): service failed, worker node1 is in local error state

127.0.0.1:7015 is the port number where stunnel is listening.

The stunnel log contains entries like these:

Oct 30 13:46:15 ip-10-2-0-166.ec2.internal stunnel[1444]: LOG3[423]: SSL_read: ssl/record/rec_layer_s3.c:689: error:0A000126:SSL routines::unexpected eof while reading

Oct 30 13:46:15 ip-10-2-0-166.ec2.internal stunnel[1444]: LOG3[373]: SSL_read: ssl/record/rec_layer_s3.c:689: error:0A000126:SSL routines::unexpected eof while reading

Oct 30 13:46:15 ip-10-2-0-166.ec2.internal stunnel[1444]: LOG3[375]: SSL_read: ssl/record/rec_layer_s3.c:689: error:0A000126:SSL routines::unexpected eof while reading

Oct 30 13:46:15 ip-10-2-0-166.ec2.internal stunnel[1444]: LOG3[374]: SSL_read: ssl/record/rec_layer_s3.c:689: error:0A000126:SSL routines::unexpected eof while reading

Given that the timestamps correlate, these appear to be reporting the same event.

When stunnel reports "unexpected eof", it typically means that the remote server (or some network gear in between) closed the TCP connection without tearing down the TLS session cleanly (no close_notify).
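For what it's worth, stunnel can also be told to keep these tunnels alive at the TCP level. A sketch of what that might look like in the client-side service section (the service name, remote host, and ports here are assumptions, not my actual config):

```ini
; stunnel.conf (client side) -- hypothetical service name, host, and ports
[ajp-node1]
client = yes
accept = 127.0.0.1:7015
connect = tomcat-host:7016
; enable TCP keepalives on both the local (l:) and remote (r:) sockets
socket = l:SO_KEEPALIVE=1
socket = r:SO_KEEPALIVE=1
```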

That symptom, plus the fact that this is the only server using a NAT gateway, points to one obvious culprit: the NAT gateway is killing idle connections and surprising both stunnel and mod_jk. I can also see a graph of non-zero "Idle Timeouts" on the NAT gateway. It doesn't tell me which connections timed out, but they are almost certainly outgoing AJP/stunnel connections.
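For anyone following along, the graph I'm referring to is CloudWatch's IdleTimeoutCount metric for the NAT gateway; something like this pulls the raw numbers (the gateway ID and time range are placeholders):

```shell
# Connections the NAT gateway dropped for idleness, summed per 5-minute bucket.
# nat-0123456789abcdef0 is a placeholder; substitute the real gateway ID.
aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name IdleTimeoutCount \
  --dimensions Name=NatGatewayId,Value=nat-0123456789abcdef0 \
  --statistics Sum \
  --period 300 \
  --start-time 2025-10-30T00:00:00Z \
  --end-time 2025-10-31T00:00:00Z
```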

But.

Here is my mod_jk workers configuration:

# Template worker
worker.template.type=ajp13
worker.template.host=localhost
worker.template.connection_pool_timeout=60
worker.template.socket_timeout=300
worker.template.max_packet_size=65536

worker.node1.reference=worker.template
worker.node1.port=7015
worker.node1.route=node1

My expectation is that connection_pool_timeout=60 (seconds) will close connections that have been idle for 60 seconds. If mod_jk closes a connection, stunnel will close the corresponding connection as well. (Note: I have no explicit connectionTimeout or keepAliveTimeout on the Tomcat side, but that doesn't seem to be a problem for the other two web servers.)
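For reference, those implicit Tomcat-side defaults could be pinned down explicitly. A sketch of the AJP connector (the address, port, and timeout values here are illustrative assumptions, not my actual server.xml; timeouts are in milliseconds):

```xml
<!-- server.xml: AJP connector with explicit idle timeouts (illustrative values) -->
<Connector protocol="org.apache.coyote.ajp.AjpNioProtocol"
           address="127.0.0.1" port="7016"
           connectionTimeout="600000"
           keepAliveTimeout="600000" />
```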

Checking the NAT gateway's configuration, it has a fixed idle timeout of 350 seconds, which is much longer than the 60 seconds I (believe I) have set for idle AJP connections.
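If the NAT gateway does turn out to be the culprit despite the timeout math, the usual belt-and-braces options on the mod_jk side would look something like this (a sketch extending my template worker; ping_mode requires mod_jk 1.2.27 or later):

```properties
# Keep pooled connections from idling silently past the NAT gateway's limit
worker.template.socket_keepalive=true
# Probe connections with CPing/CPong: "A" enables probes at connect time,
# before each request, and during interval maintenance
worker.template.ping_mode=A
worker.template.ping_timeout=10000
```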

I do not use servlet async or WebSocket anywhere in my application, so I do not expect long-lived connections between client and server.

Is there anything I haven't checked at this point?

-chris


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
