Hello,

Even though the application mentioned below is a Spring Boot 3 application, I 
am looking for Apache Tomcat help here as my question involves understanding 
trace records from the org.apache.tomcat.util.net.NioEndpoint class.

I have a Spring Boot 3 application using Apache Tomcat 10.1.20 that is having 
web browser clocking issues when I run it at openjdk 17 and on a certain type 
of Linux server. I say certain type, because the same Spring Boot 3 application 
at openjdk 17 runs fine on other Linux servers that are at the same release as 
the server with the issue. One main difference between the servers that work 
and do not is different security requirements.

After running a Spring Boot trace, I think I might have narrowed the problem to 
a defect with the openjdk 17 synchronization logic and how it is interacting 
with this specific Linux server. I was hoping someone with Apache Tomcat 
knowledge in the nio-exec logic could help confirm my suspicion.

I recreated the web browser clocking issue with the following https socket 
connections from the web browser to the Spring Boot web server with the Spring 
Boot trace running (i.e. add --trace to the command line):

10.10.10.14 (Web Browser)

10.10.10.29 (Spring Boot web server)

10.10.10.14:42149 -> 10.10.10.29:9443

10.10.10.14:26045 -> 10.10.10.29:9443

In the Spring Boot trace, I only see these records for the 42149 and 26045 
ports:

2024-08-13T08:14:03.835-05:00 DEBUG 2117389 --- [nio-9443-exec-7] 
org.apache.tomcat.util.net.NioEndpoint : Registered read interest for 
[org.apache.tomcat.util.net.NioEndpoint 
$NioSocketWrapper@3c3b81b1:org.apache.tomcat.util.net.SecureNioChannel@a2e78b4:java.nio.channels.SocketChannel[connected
 local=/10.10.10.29:9443 remote=/10.10.10.14:42149]]
2024-08-13T08:14:03.836-05:00 DEBUG 2117389 --- [nio-9443-exec-8] 
org.apache.tomcat.util.net.NioEndpoint : Registered read interest for 
[org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper@3c3b81b1:org.apache.tomcat.util.net.SecureNioChannel@a2e78b4:java.nio.channels.SocketChannel[connected
 local=/10.10.10.29:9443 remote=/10.10.10.14:42149]]
2024-08-13T08:14:03.836-05:00 DEBUG 2117389 --- [nio-9443-exec-9] 
org.apache.tomcat.util.net.NioEndpoint : Registered read interest for 
[org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper@3c3b81b1:org.apache.tomcat.util.net.SecureNioChannel@a2e78b4:java.nio.channels.SocketChannel[connected
 local=/10.10.10.29:9443 remote=/10.10.10.14:42149]]

2024-08-13T08:15:03.857-05:00 DEBUG 2117389 --- [nio-9443-exec-5] 
org.apache.tomcat.util.net.NioEndpoint : Calling 
[org.apache.tomcat.util.net.NioEndpoint@3b018c7e].closeSocket([org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper@3c3b81b1:org.apache.tomcat.util.net.SecureNioChannel@a2e78b4:java.nio.channels.SocketChannel[connected
 local=/10.10.10.29:9443 remote=/10.10.10.14:42149]])

2024-08-13T08:14:03.599-05:00 DEBUG 2117389 --- [nio-9443-exec-5] 
org.apache.tomcat.util.net.NioEndpoint : Registered read interest for 
[org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper@2f0318da:org.apache.tomcat.util.net.SecureNioChannel@2b8cf37c:java.nio.channels.SocketChannel[connected
 local=/10.10.10.29:9443 remote=/10.10.10.14:26045]]
2024-08-13T08:14:03.599-05:00 DEBUG 2117389 --- [nio-9443-exec-6] 
org.apache.tomcat.util.net.NioEndpoint : Registered read interest for 
[org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper@2f0318da:org.apache.tomcat.util.net.SecureNioChannel@2b8cf37c:java.nio.channels.SocketChannel[connected
 local=/10.10.10.29:9443 remote=/10.10.10.14:26045]]

2024-08-13T08:15:03.857-05:00 DEBUG 2117389 --- [nio-9443-exec-4] 
org.apache.tomcat.util.net.NioEndpoint : Calling 
[org.apache.tomcat.util.net.NioEndpoint@3b018c7e].closeSocket([org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper@2f0318da:org.apache.tomcat.util.net.SecureNioChannel@2b8cf37c:java.nio.channels.SocketChannel[connected
 local=/10.10.10.29:9443 remote=/10.10.10.14:26045]])

When I compare this trace data to https sockets that work, I see this 
difference.

https sockets that work:

There is only one nio-exec thread that registers read interest, there are 
nio-exec threads that register write interest, and then there is reading data 
from the socket.

https sockets that have the web browser clocking issue (listed in trace records 
above):

There are multiple nio-exec threads that register read interest within the 1 
millisecond range. There is then no write interest registration or reading from 
the socket, and the socket is closed 60 seconds later with the default 
server.tomcat.keep-alive-timeout setting.

My hypothesis here is that there is some type of openjdk 17 synchronization 
defect that is allowing multiple nio-exec threads to read register within the 
same millisecond. This then causes the nio-exec logic to get into a state where 
it stalls and stops reading from the https socket. Basically, it is not an 
expected state for multiple nio-exec threads to be able to read register on a 
socket within the same millisecond. Can someone with Apache Tomcat nio-exec 
knowledge let me know if I am interpreting this trace data correctly?

Thanks,
Tim

Reply via email to