Hello (again), I'm trying to find the root cause of a significant number of failed connection attempts and broken existing connections on an Artemis broker.
The issue has occurred on an embedded Artemis 2.10.1 and on a standalone 2.16.0 (tomcat9, openjdk11).

Two types of errors occur: timeouts during handshakes and broken existing connections, such as:

2021-01-04 15:28:53,243 ERROR [org.apache.activemq.artemis.core.server] AMQ224088: Timeout (10 seconds) on acceptor "netty-ssl" during protocol handshake with /xxx.xxx.xxx.xxx:41760 has occurred.

2021-01-06 16:56:28,016 WARN {Thread-16 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@f493a59)} [org.apache.activemq.artemis.core.client] : AMQ212037: Connection failure to /xxx.xxx.xxx.xxx:49918 has been detected: AMQ229014: Did not receive data from /xxx.xxx.xxx.xxx:49918 within the 30,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]

Both brokers were deployed on RHEL7 with artemis-native and libaio (32 logical cores, plenty of RAM). Clients use JMS over OpenWire (activemq-client).

The investigation of the network infrastructure came up empty-handed, so I'm exploring the possibility that something is going wrong inside Artemis itself.

Is it possible that the thread pool configured with remotingThreads is too small (default values)? The thread stacks observed through JMX show plenty of threads happily idle.

The clients are known to open and close a lot of connections (we know it's wrong, and now they know it too, but it should still work). The number of open connections is usually around 90-100, which hardly seems like an unbearable burden.

Any ideas or suggestions on what to check/monitor/etc.?

Regards,
SL