Hey all, I could use a push in the right direction to troubleshoot an issue!
TL;DR: After running well for an unpredictable period (anywhere from hours to days), message delivery stops for connected consumers located in the same JVM as the Artemis server. Producers in that same JVM continue uninterrupted. (Artemis version 2.30; will upgrade to 2.31.2 soon.)

Details: There are 4 JVMs on each of 3 large Linux VMs. Node 1 has an additional JVM that contains an embedded Artemis broker. All 13 of these JVMs have an open producer session and an open consumer session with the broker, and persistence is off.

I don't have direct access to the machines where this problem occurs, so I can't debug there, but I can get logs and eventually apply updates. Log analysis of application behavior points to message delivery ceasing for the consumer inside the broker JVM. All other consumers and producers continue to pass messages through the broker without issue; the broker itself is running great.

To try to replicate this, I set up a similar 3-node environment that I could attach a debugger to. I put a breakpoint in my message handler, followed the call stack into ClientConsumerImpl, and manually called setMessageHandler(null) to disable the handler on the consumer while the application was running. The resulting application behavior and logging on this setup exactly matched what I see on the problem machines, including some pretty distinctive application behaviors. This really leads me to believe that message delivery stopped.

So I have no idea WHY the consumer stopped receiving messages. I have asked for the log level for org.apache.activemq to be set to INFO to capture more information from this environment. We normally run it at WARN because of the log volume. I didn't see anything interesting in the broker logs I did get (at WARN level). If this were some kind of network issue, I don't understand how it could fail to affect the producers as well, let alone the other 12 connected JVMs.
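To pin down exactly when delivery stops, one thing I could add on the application side is a small watchdog that the message handler updates on every delivery. Below is a minimal sketch of that idea. DeliveryWatchdog and its methods are hypothetical names of my own, not part of the Artemis API, and it assumes the application can call it from its existing handler and from a periodic task.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not an Artemis class): tracks when the consumer's
// message handler last ran, so a periodic check can flag a delivery stall.
final class DeliveryWatchdog {
    private final long maxSilenceMillis;
    private final AtomicLong lastDeliveryMillis;

    DeliveryWatchdog(long maxSilenceMillis, long startMillis) {
        this.maxSilenceMillis = maxSilenceMillis;
        this.lastDeliveryMillis = new AtomicLong(startMillis);
    }

    // Call at the top of the application's message handler.
    void recordDelivery(long nowMillis) {
        lastDeliveryMillis.set(nowMillis);
    }

    // Time elapsed since the handler was last invoked.
    long silenceMillis(long nowMillis) {
        return nowMillis - lastDeliveryMillis.get();
    }

    // True when the handler has been silent for longer than the threshold.
    boolean isStalled(long nowMillis) {
        return silenceMillis(nowMillis) > maxSilenceMillis;
    }
}
```

The handler would call recordDelivery(System.currentTimeMillis()), and a scheduled task would log at WARN whenever isStalled(System.currentTimeMillis()) returns true. A stretch with no traffic looks the same as a stall, so the threshold would need to be longer than the longest normal gap between messages.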
Are there other normal reasons that message delivery to a consumer could stop? What log messages or logging settings could help me prove one way or another what is happening? The only unusual thing about these machines is that they have 2 NICs.

Regards,
David