Hey all,

I could use a push in the right direction to troubleshoot an issue!

TL;DR

After running well for an unpredictable period (anywhere from hours to
days), message delivery stops to the consumers located in the same JVM
as the Artemis server. Producers in that same JVM continue
uninterrupted. (Artemis version 2.30; we will upgrade to 2.31.2 soon.)

Details:
There are 4 JVMs on each of 3 large Linux VMs. Node 1 has an additional
JVM that contains an embedded Artemis broker. All 13 of these JVMs have
an open producer and consumer session on the broker, and persistence is
off.

I don't have direct debugging access to the machines where this problem
occurs, but I can get logs and, ultimately, apply updates. Analysis of
the application logs points to message delivery stopping for the
consumer inside the broker JVM. All other consumers and producers
continue to pass messages through the broker without issue; otherwise
the broker is running great.

I set up a similar 3-node environment that I could debug, to try to
replicate the problem. I put a breakpoint in my message handler and
then, following the call stack into ClientConsumerImpl, manually called
setMessageHandler(null) to disable the handler on the consumer while the
application was running. The resulting application behavior and logging
in this environment then matched the behavior on the problem machines
exactly, including some pretty distinctive behaviors of the application.
This strongly suggests that message delivery stopped.

So I have no idea WHY the consumer stopped receiving messages. I have
requested that the org.apache.activemq loggers be set to INFO to capture
more information from this environment; we normally run them at WARN
because of the log volume. I didn't see anything interesting in the
broker logs I did get (at WARN level). If there were some kind of
network issue, I don't understand how it could leave the producers
unaffected, let alone all the other 12 connected JVMs.

Are there common reasons why message delivery to a consumer could stop?
Which log messages or logger settings would help me prove one way or the
other what is happening? The only unusual thing about these machines is
that they have 2 NICs.
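One possible way to capture direct evidence of the stall from the application side is a small watchdog around the in-broker consumer, so the application itself logs the moment deliveries stop. Below is a minimal sketch using only the JDK. `DeliveryWatchdog` and its methods are hypothetical names, not part of the Artemis API, and it assumes `recordDelivery()` is called from the consumer's `MessageHandler.onMessage(...)`.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical helper (not part of the Artemis API): remembers when the
// in-broker consumer last received a message and reports a stall when
// nothing has arrived for longer than a configurable threshold.
class DeliveryWatchdog {
    private final long thresholdNanos;
    // Time source in nanoseconds: System::nanoTime in production,
    // a fake clock in tests.
    private final LongSupplier clock;
    private final AtomicLong lastDeliveryNanos;

    DeliveryWatchdog(long thresholdNanos, LongSupplier clock) {
        this.thresholdNanos = thresholdNanos;
        this.clock = clock;
        // Treat construction time as the last "delivery" so a consumer
        // that never receives anything is also flagged eventually.
        this.lastDeliveryNanos = new AtomicLong(clock.getAsLong());
    }

    // Assumed to be called first thing in MessageHandler.onMessage(...).
    void recordDelivery() {
        lastDeliveryNanos.set(clock.getAsLong());
    }

    long nanosSinceLastDelivery() {
        return clock.getAsLong() - lastDeliveryNanos.get();
    }

    // True once the gap since the last delivery exceeds the threshold.
    boolean isStalled() {
        return nanosSinceLastDelivery() > thresholdNanos;
    }

    // Runs onStall (e.g. a log statement) on every check while the
    // consumer is stalled. If onStall throws, the executor cancels
    // further runs, so it should catch its own exceptions.
    ScheduledFuture<?> startChecking(ScheduledExecutorService scheduler,
                                     long periodSeconds, Runnable onStall) {
        return scheduler.scheduleAtFixedRate(() -> {
            if (isStalled()) {
                onStall.run();
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }
}
```

In the broker JVM this could be constructed with `TimeUnit.SECONDS.toNanos(...)` and `System::nanoTime`, with `onStall` logging a warning. The timestamp of the first warning would then bound when delivery stopped, for comparison with the broker's INFO-level logs. The threshold is a guess and would need tuning to the application's normal message rate.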

Regards,

David.

