I've got a few questions:

 - Are the other working consumers on the same queue as the stalled
consumer?
 - Can you get metrics from the queue with the stalled consumer? If so,
what are the values of the MessageCount, ConsumerCount, DeliveringCount,
& Paused attributes? (See the sketch after this list for one way to read
them from an embedded broker.)
 - Have you acquired any thread dumps from the stalled consumer? If so,
what did they show?
 - What kinds of clients are you using? What version are they?
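
If you can reach the embedded broker programmatically, something along
these lines should print those attributes. This is just a sketch, not a
definitive recipe; the class name and the queue name you pass in are
placeholders for whatever you actually use:

import org.apache.activemq.artemis.api.core.SimpleString;
import org.apache.activemq.artemis.core.server.ActiveMQServer;
import org.apache.activemq.artemis.core.server.Queue;

public class QueueStats {

   // Pass the embedded ActiveMQServer and the name of the queue the
   // stalled consumer is attached to.
   public static void dump(ActiveMQServer server, String queueName) {
      Queue queue = server.locateQueue(SimpleString.toSimpleString(queueName));
      if (queue == null) {
         System.out.println("No queue named " + queueName);
         return;
      }
      System.out.println("MessageCount    = " + queue.getMessageCount());
      System.out.println("ConsumerCount   = " + queue.getConsumerCount());
      System.out.println("DeliveringCount = " + queue.getDeliveringCount());
      System.out.println("Paused          = " + queue.isPaused());
   }
}

The same values are exposed as attributes on the queue's QueueControl
MBean if JMX is easier to get to in your environment.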

> Are there other normal reasons that message delivery to a consumer could
stop?

Typically what I see in this kind of situation is that the consumer is hung
for some reason while attempting to handle a message (e.g. a blocking call
without a timeout to a remote resource like a REST API or something).
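
For example (just a sketch, not your actual code), a core-API
MessageHandler that calls out to a REST endpoint should bound both the
connect and the request; without those timeouts a single slow response
can park the delivery thread and the consumer looks exactly like it
stalled:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

import org.apache.activemq.artemis.api.core.client.ClientMessage;
import org.apache.activemq.artemis.api.core.client.MessageHandler;

public class BoundedRestHandler implements MessageHandler {

   // The timeouts are the point: without them send() can block forever,
   // and no further messages are delivered to this consumer.
   private final HttpClient http = HttpClient.newBuilder()
         .connectTimeout(Duration.ofSeconds(5))
         .build();

   @Override
   public void onMessage(ClientMessage message) {
      HttpRequest request = HttpRequest
            .newBuilder(URI.create("http://example.com/api")) // placeholder URL
            .timeout(Duration.ofSeconds(10))
            .GET()
            .build();
      try {
         HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
         // ... process response.body() ...
      } catch (Exception e) {
         // Log and move on; either way the delivery thread is released.
      }
   }
}

A thread dump of the stalled JVM will show right away whether the
handler thread is stuck in a call like that.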

> What log messages or logging can help me prove one way or another what is
happening?

It's impossible to say at this point without more knowledge of what
protocol(s) your clients are using.


Justin

On Tue, Nov 7, 2023 at 7:52 PM David Bennion <david.benn...@gmx.com.invalid>
wrote:

> Hey all,
>
> I could use a push in the right direction to troubleshoot an issue!
>
> TL;DR
>
> After running really well for a seemingly indeterminate period of time
> (from hours to days), message delivery stops to connected consumers that
> are located within the same JVM as the Artemis server.  Producers in
> that same JVM continue uninterrupted. (version Artemis 2.30, will
> upgrade to 2.31.2 soon)
>
> Details:
> 4 JVMs on each of 3 large Linux VMs. Node 1 has an additional JVM that
> contains an embedded Artemis broker.  All 13 of these JVMs have an open
> producer and consumer session in the broker, and persistence is off.
>
> I don't have direct access to the machines where this problem is
> occurring to debug, but I can get logs and ultimately apply updates.
> Log analysis of application behavior points to cessation of message
> delivery to the consumer inside the broker JVM.  All other consumers and
> producers continue to pass messages through the broker without issue; the
> broker is running great.
>
> I set up a similar 3-node environment that I could debug into to attempt to
> replicate the issue.  I put a breakpoint in my message handler and, following
> the call stack into ClientConsumerImpl, I manually called
> setMessageHandler(null) to disable the handler on the consumer as the
> application was running.  The resulting application behavior and logging
> on this setup then matched exactly the behavior on the problem machines,
> including some pretty distinctive application behaviors.
> This really leads me to believe that message delivery stopped.
>
> So I have no idea WHY the consumer stopped receiving messages.  I have
> requested that the logger for org.apache.activemq be set to INFO to capture
> more information from this environment.  We normally run it at WARN
> level because of the volume of logs.  I didn't really see anything
> interesting in the logs I did get from the broker (at WARN level).  If
> there were some kind of network issue, I don't understand how it could
> not affect the producers as well -- let alone all the other 12 connected
> JVMs?
>
> Are there other normal reasons that message delivery to a consumer could
> stop?  What log messages or logging can help me prove one way or another
> what is happening?  The only thing unusual about these machines is that
> they have 2 NICs.
>
> Regards,
>
> David.
>
>
>
