Hi Justin.
Yes, and I’m sorry it didn't work out for us.
The problem was a combination of two things: using operators on OpenShift,
which among other things make it difficult to change settings due to the
internal translation of attributes into actual config values in the running
pods, and the issues I alluded to earlier, whereby we got several variations
of bottlenecks that interacted to soft-lock the system.
Very often the disks ran full, but this was preceded by a sudden drop in
performance. I believe our microservice architecture, where processes read
from one queue and then write to another queue on the same ActiveMQ broker
(see the sketch below), is a bit of an anti-pattern for the assumptions
Artemis makes in its throttling.
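
For illustration, each stage looked roughly like this (a minimal sketch;
the broker URL and queue names are hypothetical, but the shape matches what
we ran: transacted JMS sessions over OpenWire, batches of up to 100
messages):

    import javax.jms.*;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class PipelineStage {
        public static void main(String[] args) throws JMSException {
            // Hypothetical broker URL and queue names, for illustration only.
            ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://broker:61616");
            Connection conn = cf.createConnection();
            conn.start();

            // One transacted session: the consume from STAGE.IN and the
            // produce to STAGE.OUT on the same broker commit together.
            Session session = conn.createSession(true, Session.SESSION_TRANSACTED);
            MessageConsumer in = session.createConsumer(session.createQueue("STAGE.IN"));
            MessageProducer out = session.createProducer(session.createQueue("STAGE.OUT"));

            int batch = 0;
            while (true) {
                Message msg = in.receive(1000);
                if (msg != null) {
                    out.send(msg); // result goes to the next queue on the same broker
                    batch++;
                }
                // Commit in batches of up to 100, or when the input runs dry,
                // so reads are only acknowledged once the writes commit.
                if (batch >= 100 || (msg == null && batch > 0)) {
                    session.commit();
                    batch = 0;
                }
            }
        }
    }

The point being that the ACK for the input queue is tied to the write on the
output queue of the same broker, which I suspect is how throttling on one
side could stall the other side along with it.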

We had a consistent issue with dead connections that didn't show up in the
Artemis GUI and only became noticeable in the logs (and in failed connection
attempts by clients). We never made any headway into finding out why they
existed or how to close them other than by restarting, and we "solved" the
issue simply by removing the connection limits. I'm not sure if this was a
cause, a symptom, or just another open issue.

We tried several versions of Artemis and drivers (both our original OpenWire
setup and a rewrite using the Core protocol) and paid for Red Hat support,
without being able to get rid of the crashes. We did see different patterns
in how fast and how badly it crashed with OpenWire vs. Core. Core was
definitely faster and held out longer, but then crashed in a worse way.
Again, I assume that's just an artifact of our data load being too much for
Artemis to handle gracefully.

Switching back to ActiveMQ Classic resolved the issues for us, insofar as we
managed to pass all of our performance tests, and it has been running at
several production sites for months now without the previous issues.
It could be that it's just luck: some things are a little slower on ActiveMQ
Classic, letting us process all our data successfully in an overall faster
time because we don't hit the same bottlenecks or throttling.

Sorry for the long email without much concrete information. I did try
sending logs and more specific details earlier, though, especially to Red
Hat.

Overall, thanks for asking, and I massively appreciate the work you all do
on the software and here on the help thread.

On Wed, 19 Feb 2025 at 20:20, Justin Bertram <jbert...@apache.org> wrote:

> I saw your email on the "Are there any hardware recommendations for
> ActiveMQ Classic?" thread about the performance issues you observed with
> Artemis. Is this the issue you were referring to? If so, did you try the
> configuration change I mentioned? Any follow-up you could provide here
> would be valuable.
>
>
> Justin
>
> On Fri, May 10, 2024 at 6:51 AM Christian Kurmann <c...@tere.tech> wrote:
>
> > Hi all,
> > Would someone be able to guide me to documentation for options to
> > handle a problem I've encountered while testing Artemis > 2.30?
> >
> > I have a set of ~50 Java microservices processing 1.5 million messages,
> > which communicate among themselves via Artemis using OpenWire.
> > To test, I have a dataset containing lots of small messages and a
> > realistic number of large messages (> 10 KB), which simulates a full
> > prod day but can be processed in under 1 h.
> >
> > Using Artemis 2.30 I can get my data processed; however, with later
> > versions the system hangs due to a large backlog on an important queue
> > which is used by all services to write trace data.
> > All microservices process data in batches of max 100 msgs while reading
> > and writing to the queues.
> >
> > The main issue is that when this happens, both the reading and writing
> > clients of this queue simply hang waiting for Artemis to return an ACK
> > which never comes.
> > During this time Artemis does not log anything suspicious. However, > 10
> > minutes later I do see paging starting, and I get the warning:
> >
> > 2024-05-10 09:26:12,605 INFO [io.hawt.web.auth.LoginServlet] Hawtio login is using 1800 sec. HttpSession timeout
> > 2024-05-10 09:26:12,614 INFO [io.hawt.web.auth.LoginServlet] Logging in user: webadmin
> > 2024-05-10 09:26:12,906 INFO [io.hawt.web.auth.keycloak.KeycloakServlet] Keycloak integration is disabled
> > 2024-05-10 09:26:12,955 INFO [io.hawt.web.proxy.ProxyServlet] Proxy servlet is disabled
> > 2024-05-10 09:48:56,955 INFO [org.apache.activemq.artemis.core.server] AMQ222038: Starting paging on address 'IMS.PRINTS.V2'; size=10280995859 bytes (1820178 messages); maxSize=-1 bytes (-1 messages); globalSize=10309600627 bytes (1825169 messages); globalMaxSize=10309599232 bytes (-1 messages);
> > 2024-05-10 09:48:56,962 WARN [org.apache.activemq.artemis.core.server.Queue] AMQ224127: Message dispatch from paging is blocked. Address IMS.PRINTS.V2/Queue IMS.PRINTS.V2 will not read any more messages from paging until pending messages are acknowledged. There are currently 14500 messages pending (51829364 bytes) with max reads at maxPageReadMessages(-1) and maxPageReadBytes(20971520). Either increase reading attributes at the address-settings or change your consumers to acknowledge more often.
> > 2024-05-10 09:49:24,458 INFO [org.apache.activemq.artemis.core.server] AMQ222038: Starting paging on address 'PRINTS'; size=28608213 bytes (4992 messages); maxSize=-1 bytes (-1 messages); globalSize=10309604144 bytes (1825170 messages); globalMaxSize=10309599232 bytes (-1 messages);
> >
> > I see in the commit that added this warning that maxPageReadBytes is
> > not something I can actually change, so I assume any solution needs to
> > happen earlier in the chain:
> > https://www.mail-archive.com/commits@activemq.apache.org/msg61667.html
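> >
> > The warning itself suggests increasing the reading attributes at the
> > address-settings. I can only guess what that would look like in
> > broker.xml; something like the following, with the element names assumed
> > from the attribute names in the warning rather than taken from the docs:
> >
> >     <address-settings>
> >        <address-setting match="IMS.PRINTS.V2">
> >           <!-- assumed element names mirroring the warning's
> >                maxPageReadMessages / maxPageReadBytes attributes -->
> >           <max-read-page-messages>-1</max-read-page-messages>
> >           <!-- double the 20971520-byte limit reported in the warning -->
> >           <max-read-page-bytes>41943040</max-read-page-bytes>
> >        </address-setting>
> >     </address-settings>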
> >
> > But I can't find anything sensible to configure to help me here.
> > I am also concerned about this suddenly appearing with a new Artemis
> > version; I can reproduce the crash with my dataset, as well as verify
> > that with 2.30 it still works fine.
> > The fact that there is nothing in the logs also worries me.
> > Reconnecting the clients allows messages to be processed; however, the
> > system crashes again very quickly due to all the pending transactions
> > starting again and leading to the same issue.
> >
> > We are running on OpenShift using the AMQ Cloud operator.
> > Clients connect via OpenWire using activemq-client-5.17.2.jar and
> > Java 17.
> >
> > All input is appreciated and I'm happy to run as many tests as required.
> >
> > Regards Chris
> >
>
