Re: AMQ clients hang waiting for ACK after large messages hits maxPageReadBytes

Justin Bertram Wed, 19 Feb 2025 11:44:48 -0800

To be clear, everything worked in Artemis 2.30.0 and then began failing
after an upgrade?



Justin

On Wed, Feb 19, 2025 at 1:40 PM Christian Kurmann <[email protected]> wrote:

> Hi Justin,
> Yes, and I'm sorry it didn't work out for us.
> The problem was a combination of two things. First, we were using operators
> on OpenShift, which, among other things, make it difficult to change
> settings because attributes are translated internally into the actual
> config values in the running pods. Second, there were the issues I alluded
> to earlier, whereby several different bottlenecks interacted to soft-lock
> the system.
> Very often the disks ran full, but this was preceded by a sudden drop in
> performance. I believe our microservice architecture, where processes read
> from one queue and then write to another queue on the same ActiveMQ broker,
> is a bit of an anti-pattern given the assumptions Artemis makes for its
> throttling.
>
> We had a recurring issue with dead connections that didn't show up in the
> Artemis GUI and only became noticeable in the logs (and in failed
> connection attempts by clients). We never made any headway in finding out
> why they existed or how to close them other than by restarting, and we
> «solved» the issue simply by removing limits. I'm not sure whether this was
> a cause, a symptom, or just another open issue.
>
> We tried several versions of Artemis and of the client drivers (both our
> original OpenWire setup and a rewritten setup using the Core protocol), and
> paid for Red Hat support, without being able to get rid of the crashes. We
> did see different patterns in how fast and how badly it crashed with
> OpenWire vs. Core. Core was definitely faster and held out longer, but then
> crashed in a worse way. Again, I assume this is just an artifact of our data
> load being too much for Artemis to handle gracefully.
>
> Switching back to ActiveMQ Classic resolved the issues for us insofar as we
> managed to pass all of our performance tests, and it's been running at
> several production sites for months now without the previous issues.
> It could be just luck: some things are a little slower on ActiveMQ Classic,
> which lets us process all of our data successfully in less overall time
> because we don't hit the same bottlenecks or throttling.
>
> Sorry for the long email without any real information. I did try sending
> logs and more specific details earlier, though, especially to Red Hat.
>
> Overall, thanks for asking. I massively appreciate the work you all do on
> the software and here on the help thread.
>
> On Wed, 19 Feb 2025 at 20:20, Justin Bertram <[email protected]> wrote:
>
> > I saw your email in the "Are there any hardware recommendations for
> > ActiveMQ Classic?" thread about the performance issues you observed with
> > Artemis. Is this the issue you were referring to? If so, did you try the
> > configuration change I mentioned? Any follow-up you could provide here
> > would be valuable.
> >
> >
> > Justin
> >
> > On Fri, May 10, 2024 at 6:51 AM Christian Kurmann <[email protected]> wrote:
> >
> > > Hi all,
> > > Could someone point me to documentation on options for handling a
> > > problem I've encountered while testing Artemis versions newer than 2.30?
> > >
> > > I have a set of ~50 Java microservices processing 1.5 million messages,
> > > which communicate among themselves via Artemis using OpenWire.
> > > For testing, I have a dataset containing lots of small messages and a
> > > realistic number of large messages (over 10 KB); it simulates a full
> > > production day but can be processed in under an hour.
> > >
> > > Using Artemis 2.30, my data gets processed; however, with later versions
> > > the system hangs because of a large backlog on an important queue that
> > > all services use to write trace data.
> > > All microservices process data in batches of at most 100 messages while
> > > reading from and writing to the queues.
> > >
> > > The main issue is that when this happens, both the reading and the
> > > writing clients of this queue simply hang, waiting for an ACK from
> > > Artemis that never comes.
> > > During this time Artemis does not log anything suspicious. However, more
> > > than 10 minutes later I do see paging start, and I get this warning:
> > >
> > > 2024-05-10 09:26:12,605 INFO [io.hawt.web.auth.LoginServlet] Hawtio login is using 1800 sec. HttpSession timeout
> > > 2024-05-10 09:26:12,614 INFO [io.hawt.web.auth.LoginServlet] Logging in user: webadmin
> > > 2024-05-10 09:26:12,906 INFO [io.hawt.web.auth.keycloak.KeycloakServlet] Keycloak integration is disabled
> > > 2024-05-10 09:26:12,955 INFO [io.hawt.web.proxy.ProxyServlet] Proxy servlet is disabled
> > > 2024-05-10 09:48:56,955 INFO [org.apache.activemq.artemis.core.server] AMQ222038: Starting paging on address 'IMS.PRINTS.V2'; size=10280995859 bytes (1820178 messages); maxSize=-1 bytes (-1 messages); globalSize=10309600627 bytes (1825169 messages); globalMaxSize=10309599232 bytes (-1 messages);
> > > 2024-05-10 09:48:56,962 WARN [org.apache.activemq.artemis.core.server.Queue] AMQ224127: Message dispatch from paging is blocked. Address IMS.PRINTS.V2/Queue IMS.PRINTS.V2 will not read any more messages from paging until pending messages are acknowledged. There are currently 14500 messages pending (51829364 bytes) with max reads at maxPageReadMessages(-1) and maxPageReadBytes(20971520). Either increase reading attributes at the address-settings or change your consumers to acknowledge more often.
> > > 2024-05-10 09:49:24,458 INFO [org.apache.activemq.artemis.core.server] AMQ222038: Starting paging on address 'PRINTS'; size=28608213 bytes (4992 messages); maxSize=-1 bytes (-1 messages); globalSize=10309604144 bytes (1825170 messages); globalMaxSize=10309599232 bytes (-1 messages);
> > >
> > > I see in the commit that added this warning that maxPageReadBytes is not
> > > something I can actually change, so I assume any solution needs to
> > > happen earlier in the chain:
> > > https://www.mail-archive.com/[email protected]/msg61667.html
> > >
> > > But I can't find anything sensible to configure to help me here.
> > > I am also concerned about this suddenly appearing with a new Artemis
> > > version; I can reproduce the crash with my dataset and also verify that
> > > it still works fine with 2.30.
> > > The fact that there is nothing in the logs also worries me.
> > > Reconnecting the clients allows messages to be processed; however, the
> > > system crashes again very quickly because all the pending transactions
> > > start again and lead to the same issue.
> > >
> > > Running on OpenShift using the AMQ Cloud operator.
> > > Clients connect via OpenWire using activemq-client-5.17.2.jar and
> > > Java 17.
> > >
> > > All input is appreciated, and I'm happy to run as many tests as
> > > required.
> > >
> > > Regards Chris
> > >
> >
>
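The two broker log entries quoted above can be checked with simple arithmetic. The Java sketch below is a simplified model, assumed for illustration and not taken from the Artemis source code. In this model, a limit of -1 means "unlimited". Paging on an address starts when either the address size or the global size exceeds its maximum. Dispatch from paging is blocked once unacknowledged messages reach either maxPageReadMessages or maxPageReadBytes. The class and method names (PagingModel, pagingTriggered, dispatchBlocked) are hypothetical.

```java
// Simplified, hypothetical model of the two conditions reported in the
// AMQ222038 and AMQ224127 log lines. This is NOT Artemis source code; the
// exact comparison semantics (> vs >=) inside the broker are an assumption.
public final class PagingModel {

    /** Paging starts once the address limit or the global limit is exceeded; -1 means "no limit". */
    static boolean pagingTriggered(long addressSize, long maxSize,
                                   long globalSize, long globalMaxSize) {
        boolean addressFull = maxSize >= 0 && addressSize > maxSize;
        boolean globalFull = globalMaxSize >= 0 && globalSize > globalMaxSize;
        return addressFull || globalFull;
    }

    /** Dispatch from paging stops once unacknowledged messages reach either read limit; -1 disables that limit. */
    static boolean dispatchBlocked(long pendingMessages, long pendingBytes,
                                   long maxPageReadMessages, long maxPageReadBytes) {
        boolean tooManyMessages = maxPageReadMessages >= 0 && pendingMessages >= maxPageReadMessages;
        boolean tooManyBytes = maxPageReadBytes >= 0 && pendingBytes >= maxPageReadBytes;
        return tooManyMessages || tooManyBytes;
    }

    public static void main(String[] args) {
        // Sizes copied from the AMQ222038 line for address 'IMS.PRINTS.V2'.
        boolean paging = pagingTriggered(10_280_995_859L, -1L,
                                         10_309_600_627L, 10_309_599_232L);

        // Pending counts and read limits copied from the AMQ224127 warning.
        long pendingMessages = 14_500L;
        long pendingBytes = 51_829_364L;
        long maxPageReadMessages = -1L;
        long maxPageReadBytes = 20_971_520L; // 20 MiB
        boolean blocked = dispatchBlocked(pendingMessages, pendingBytes,
                                          maxPageReadMessages, maxPageReadBytes);

        System.out.println("paging triggered: " + paging);
        System.out.println("dispatch blocked: " + blocked);
        // Integer division: average size of the messages still awaiting acknowledgement.
        System.out.println("average pending message size: "
                + (pendingBytes / pendingMessages) + " bytes");
        System.out.printf("pending bytes / maxPageReadBytes: %.2f%n",
                (double) pendingBytes / maxPageReadBytes);
    }
}
```

Under this model, the address maxSize is -1, so paging on 'IMS.PRINTS.V2' was triggered by globalMaxSize, which globalSize exceeds by 1,395 bytes. The 51,829,364 pending bytes are roughly 2.5 times the 20,971,520-byte maxPageReadBytes limit. That is consistent with the warning's advice either to raise the read limits in address-settings or to have consumers acknowledge more often.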
