Could you provide some form of the test case you were running so I could
investigate further?
Justin

On Wed, Feb 19, 2025 at 1:56 PM Christian Kurmann <c...@tere.tech> wrote:

> No, sorry to be unclear on this.
> We never got long-term, high-performance runs with Artemis. We tried
> for about a year. It's fine when we only have low traffic, but even
> then it seemed to grow sluggish, requiring us to start doing weekly
> restarts. It was simply unstable as soon as we had a large amount of
> data to process, and we didn't manage to tune it or our software for
> our use case.
> ActiveMQ Classic works fine.
>
> On Wed, 19 Feb 2025 at 20:44, Justin Bertram <jbert...@apache.org> wrote:
>
> > To be clear, everything worked in Artemis 2.30.0 and then began
> > failing after an upgrade?
> >
> >
> > Justin
> >
> > On Wed, Feb 19, 2025 at 1:40 PM Christian Kurmann <c...@tere.tech> wrote:
> >
> > > Hi Justin.
> > > Yes, and I'm sorry it didn't work out for us.
> > > The problem was a combination of using operators on OpenShift,
> > > which among other things make it difficult to change settings due
> > > to the internal translation of attributes to actual config values
> > > in the running pods, and the issues I alluded to earlier, whereby
> > > we got several variations of bottlenecks that interacted to
> > > soft-lock the system.
> > > Very often the disks ran full, but this was preceded by a sudden
> > > drop in performance. I believe our microservice architecture,
> > > where processes read from a queue and then write to another queue
> > > on the same ActiveMQ, is a bit of an anti-pattern for the
> > > assumptions Artemis makes in its throttling.
> > >
> > > We had a consistent issue with dead connections that didn't show
> > > up in the Artemis GUI and only became noticeable in the logs (and
> > > in failed connection attempts by clients), but we never made any
> > > headway into finding out why they exist or how to close them other
> > > than by restarting, and "solved" the issue simply by removing
> > > limits. Not sure if this was a cause or a symptom or just another
> > > open issue.
> > >
> > > We tried several versions of Artemis and drivers (both our
> > > original OpenWire setup and a rewritten setup with the Core
> > > protocol) and paid for Red Hat support, without being able to get
> > > rid of the crashes. We did see different patterns in how fast and
> > > how badly it crashed with OpenWire vs. Core. Core was definitely
> > > faster and held out longer, but then crashed in a worse way.
> > > Again, I assume that's just an artifact of our data load being too
> > > much for Artemis to handle gracefully.
> > >
> > > Switching back to ActiveMQ Classic resolved the issues for us
> > > insofar as we managed to pass all of our performance tests, and
> > > it's been running in several production sites for months now
> > > without the previous issues. It could be that it's just luck, in
> > > that some things are a little slower on ActiveMQ Classic, letting
> > > us process all our data successfully in an overall faster time
> > > because we don't hit the same bottlenecks or throttling.
> > >
> > > Sorry, long email without any real information. I did try sending
> > > logs and more specific details earlier, though, especially to Red
> > > Hat.
> > >
> > > Overall, thanks for asking, and I massively appreciate the work
> > > you all do on the software and here on the help thread.
> > >
> > > On Wed, 19 Feb 2025 at 20:20, Justin Bertram <jbert...@apache.org>
> > > wrote:
> > >
> > > > I saw your email on the "Are there any hardware recommendations
> > > > for ActiveMQ Classic?" thread about the performance issues you
> > > > observed with Artemis. Is this the issue you were referring to?
> > > > If so, did you try the configuration change I mentioned? Any
> > > > follow-up you could provide here would be valuable.
> > > >
> > > >
> > > > Justin
> > > >
> > > > On Fri, May 10, 2024 at 6:51 AM Christian Kurmann <c...@tere.tech> wrote:
> > > >
> > > > > Hi all,
> > > > > Would someone be able to guide me to documentation for options
> > > > > to handle a problem I've encountered testing Artemis versions
> > > > > newer than 2.30?
> > > > >
> > > > > I have a set of ~50 Java microservices, communicating among
> > > > > themselves via Artemis over OpenWire, that together process
> > > > > 1.5 million messages.
> > > > > To test, I have a dataset containing lots of small messages
> > > > > and a realistic number of large ones (> 10 KB), which
> > > > > simulates a full production day but can be processed in under
> > > > > an hour.
> > > > >
> > > > > Using Artemis 2.30 I can get my data processed; however, with
> > > > > later versions the system hangs due to a large backlog on an
> > > > > important queue which is used by all services to write trace
> > > > > data. All microservices process data in batches of max 100
> > > > > messages while reading and writing to the queues.
> > > > >
> > > > > The main issue is that when this happens, both the reading and
> > > > > writing clients of this queue simply hang waiting for Artemis
> > > > > to return an ACK which never comes.
> > > > > During this time Artemis does not log anything suspicious.
> > > > > However, 10 minutes later I do see paging starting and I get
> > > > > the warning:
> > > > >
> > > > > 2024-05-10 09:26:12,605 INFO  [io.hawt.web.auth.LoginServlet] Hawtio login is using 1800 sec. HttpSession timeout
> > > > > 2024-05-10 09:26:12,614 INFO  [io.hawt.web.auth.LoginServlet] Logging in user: webadmin
> > > > > 2024-05-10 09:26:12,906 INFO  [io.hawt.web.auth.keycloak.KeycloakServlet] Keycloak integration is disabled
> > > > > 2024-05-10 09:26:12,955 INFO  [io.hawt.web.proxy.ProxyServlet] Proxy servlet is disabled
> > > > > 2024-05-10 09:48:56,955 INFO  [org.apache.activemq.artemis.core.server] AMQ222038: Starting paging on address 'IMS.PRINTS.V2'; size=10280995859 bytes (1820178 messages); maxSize=-1 bytes (-1 messages); globalSize=10309600627 bytes (1825169 messages); globalMaxSize=10309599232 bytes (-1 messages);
> > > > > 2024-05-10 09:48:56,962 WARN  [org.apache.activemq.artemis.core.server.Queue] AMQ224127: Message dispatch from paging is blocked. Address IMS.PRINTS.V2/Queue IMS.PRINTS.V2 will not read any more messages from paging until pending messages are acknowledged. There are currently 14500 messages pending (51829364 bytes) with max reads at maxPageReadMessages(-1) and maxPageReadBytes(20971520). Either increase reading attributes at the address-settings or change your consumers to acknowledge more often.
> > > > > 2024-05-10 09:49:24,458 INFO  [org.apache.activemq.artemis.core.server] AMQ222038: Starting paging on address 'PRINTS'; size=28608213 bytes (4992 messages); maxSize=-1 bytes (-1 messages); globalSize=10309604144 bytes (1825170 messages); globalMaxSize=10309599232 bytes (-1 messages);
> > > > >
> > > > > I see in the commit that added this warning that
> > > > > maxPageReadBytes is not something I can actually change, so I
> > > > > assume any solution needs to happen earlier in the chain:
> > > > > https://www.mail-archive.com/commits@activemq.apache.org/msg61667.html
> > > > >
> > > > > But I can't find anything sensible to configure to help me
> > > > > here. I'm also concerned about this suddenly appearing with a
> > > > > new Artemis version; I can reproduce the crash with my
> > > > > dataset, as well as verify that with 2.30 it still works fine.
> > > > > The fact that there is nothing in the logs also worries me.
> > > > > Reconnecting the clients allows messages to be processed;
> > > > > however, the system crashes again very quickly because all the
> > > > > pending transactions start again and lead to the same issue.
> > > > >
> > > > > Running on OpenShift using the AMQ Cloud operator.
> > > > > Clients connect via OpenWire using activemq-client-5.17.2.jar
> > > > > and Java 17.
> > > > >
> > > > > All input is appreciated, and I'm happy to run as many tests
> > > > > as required.
> > > > >
> > > > > Regards, Chris
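
[Editor's note] For readers who land on this thread with the same
AMQ224127 warning: recent Artemis releases document the two limits named
in the message as the address-settings attributes max-read-page-messages
and max-read-page-bytes, so the warning's first suggestion can be tried
with a broker.xml snippet along these lines. This is a sketch only, not
the configuration used in this thread: the address match reuses the
queue from the log above, and the 100 MB value (up from the 20971520-byte
default shown in the warning) is purely illustrative. Operator-managed
brokers, as on OpenShift here, would need the equivalent change expressed
through the operator's configuration rather than a hand-edited broker.xml.

<address-settings>
   <!-- Illustrative match: the address from the AMQ224127 warning. -->
   <address-setting match="IMS.PRINTS.V2">
      <!-- Max paged messages held in memory per queue; -1 = unlimited. -->
      <max-read-page-messages>-1</max-read-page-messages>
      <!-- Raise the byte limit from the 20 MB default to 100 MB. -->
      <max-read-page-bytes>104857600</max-read-page-bytes>
   </address-setting>
</address-settings>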
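[Editor's note] The warning's second suggestion, acknowledging more
often, would look roughly like this with the OpenWire client the thread
mentions (activemq-client 5.x, javax.jms). This is a minimal sketch, not
the actual service code from the thread: the broker URL and queue names
are made up, and the batch size of 10 (rather than the 100 described
above) is only meant to show smaller commits letting the broker release
pending page reads sooner.

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Session;

import org.apache.activemq.ActiveMQConnectionFactory;

public class PipelineWorker {
    public static void main(String[] args) throws JMSException {
        // Hypothetical broker URL; a real deployment uses its own connector.
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://broker:61616");
        Connection connection = factory.createConnection();
        try {
            connection.start();

            // Transacted session: consumes from one queue and produces to
            // another, mirroring the read-then-write pipeline described above.
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            MessageConsumer consumer = session.createConsumer(session.createQueue("WORK.IN"));
            MessageProducer producer = session.createProducer(session.createQueue("IMS.PRINTS.V2"));

            final int BATCH_SIZE = 10; // smaller than 100, so acks reach the broker sooner
            int inBatch = 0;
            Message msg;
            while ((msg = consumer.receive(1000)) != null) {
                producer.send(msg); // a real service would transform the payload here
                if (++inBatch >= BATCH_SIZE) {
                    session.commit(); // acknowledges the reads and publishes the writes
                    inBatch = 0;
                }
            }
            if (inBatch > 0) {
                session.commit(); // flush the final partial batch
            }
        } finally {
            connection.close();
        }
    }
}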