I will see what I can do. As mentioned, I processed millions of real financial messages, so this will take a while.
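Until I can share something runnable, here is a rough sketch of the pattern each of our services follows (read a batch of up to 100 messages from one queue, write a trace message per input to the shared queue, commit once per batch). The broker URL, the input queue name and the trace payload are placeholders for illustration, not our real configuration:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Session;

import org.apache.activemq.ActiveMQConnectionFactory;

public class BatchRelay {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and queue names; the real services are configured differently.
        ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://artemis:61616");
        Connection connection = cf.createConnection();
        connection.start();
        try {
            // One transacted session: the batch of reads and writes commits together.
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            MessageConsumer consumer = session.createConsumer(session.createQueue("WORK.IN"));
            MessageProducer producer = session.createProducer(session.createQueue("IMS.PRINTS.V2"));

            while (true) {
                int batched = 0;
                Message msg;
                // Read up to 100 messages, writing one trace message per input message.
                while (batched < 100 && (msg = consumer.receive(1000)) != null) {
                    producer.send(session.createTextMessage("trace for " + msg.getJMSMessageID()));
                    batched++;
                }
                if (batched == 0) {
                    break;            // test dataset drained
                }
                session.commit();     // ACK the whole batch in one go
            }
        } finally {
            connection.close();
        }
    }
}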
On Wed, 19 Feb 2025 at 20:59, Justin Bertram <jbert...@apache.org> wrote:

> Could you provide some form of the test-case you were running so I could
> investigate further?
>
>
> Justin
>
> On Wed, Feb 19, 2025 at 1:56 PM Christian Kurmann <c...@tere.tech> wrote:
>
> > No, sorry to be unclear on this.
> > We never got long-term high-performance runs with Artemis. We tried for
> > about 1 year. It's fine when we only have low traffic, but even then it
> > seemed to grow sluggish, requiring us to start with weekly restarts.
> > It was simply unstable as soon as we got a large amount of data to
> > process, and we didn't manage to tune it or our software for our use case.
> > ActiveMQ Classic works fine.
> >
> > On Wed, 19 Feb 2025 at 20:44, Justin Bertram <jbert...@apache.org> wrote:
> >
> > > To be clear, everything worked in Artemis 2.30.0 and then began failing
> > > after an upgrade?
> > >
> > >
> > > Justin
> > >
> > > On Wed, Feb 19, 2025 at 1:40 PM Christian Kurmann <c...@tere.tech> wrote:
> > >
> > > > Hi Justin.
> > > > Yes, and I'm sorry it didn't work out for us.
> > > > The problem was a combination of using operators on OpenShift, which
> > > > among other things make it difficult to change settings due to the
> > > > internal translation of attributes to actual config values in the
> > > > running pods, and the issues I alluded to earlier whereby we got
> > > > several variations of bottlenecks that interacted to soft-lock the
> > > > system. Very often the disks run full, but this is preceded by a
> > > > sudden drop in performance. I believe our microservice architecture,
> > > > where processes read from a queue and then write to another queue on
> > > > the same ActiveMQ, is a bit of an anti-pattern for the assumptions
> > > > Artemis has for its throttling.
> > > >
> > > > We had a consistent issue with dead connections that didn't show up
> > > > on the Artemis GUI and only became noticeable in the logs (and in
> > > > failed connection attempts by clients), but we never made any headway
> > > > into finding out why they exist or how to close them other than by
> > > > restarting, and «solved» the issue simply by removing limits. Not
> > > > sure if this was a cause or a symptom or just another open issue.
> > > >
> > > > We tried several versions of Artemis and drivers (both our original
> > > > OpenWire setup and a rewritten setup with the Core protocol) and paid
> > > > Red Hat support without being able to get rid of the crashes. We did
> > > > see different patterns in how fast and how badly it crashed with
> > > > OpenWire vs Core. Core was definitely faster and held out longer, but
> > > > then crashed in a worse way. Again, I assume that is just an artifact
> > > > of our data load being too much for Artemis to handle gracefully.
> > > >
> > > > Switching back to ActiveMQ Classic resolved the issues for us insofar
> > > > as we managed to pass all of our performance tests, and it's been
> > > > running in several production sites for months now without the
> > > > previous issues. It could be that it's just luck in that some things
> > > > are a little slower on ActiveMQ Classic, letting us process all our
> > > > data successfully in an overall faster time because we don't hit the
> > > > same bottlenecks or throttling.
> > > >
> > > > Sorry, long email without any real information. I did try sending
> > > > logs and more specific details earlier though, especially to Red Hat.
> > > >
> > > > Overall, thanks for asking, and I massively appreciate the work you
> > > > guys all do on the software and here on the help thread.
> > > >
> > > > On Wed, 19 Feb 2025 at 20:20, Justin Bertram <jbert...@apache.org> wrote:
> > > >
> > > > > I saw your email on the "Are there any hardware recommendations for
> > > > > ActiveMQ Classic?" thread about the performance issues you observed
> > > > > with Artemis. Is this the issue you were referring to? If so, did
> > > > > you try the configuration change I mentioned? Any follow-up you
> > > > > could provide here would be valuable.
> > > > >
> > > > >
> > > > > Justin
> > > > >
> > > > > On Fri, May 10, 2024 at 6:51 AM Christian Kurmann <c...@tere.tech> wrote:
> > > > >
> > > > > > Hi all,
> > > > > > Would someone be able to guide me to documentation for options to
> > > > > > handle a problem I've encountered testing Artemis > 2.30?
> > > > > >
> > > > > > I have a set of ~50 Java microservices processing 1.5 million
> > > > > > messages which communicate among themselves via Artemis using
> > > > > > OpenWire. To test, I have a dataset containing lots of small
> > > > > > messages and a realistic amount of large messages (> 10KB) which
> > > > > > simulates a full prod day but can be processed in under 1h.
> > > > > >
> > > > > > Using Artemis 2.30 I can get my data processed; however, with
> > > > > > later versions the system hangs due to a large backlog on an
> > > > > > important queue which is used by all services to write trace
> > > > > > data. All microservices process data in batches of max 100 msgs
> > > > > > while reading and writing to the queues.
> > > > > >
> > > > > > The main issue is that when this happens, both the reading and
> > > > > > writing clients of this queue simply hang waiting for Artemis to
> > > > > > return an ACK which never comes. During this time Artemis does
> > > > > > not log anything suspicious. However, 10 minutes later I do see
> > > > > > paging starting and I get the warning:
> > > > > >
> > > > > > 2024-05-10 09:26:12,605 INFO [io.hawt.web.auth.LoginServlet] Hawtio login is using 1800 sec. HttpSession timeout
> > > > > > 2024-05-10 09:26:12,614 INFO [io.hawt.web.auth.LoginServlet] Logging in user: webadmin
> > > > > > 2024-05-10 09:26:12,906 INFO [io.hawt.web.auth.keycloak.KeycloakServlet] Keycloak integration is disabled
> > > > > > 2024-05-10 09:26:12,955 INFO [io.hawt.web.proxy.ProxyServlet] Proxy servlet is disabled
> > > > > > 2024-05-10 09:48:56,955 INFO [org.apache.activemq.artemis.core.server] AMQ222038: Starting paging on address 'IMS.PRINTS.V2'; size=10280995859 bytes (1820178 messages); maxSize=-1 bytes (-1 messages); globalSize=10309600627 bytes (1825169 messages); globalMaxSize=10309599232 bytes (-1 messages);
> > > > > > 2024-05-10 09:48:56,962 WARN [org.apache.activemq.artemis.core.server.Queue] AMQ224127: Message dispatch from paging is blocked. Address IMS.PRINTS.V2/Queue IMS.PRINTS.V2 will not read any more messages from paging until pending messages are acknowledged. There are currently 14500 messages pending (51829364 bytes) with max reads at maxPageReadMessages(-1) and maxPageReadBytes(20971520). Either increase reading attributes at the address-settings or change your consumers to acknowledge more often.
> > > > > > 2024-05-10 09:49:24,458 INFO [org.apache.activemq.artemis.core.server] AMQ222038: Starting paging on address 'PRINTS'; size=28608213 bytes (4992 messages); maxSize=-1 bytes (-1 messages); globalSize=10309604144 bytes (1825170 messages); globalMaxSize=10309599232 bytes (-1 messages);
> > > > > >
> > > > > > I see in the commit that added this warning that maxPageReadBytes
> > > > > > is not something I can actually change, so I assume any solution
> > > > > > needs to happen earlier in the chain:
> > > > > > https://www.mail-archive.com/commits@activemq.apache.org/msg61667.html
> > > > > >
> > > > > > But I can't find anything sensible to configure to help me here.
> > > > > > I am also concerned about this suddenly appearing with a new
> > > > > > Artemis version, and I can reproduce the crash with my dataset as
> > > > > > well as verify that with 2.30 it still works fine. The fact that
> > > > > > there is nothing in the logs also worries me. Reconnecting the
> > > > > > clients allows messages to be processed; however, the system
> > > > > > crashes again very quickly due to all the pending transactions
> > > > > > starting again and leading to the same issue.
> > > > > >
> > > > > > Running on OpenShift using the AMQ Cloud operator.
> > > > > > Clients connect via OpenWire using activemq-client-5.17.2.jar and
> > > > > > Java 17.
> > > > > >
> > > > > > All input is appreciated and I'm happy to run as many tests as
> > > > > > required.
> > > > > >
> > > > > > Regards Chris
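For anyone hitting the same AMQ224127 warning quoted above: as far as I can tell it points at two knobs. On the broker side, the page-read limits can apparently be raised per address in address-settings (the attributes appear to be named max-read-page-bytes and max-read-page-messages in current Artemis versions, though I haven't verified the exact names against the schema). On the client side, the suggestion is to acknowledge more often, i.e. commit smaller batches so fewer messages sit pending while the address is paging. A rough sketch of the client-side change, with a placeholder URL and an arbitrary batch size of 10:

import javax.jms.Connection;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Session;

import org.apache.activemq.ActiveMQConnectionFactory;

public class FrequentAckConsumer {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; a reduced OpenWire prefetch may also keep fewer
        // unacknowledged messages parked on each consumer (untested assumption).
        Connection connection = new ActiveMQConnectionFactory(
                "tcp://artemis:61616?jms.prefetchPolicy.queuePrefetch=100").createConnection();
        connection.start();
        try {
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            MessageConsumer consumer =
                    session.createConsumer(session.createQueue("IMS.PRINTS.V2"));

            int inBatch = 0;
            Message msg;
            while ((msg = consumer.receive(5000)) != null) {
                // ... handle the trace message ...
                if (++inBatch >= 10) {    // commit every 10 instead of every 100
                    session.commit();     // lets the broker dispatch more paged messages
                    inBatch = 0;
                }
            }
            if (inBatch > 0) {
                session.commit();
            }
        } finally {
            connection.close();
        }
    }
}

I'm not claiming either of these resolves the hang we saw; they are simply the two remedies the warning text itself suggests.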