An update on this: I have replicated the memory and expiration issues on the current 2.5.0-SNAPSHOT with the included client libraries and a one-node broker by modifying an existing Artemis example. As messages are routed to the DLQ, paged and expired, memory consumption keeps increasing and eventually leads to heap space exhaustion, rendering the broker unable to route messages. What should happen is that memory consumption stays reasonable even without expiration, thanks to paging to disk, and doubly so with expiration: once the messages have expired, they shouldn't consume any resources.
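To make the accounting concrete, here is a toy model of the failure mode I suspect. All names here are illustrative; this is not Artemis code, just a sketch of what "expired but never acknowledged" would do to the heap and to the delivering counter:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: illustrative names, NOT Artemis internals.
public class ExpiryLeakModel {

    static class ToyQueue {
        final List<Object> inDelivery = new ArrayList<>(); // refs held on the heap
        int deliveringCount;
        int expiredCount;

        void deliver(Object ref) {
            inDelivery.add(ref);
            deliveringCount++;
        }

        // Expiry when the expiry address has no bindings: if the reference
        // is not acknowledged, it stays in inDelivery and deliveringCount
        // is never decremented.
        void expire(Object ref, boolean acknowledge) {
            if (acknowledge) {
                inDelivery.remove(ref);
                deliveringCount--;
                expiredCount++;
            }
            // else: only a warning would be logged; the reference leaks
        }
    }

    public static void main(String[] args) {
        ToyQueue q = new ToyQueue();
        for (int i = 0; i < 1000; i++) {
            Object ref = new Object();
            q.deliver(ref);
            q.expire(ref, false); // mirrors the observed behavior
        }
        // DeliveringCount grows without bound, ExpiredMessages stays 0,
        // and all references remain strongly reachable.
        System.out.println(q.deliveringCount + " " + q.expiredCount); // prints: 1000 0
    }
}
```

If the suspicion is right, each expired-and-dropped message takes the `acknowledge == false` branch, which would explain both the wrong JMX statistics and the heap growth at once.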
I'm not certain whether the two issues (erroneous statistics on expiration and the memory leak) are connected, but they both appear at the same time, which raises suspicion. A possible cause could be that filtered message expiration behaves differently from other means of expiration: it uses a private expiration method that takes a transaction as a parameter. Unlike the non-transacted expiration method, it checks for empty bindings separately, but it doesn't seem to decrement the counters appropriately in this case. Even though I have set a null expiry-address (<expiry-address />), it is seen as non-null during expiration. Then, as the expiry address is non-null but no bindings are found, the warning about dropping the message is logged. However, it seems that the message is never acknowledged and the deliveringCount is never decreased, so the delivery metrics end up being wrong. Shouldn't the message reference be acknowledged right after the logging when the following condition is matched?

https://github.com/apache/activemq-artemis/blob/master/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/QueueImpl.java#L2735

Also, why is the acknowledgment reason here not expiry but normal? One would imagine it should be acknowledge(tx, ref, AckReason.EXPIRED) instead of the default overload, so that the appropriate counters end up being incremented:

https://github.com/apache/activemq-artemis/blob/master/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/QueueImpl.java#L2747

Best regards,
- Ilkka

-----Original Message-----
From: Ilkka Virolainen [mailto:ilkka.virolai...@bitwise.fi]
Sent: 27 February 2018 15:20
To: users@activemq.apache.org
Subject: RE: Artemis 2.4.0 - Issues with memory leaks and JMS message redistribution

Hello,

- I don't have consumers on the DLQ, and none are listed in its JMX attributes.
- The messages are being sent to the DLQ by the broker after a delivery failure on another queue.
  The delivery failure is expected and caused by a transactional rollback on the consumer.
- I am setting the expiry delay in the broker's DLQ address-settings (not in message attributes). I'm setting an empty expiry-address in the same place.
- I have a set of broker settings and a small Spring Boot application with which I was able to replicate the issue. Would you like me to provide it to you somehow?

It seems like there's some kind of hiccup in message expiration. When the messages routed to the DLQ start expiring, the broker logs:

AMQ222146: Message has expired. No bindings for Expiry Address so dropping it

but when reviewing the DLQ statistics via JMX, the ExpiredMessages counter is not incremented while the DeliveringCount is. As messages keep expiring, the deliveringCount keeps increasing. This feels a lot like the issue I've been having. Could it be that this process leaks memory/resources, or is it just that the expiration statistics always assume that expiration results in redelivery, thereby causing erroneous numbers to be reported?

Best regards,
- Ilkka

-----Original Message-----
From: Justin Bertram [mailto:jbert...@apache.org]
Sent: 23 February 2018 16:51
To: users@activemq.apache.org
Subject: Re: Artemis 2.4.0 - Issues with memory leaks and JMS message redistribution

A couple of questions:

- Do you have any consumers on the DLQ?
- Are messages being sent to the DLQ by the broker automatically (e.g. based on delivery attempt failures) or is that being done by your application?
- How are you setting the expiry delay?
- Do you have a reproducible test-case?


Justin

On Fri, Feb 23, 2018 at 4:38 AM, Ilkka Virolainen <ilkka.virolai...@bitwise.fi> wrote:

> I'm still facing an issue with somewhat confusing behavior regarding
> message expiration in the DLQ, maybe related to the memory issues I've
> been having. My aim is to have messages routed to the DLQ expire and
> be dropped in one hour.
> To achieve this, I've set an empty expiry-address and the appropriate
> expiry-delay. The problem is that most of the messages routed to the
> DLQ end up in an in-delivery state - they are not expiring and I
> cannot remove them via JMX. The messageCount in the DLQ is slightly
> higher than the deliveringCount, and attempting to remove all messages
> only removes a number of messages equal to the difference between
> deliveringCount and messageCount - approximately a few thousand
> messages, while the messageCount is tens of thousands and increasing
> as message delivery failures occur.
>
> What could be the reason for this behavior and how could it be avoided?
>
> -----Original Message-----
> From: Ilkka Virolainen [mailto:ilkka.virolai...@bitwise.fi]
> Sent: 22 February 2018 15:20
> To: users@activemq.apache.org
> Subject: RE: Artemis 2.4.0 - Issues with memory leaks and JMS message
> redistribution
>
> To answer my own question in case anyone else is wondering about a
> similar issue: it turns out the change in addressing is referred to in
> ticket [1], and adding the multicastPrefix and anycastPrefix described
> in the ticket to my broker acceptors seems to have fixed my problem.
> If the issue regarding memory leaks persists I will try to provide a
> reproducible test case.
>
> Thank you for your help, Justin.
>
> Best regards,
> - Ilkka
>
> [1] https://issues.apache.org/jira/browse/ARTEMIS-1644
>
>
> -----Original Message-----
> From: Ilkka Virolainen [mailto:ilkka.virolai...@bitwise.fi]
> Sent: 22 February 2018 12:33
> To: users@activemq.apache.org
> Subject: RE: Artemis 2.4.0 - Issues with memory leaks and JMS message
> redistribution
>
> Having removed the address configuration and having switched from
> 2.4.0 to yesterday's snapshot of 2.5.0, it seems like the
> redistribution of messages is now working, but there also seems to
> have been a change in addressing between the versions, causing another
> problem related to jms.queue / jms.topic prefixing.
> While the NMS clients listen and the Artemis JMS clients send to the
> same topics as described in the previous message, Artemis 2.5.0
> prefixes the addresses with jms.topic. While the messages are being
> sent to e.g. A.B.f64dd592-a8fb-442e-826d-927834d566f4.C.D, they are
> only received if I explicitly prefix the listening address with
> jms.topic, for example topic://jms.topic.A.B.*.C.D. Can this somehow
> be avoided in the broker configuration?
>
> Best regards
>
> -----Original Message-----
> From: Justin Bertram [mailto:jbert...@apache.org]
> Sent: 21 February 2018 15:19
> To: users@activemq.apache.org
> Subject: Re: Artemis 2.4.0 - Issues with memory leaks and JMS message
> redistribution
>
> Your first issue is probably a misconfiguration. Your
> cluster-connection is using an "address" value of '*', which I assume
> is supposed to mean "all addresses," but the "address" element doesn't
> support wildcards like this. Just leave it empty to match all
> addresses. See the documentation [1] for more details.
>
> Even after you fix that configuration issue you may run into issues.
> These may be fixed already via ARTEMIS-1523 and/or ARTEMIS-1680. If
> you have a reproducible test-case then you can verify using the head
> of the master branch.
>
> For the memory issue it would be helpful to have some heap dumps or
> something to see what's actually consuming the memory. Better yet
> would be a reproducible test-case. Do you have either?
>
>
> Justin
>
> [1] https://activemq.apache.org/artemis/docs/latest/clusters.html
>
>
>
> On Wed, Feb 21, 2018 at 5:39 AM, Ilkka Virolainen <
> ilkka.virolai...@bitwise.fi> wrote:
>
> > Hello,
> >
> > I am using Artemis 2.4.0 to broker messages through JMS
> > queues/topics between a set of clients. Some are Apache NMS 1.7.2
> > ActiveMQ clients and others are using the Artemis JMS client 1.5.4
> > included in Spring Boot 1.5.3.
> > Broker topology is a symmetric cluster of two live nodes with static
> > connectors, both nodes having been set up as replicating colocated
> > backup pairs with scale-down. I have two quite frustrating issues at
> > the moment: message redistribution not working correctly, and a
> > memory leak causing eventual thread death.
> >
> > ISSUE #1 - Message redistribution / load balancing not working:
> >
> > Client 1 (NMS) connects to broker a and starts listening; Artemis
> > creates the following address:
> >
> > (Broker a):
> > A.B.*.C.D
> > |-queues
> >   |-multicast
> >     |-f64dd592-a8fb-442e-826d-927834d566f4
> >
> > Server 1 (artemis-jms-client) connects to broker b and sends a
> > message to topic A.B.f64dd592-a8fb-442e-826d-927834d566f4.C.D - this
> > should be routed to broker a since the corresponding queue has no
> > consumers on broker b (the queue does not exist there). This however
> > does not happen and the client receives no messages. Broker b has
> > some other clients connected, causing similar (but not identical)
> > queues to have been created:
> >
> > (Broker b):
> > A.B.*.C.D
> > |-queues
> >   |-multicast
> >     |-1eb48079-7fd8-40e9-b822-bcc25695ced0
> >     |-9f295257-c352-4ae6-b74b-d5994f330485
> >
> >
> > ISSUE #2 - Memory leak and eventual thread death
> >
> > The Artemis broker has 4 GB of allocated heap space and
> > global-max-size is set to half of that (the default setting).
> > Address-full-policy is set to PAGE for all addresses, and some
> > individual addresses have small max-size-bytes values set, e.g.
> > 104857600. As far as I know the paging settings should limit memory
> > usage, but what happens is that at times Artemis uses the whole heap
> > space, encounters an out-of-memory error and dies:
> >
> > 05:39:29,510 WARN [org.eclipse.jetty.util.thread.QueuedThreadPool] :
> > java.lang.OutOfMemoryError: Java heap space
> > 05:39:16,646 WARN [io.netty.channel.ChannelInitializer] Failed to
> > initialize a channel.
> > Closing: [id: ...]: java.lang.OutOfMemoryError: Java heap space
> > 05:41:05,597 WARN [org.eclipse.jetty.util.thread.QueuedThreadPool]
> > Unexpected thread death:
> > org.eclipse.jetty.util.thread.QueuedThreadPool$2@5ffaba31 in
> > qtp20111564{STARTED,8<=8<=200,i=2,q=0}
> >
> > Are these known issues in Artemis or misconfigurations in the brokers?
> >
> > The broker configurations are as follows. Broker b has an identical
> > configuration except that the cluster connector's connector-ref and
> > the static-connector connector-ref refer to broker b and broker a
> > respectively.
> >
> > Best regards,
> >
> > broker.xml (broker a):
> >
> > <?xml version='1.0'?>
> > <configuration xmlns="urn:activemq"
> >                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >                xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
> >    <core xmlns="urn:activemq:core"
> >          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >          xsi:schemaLocation="urn:activemq:core ">
> >       <name>[broker-a-ip]</name>
> >       <persistence-enabled>true</persistence-enabled>
> >
> >       <journal-type>NIO</journal-type>
> >
> >       <paging-directory>...</paging-directory>
> >       <bindings-directory>...</bindings-directory>
> >       <journal-directory>...</journal-directory>
> >       <large-messages-directory>...</large-messages-directory>
> >
> >       <journal-datasync>true</journal-datasync>
> >       <journal-min-files>2</journal-min-files>
> >       <journal-pool-files>-1</journal-pool-files>
> >       <journal-buffer-timeout>788000</journal-buffer-timeout>
> >       <disk-scan-period>5000</disk-scan-period>
> >
> >       <max-disk-usage>97</max-disk-usage>
> >
> >       <critical-analyzer>true</critical-analyzer>
> >       <critical-analyzer-timeout>120000</critical-analyzer-timeout>
> >       <critical-analyzer-check-period>60000</critical-analyzer-check-period>
> >       <critical-analyzer-policy>HALT</critical-analyzer-policy>
> >
> >       <acceptors>
> >          <acceptor name="invm-acceptor">vm://0</acceptor>
> >          <acceptor name="artemis">tcp://0.0.0.0:61616</acceptor>
> >          <acceptor name="ssl">tcp://0.0.0.0:61617?sslEnabled=true;keyStorePath=...;keyStorePassword=...</acceptor>
> >       </acceptors>
> >       <connectors>
> >          <connector name="invm-connector">vm://0</connector>
> >          <connector name="netty-connector">tcp://[broker-a-ip]:61616</connector>
> >          <connector name="broker-b-connector">[broker-b-ip]:61616</connector>
> >       </connectors>
> >
> >       <cluster-connections>
> >          <cluster-connection name="cluster-name">
> >             <address>*</address>
> >             <connector-ref>netty-connector</connector-ref>
> >             <retry-interval>500</retry-interval>
> >             <reconnect-attempts>5</reconnect-attempts>
> >             <use-duplicate-detection>true</use-duplicate-detection>
> >             <message-load-balancing>ON_DEMAND</message-load-balancing>
> >             <max-hops>1</max-hops>
> >             <static-connectors>
> >                <connector-ref>broker-b-connector</connector-ref>
> >             </static-connectors>
> >          </cluster-connection>
> >       </cluster-connections>
> >
> >       <ha-policy>
> >          <replication>
> >             <colocated>
> >                <backup-request-retry-interval>5000</backup-request-retry-interval>
> >                <max-backups>3</max-backups>
> >                <request-backup>true</request-backup>
> >                <backup-port-offset>100</backup-port-offset>
> >                <excludes>
> >                   <connector-ref>invm-connector</connector-ref>
> >                   <connector-ref>netty-connector</connector-ref>
> >                </excludes>
> >                <master>
> >                   <check-for-live-server>true</check-for-live-server>
> >                </master>
> >                <slave>
> >                   <restart-backup>false</restart-backup>
> >                   <scale-down />
> >                </slave>
> >             </colocated>
> >          </replication>
> >       </ha-policy>
> >
> >       <cluster-user>ARTEMIS.CLUSTER.ADMIN.USER</cluster-user>
> >       <cluster-password>[the shared cluster password]</cluster-password>
> >
> >       <security-settings>
> >          <security-setting match="#">
> >             <permission type="createDurableQueue" roles="amq, other-role" />
> >             <permission type="deleteDurableQueue" roles="amq, other-role" />
> >             <permission type="createNonDurableQueue" roles="amq, other-role" />
> >             <permission type="createAddress" roles="amq, other-role" />
> >             <permission type="deleteNonDurableQueue" roles="amq, other-role" />
> >             <permission type="deleteAddress" roles="amq, other-role" />
> >             <permission type="consume" roles="amq, other-role" />
> >             <permission type="browse" roles="amq, other-role" />
> >             <permission type="send" roles="amq, other-role" />
> >             <permission type="manage" roles="amq" />
> >          </security-setting>
> >          <security-setting match="A.some.queue">
> >             <permission type="createNonDurableQueue" roles="amq, other-role" />
> >             <permission type="deleteNonDurableQueue" roles="amq, other-role" />
> >             <permission type="createDurableQueue" roles="amq, other-role" />
> >             <permission type="deleteDurableQueue" roles="amq, other-role" />
> >             <permission type="createAddress" roles="amq, other-role" />
> >             <permission type="deleteAddress" roles="amq, other-role" />
> >             <permission type="consume" roles="amq, other-role" />
> >             <permission type="browse" roles="amq, other-role" />
> >             <permission type="send" roles="amq, other-role" />
> >          </security-setting>
> >          <security-setting match="A.some.other.queue">
> >             <permission type="createNonDurableQueue" roles="amq, other-role" />
> >             <permission type="deleteNonDurableQueue" roles="amq, other-role" />
> >             <permission type="createDurableQueue" roles="amq, other-role" />
> >             <permission type="deleteDurableQueue" roles="amq, other-role" />
> >             <permission type="createAddress" roles="amq, other-role" />
> >             <permission type="deleteAddress" roles="amq, other-role" />
> >             <permission type="consume" roles="amq, other-role" />
> >             <permission type="browse" roles="amq, other-role" />
> >             <permission type="send" roles="amq, other-role" />
> >          </security-setting>
> >          ...
> >          ... etc.
> >          ...
> >       </security-settings>
> >
> >       <address-settings>
> >          <address-setting match="activemq.management#">
> >             <dead-letter-address>DLQ</dead-letter-address>
> >             <expiry-address>ExpiryQueue</expiry-address>
> >             <redelivery-delay>0</redelivery-delay>
> >             <max-size-bytes>-1</max-size-bytes>
> >             <message-counter-history-day-limit>10</message-counter-history-day-limit>
> >             <address-full-policy>PAGE</address-full-policy>
> >          </address-setting>
> >          <!-- default for catch all -->
> >          <address-setting match="#">
> >             <dead-letter-address>DLQ</dead-letter-address>
> >             <expiry-address>ExpiryQueue</expiry-address>
> >             <redelivery-delay>0</redelivery-delay>
> >             <max-size-bytes>-1</max-size-bytes>
> >             <message-counter-history-day-limit>10</message-counter-history-day-limit>
> >             <address-full-policy>PAGE</address-full-policy>
> >             <redistribution-delay>1000</redistribution-delay>
> >          </address-setting>
> >          <address-setting match="DLQ">
> >             <!-- 100 * 1024 * 1024 -> 100MB -->
> >             <max-size-bytes>104857600</max-size-bytes>
> >             <!-- 1000 * 60 * 60 -> 1h -->
> >             <expiry-delay>3600000</expiry-delay>
> >             <expiry-address />
> >          </address-setting>
> >          <address-setting match="A.some.queue">
> >             <redelivery-delay-multiplier>1.0</redelivery-delay-multiplier>
> >             <redelivery-delay>0</redelivery-delay>
> >             <max-redelivery-delay>10</max-redelivery-delay>
> >          </address-setting>
> >          <address-setting match="A.some.other.queue">
> >             <redelivery-delay-multiplier>1.0</redelivery-delay-multiplier>
> >             <redelivery-delay>0</redelivery-delay>
> >             <max-redelivery-delay>10</max-redelivery-delay>
> >             <max-delivery-attempts>1</max-delivery-attempts>
> >             <max-size-bytes>104857600</max-size-bytes>
> >          </address-setting>
> >          ...
> >          ... etc.
> >          ...
> >       </address-settings>
> >
> >       <addresses>
> >          <address name="DLQ">
> >             <anycast>
> >                <queue name="DLQ" />
> >             </anycast>
> >          </address>
> >          <address name="ExpiryQueue">
> >             <anycast>
> >                <queue name="ExpiryQueue" />
> >             </anycast>
> >          </address>
> >          <address name="A.some.queue">
> >             <anycast>
> >                <queue name="A.some.queue">
> >                   <durable>true</durable>
> >                </queue>
> >             </anycast>
> >          </address>
> >          <address name="A.some.other.queue">
> >             <anycast>
> >                <queue name="A.some.other.queue">
> >                   <durable>true</durable>
> >                </queue>
> >             </anycast>
> >          </address>
> >          ...
> >          ... etc.
> >          ...
> >       </addresses>
> >    </core>
> > </configuration>
> >
>
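For anyone else hitting the two configuration issues discussed in this thread, the fixes look roughly like this in broker.xml. This is a sketch: the prefix values follow ARTEMIS-1644 and the Artemis address-model documentation, and should be checked against the docs for your version rather than taken as a verified configuration.

```xml
<!-- Sketch only: illustrative values, not a verified configuration. -->
<acceptors>
   <!-- anycastPrefix/multicastPrefix let clients that still address
        jms.queue.* / jms.topic.* resolve correctly (see ARTEMIS-1644) -->
   <acceptor name="artemis">tcp://0.0.0.0:61616?anycastPrefix=jms.queue.;multicastPrefix=jms.topic.</acceptor>
</acceptors>

<cluster-connections>
   <cluster-connection name="cluster-name">
      <!-- Leave the address empty to match all addresses;
           '*' is not a supported wildcard here -->
      <address></address>
      <connector-ref>netty-connector</connector-ref>
   </cluster-connection>
</cluster-connections>
```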