An update on this: I have replicated the memory and expiration issues on the current 2.5.0-SNAPSHOT with the included client libraries and a one-node broker by modifying an existing Artemis example. As messages are routed to the DLQ, paged and expired, memory consumption keeps increasing and eventually leads to heap space exhaustion, rendering the broker unable to route messages. What should happen is that memory consumption stays reasonable even without expiration, thanks to paging to disk, and doubly so with expiration: once the messages have expired, they shouldn't consume any resources.
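To make the accounting concrete, here is a toy model of the failure mode I suspect. All names here are illustrative; this is not Artemis code, just a sketch of what "expired but never acknowledged" would do to the heap and to the delivering counter:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: illustrative names, NOT Artemis internals.
public class ExpiryLeakModel {

    static class ToyQueue {
        final List<Object> inDelivery = new ArrayList<>(); // refs held on the heap
        int deliveringCount;
        int expiredCount;

        void deliver(Object ref) {
            inDelivery.add(ref);
            deliveringCount++;
        }

        // Expiry when the expiry address has no bindings: if the reference
        // is not acknowledged, it stays in inDelivery and deliveringCount
        // is never decremented.
        void expire(Object ref, boolean acknowledge) {
            if (acknowledge) {
                inDelivery.remove(ref);
                deliveringCount--;
                expiredCount++;
            }
            // else: only a warning would be logged; the reference leaks
        }
    }

    public static void main(String[] args) {
        ToyQueue q = new ToyQueue();
        for (int i = 0; i < 1000; i++) {
            Object ref = new Object();
            q.deliver(ref);
            q.expire(ref, false); // mirrors the observed behavior
        }
        // DeliveringCount grows without bound, ExpiredMessages stays 0,
        // and all references remain strongly reachable.
        System.out.println(q.deliveringCount + " " + q.expiredCount); // prints: 1000 0
    }
}
```

If the suspicion is right, each expired-and-dropped message takes the `acknowledge == false` branch, which would explain both the wrong JMX statistics and the heap growth at once.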
I'm not certain whether the two issues (erroneous statistics on expiration and the memory leak) are connected, but they both appear at the same time, which raises suspicion. A possible cause could be that filtered message expiration behaves differently from other means of expiration: it uses a private expiration method that takes a transaction as a parameter. Unlike the non-transacted expiration method, it checks for empty bindings separately, but it doesn't seem to decrement the counters appropriately in this case. Even though I have set a null expiry-address (<expiry-address />), it is seen as non-null during expiration. Then, as the expiry address is non-null but no bindings are found, the warning about dropping the message is logged. However, it seems that the message is never acknowledged and the deliveringCount is never decreased, so the delivery metrics end up being wrong. Shouldn't the message reference be acknowledged right after the logging when the following condition is matched?

https://github.com/apache/activemq-artemis/blob/master/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/QueueImpl.java#L2735

Also, why is the acknowledgment reason here not expiry but normal? One would imagine it should be acknowledge(tx, ref, AckReason.EXPIRED) instead of the default overload, so that the appropriate counters end up being incremented:

https://github.com/apache/activemq-artemis/blob/master/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/QueueImpl.java#L2747

Best regards,
- Ilkka

-----Original Message-----
From: Ilkka Virolainen [mailto:ilkka.virolai...@bitwise.fi]
Sent: 27 February 2018 15:20
To: users@activemq.apache.org
Subject: RE: Artemis 2.4.0 - Issues with memory leaks and JMS message redistribution

Hello,

- I don't have consumers on the DLQ, and none are listed in its JMX attributes.
- The messages are being sent to the DLQ by the broker after a delivery failure on another queue.
  The delivery failure is expected and caused by a transactional rollback on the consumer.
- I am setting the expiry delay in the broker's DLQ address-settings (not in message attributes). I'm setting an empty expiry-address in the same place.
- I have a set of broker settings and a small Spring Boot application with which I was able to replicate the issue. Would you like me to provide it to you somehow?

It seems like there's some kind of hiccup in message expiration. When the messages routed to the DLQ start expiring, the broker logs:

AMQ222146: Message has expired. No bindings for Expiry Address so dropping it

but when reviewing the DLQ statistics via JMX, the ExpiredMessages counter is not incremented while the DeliveringCount is. As messages keep expiring, the deliveringCount keeps increasing. This feels a lot like the issue I've been having. Could it be that this process leaks memory/resources, or is it just that the expiration statistics always assume that expiration results in redelivery, thereby causing erroneous numbers to be reported?

Best regards,
- Ilkka

-----Original Message-----
From: Justin Bertram [mailto:jbert...@apache.org]
Sent: 23 February 2018 16:51
To: users@activemq.apache.org
Subject: Re: Artemis 2.4.0 - Issues with memory leaks and JMS message redistribution

A couple of questions:

- Do you have any consumers on the DLQ?
- Are messages being sent to the DLQ by the broker automatically (e.g. based on delivery attempt failures) or is that being done by your application?
- How are you setting the expiry delay?
- Do you have a reproducible test-case?


Justin

On Fri, Feb 23, 2018 at 4:38 AM, Ilkka Virolainen <ilkka.virolai...@bitwise.fi> wrote:

> I'm still facing an issue with somewhat confusing behavior regarding
> message expiration in the DLQ, maybe related to the memory issues I've
> been having. My aim is to have messages routed to the DLQ expire and
> be dropped in one hour.
> To achieve this, I've set an empty expiry-address and the appropriate
> expiry-delay. The problem is that most of the messages routed to the
> DLQ end up in an in-delivery state - they are not expiring and I
> cannot remove them via JMX. The messageCount in the DLQ is slightly
> higher than the deliveringCount, and attempting to remove all messages
> only removes a number of messages equal to the difference between
> deliveringCount and messageCount - approximately a few thousand
> messages, while the messageCount is tens of thousands and increasing
> as message delivery failures occur.
>
> What could be the reason for this behavior and how could it be avoided?
>
> -----Original Message-----
> From: Ilkka Virolainen [mailto:ilkka.virolai...@bitwise.fi]
> Sent: 22 February 2018 15:20
> To: users@activemq.apache.org
> Subject: RE: Artemis 2.4.0 - Issues with memory leaks and JMS message
> redistribution
>
> To answer my own question in case anyone else is wondering about a
> similar issue: it turns out the change in addressing is referred to in
> ticket [1], and adding the multicastPrefix and anycastPrefix described
> in the ticket to my broker acceptors seems to have fixed my problem.
> If the issue regarding memory leaks persists I will try to provide a
> reproducible test case.
>
> Thank you for your help, Justin.
>
> Best regards,
> - Ilkka
>
> [1] https://issues.apache.org/jira/browse/ARTEMIS-1644
>
>
> -----Original Message-----
> From: Ilkka Virolainen [mailto:ilkka.virolai...@bitwise.fi]
> Sent: 22 February 2018 12:33
> To: users@activemq.apache.org
> Subject: RE: Artemis 2.4.0 - Issues with memory leaks and JMS message
> redistribution
>
> Having removed the address configuration and having switched from
> 2.4.0 to yesterday's snapshot of 2.5.0, it seems like the
> redistribution of messages is now working, but there also seems to
> have been a change in addressing between the versions, causing another
> problem related to jms.queue / jms.topic prefixing.
> While the NMS clients listen and the Artemis JMS clients send to the
> same topics as described in the previous message, Artemis 2.5.0
> prefixes the addresses with jms.topic. While the messages are being
> sent to e.g. A.B.f64dd592-a8fb-442e-826d-927834d566f4.C.D, they are
> only received if I explicitly prefix the listening address with
> jms.topic, for example topic://jms.topic.A.B.*.C.D. Can this somehow
> be avoided in the broker configuration?
>
> Best regards
>
> -----Original Message-----
> From: Justin Bertram [mailto:jbert...@apache.org]
> Sent: 21 February 2018 15:19
> To: users@activemq.apache.org
> Subject: Re: Artemis 2.4.0 - Issues with memory leaks and JMS message
> redistribution
>
> Your first issue is probably a misconfiguration. Your
> cluster-connection is using an "address" value of '*', which I assume
> is supposed to mean "all addresses," but the "address" element doesn't
> support wildcards like this. Just leave it empty to match all
> addresses. See the documentation [1] for more details.
>
> Even after you fix that configuration issue you may run into issues.
> These may be fixed already via ARTEMIS-1523 and/or ARTEMIS-1680. If
> you have a reproducible test-case then you can verify using the head
> of the master branch.
>
> For the memory issue it would be helpful to have some heap dumps or
> something to see what's actually consuming the memory. Better yet
> would be a reproducible test-case. Do you have either?
>
>
> Justin
>
> [1] https://activemq.apache.org/artemis/docs/latest/clusters.html
>
>
>
> On Wed, Feb 21, 2018 at 5:39 AM, Ilkka Virolainen <
> ilkka.virolai...@bitwise.fi> wrote:
>
> > Hello,
> >
> > I am using Artemis 2.4.0 to broker messages through JMS
> > queues/topics between a set of clients. Some are Apache NMS 1.7.2
> > ActiveMQ clients and others are using the Artemis JMS client 1.5.4
> > included in Spring Boot 1.5.3.
> > Broker topology is a symmetric cluster of two live nodes with static
> > connectors, both nodes having been set up as replicating colocated
> > backup pairs with scale-down. I have two quite frustrating issues at
> > the moment: message redistribution not working correctly, and a
> > memory leak causing eventual thread death.
> >
> > ISSUE #1 - Message redistribution / load balancing not working:
> >
> > Client 1 (NMS) connects to broker a and starts listening; Artemis
> > creates the following address:
> >
> > (Broker a):
> > A.B.*.C.D
> > |-queues
> >   |-multicast
> >     |-f64dd592-a8fb-442e-826d-927834d566f4
> >
> > Server 1 (artemis-jms-client) connects to broker b and sends a
> > message to topic A.B.f64dd592-a8fb-442e-826d-927834d566f4.C.D - this
> > should be routed to broker a since the corresponding queue has no
> > consumers on broker b (the queue does not exist there). This however
> > does not happen and the client receives no messages. Broker b has
> > some other clients connected, causing similar (but not identical)
> > queues to have been created:
> >
> > (Broker b):
> > A.B.*.C.D
> > |-queues
> >   |-multicast
> >     |-1eb48079-7fd8-40e9-b822-bcc25695ced0
> >     |-9f295257-c352-4ae6-b74b-d5994f330485
> >
> >
> > ISSUE #2 - Memory leak and eventual thread death
> >
> > The Artemis broker has 4 GB of allocated heap space and
> > global-max-size is set to half of that (the default setting).
> > Address-full-policy is set to PAGE for all addresses, and some
> > individual addresses have small max-size-bytes values set, e.g.
> > 104857600. As far as I know the paging settings should limit memory
> > usage, but what happens is that at times Artemis uses the whole heap
> > space, encounters an out-of-memory error and dies:
> >
> > 05:39:29,510 WARN [org.eclipse.jetty.util.thread.QueuedThreadPool] :
> > java.lang.OutOfMemoryError: Java heap space
> > 05:39:16,646 WARN [io.netty.channel.ChannelInitializer] Failed to
> > initialize a channel.
> > Closing: [id: ...]: java.lang.OutOfMemoryError: Java heap space
> > 05:41:05,597 WARN [org.eclipse.jetty.util.thread.QueuedThreadPool]
> > Unexpected thread death:
> > org.eclipse.jetty.util.thread.QueuedThreadPool$2@5ffaba31 in
> > qtp20111564{STARTED,8<=8<=200,i=2,q=0}
> >
> > Are these known issues in Artemis or misconfigurations in the brokers?
> >
> > The broker configurations are as follows. Broker b has an identical
> > configuration except that the cluster connector's connector-ref and
> > the static-connector connector-ref refer to broker b and broker a
> > respectively.
> >
> > Best regards,
> >
> > broker.xml (broker a):
> >
> > <?xml version='1.0'?>
> > <configuration xmlns="urn:activemq"
> >                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >                xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
> >    <core xmlns="urn:activemq:core"
> >          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >          xsi:schemaLocation="urn:activemq:core ">
> >       <name>[broker-a-ip]</name>
> >       <persistence-enabled>true</persistence-enabled>
> >
> >       <journal-type>NIO</journal-type>
> >
> >       <paging-directory>...</paging-directory>
> >       <bindings-directory>...</bindings-directory>
> >       <journal-directory>...</journal-directory>
> >       <large-messages-directory>...</large-messages-directory>
> >
> >       <journal-datasync>true</journal-datasync>
> >       <journal-min-files>2</journal-min-files>
> >       <journal-pool-files>-1</journal-pool-files>
> >       <journal-buffer-timeout>788000</journal-buffer-timeout>
> >       <disk-scan-period>5000</disk-scan-period>
> >
> >       <max-disk-usage>97</max-disk-usage>
> >
> >       <critical-analyzer>true</critical-analyzer>
> >       <critical-analyzer-timeout>120000</critical-analyzer-timeout>
> >       <critical-analyzer-check-period>60000</critical-analyzer-check-period>
> >       <critical-analyzer-policy>HALT</critical-analyzer-policy>
> >
> >       <acceptors>
> >          <acceptor name="invm-acceptor">vm://0</acceptor>
> >          <acceptor name="artemis">tcp://0.0.0.0:61616</acceptor>
> >          <acceptor name="ssl">tcp://0.0.0.0:61617?sslEnabled=true;keyStorePath=...;keyStorePassword=...</acceptor>
> >       </acceptors>
> >       <connectors>
> >          <connector name="invm-connector">vm://0</connector>
> >          <connector name="netty-connector">tcp://[broker-a-ip]:61616</connector>
> >          <connector name="broker-b-connector">[broker-b-ip]:61616</connector>
> >       </connectors>
> >
> >       <cluster-connections>
> >          <cluster-connection name="cluster-name">
> >             <address>*</address>
> >             <connector-ref>netty-connector</connector-ref>
> >             <retry-interval>500</retry-interval>
> >             <reconnect-attempts>5</reconnect-attempts>
> >             <use-duplicate-detection>true</use-duplicate-detection>
> >             <message-load-balancing>ON_DEMAND</message-load-balancing>
> >             <max-hops>1</max-hops>
> >             <static-connectors>
> >                <connector-ref>broker-b-connector</connector-ref>
> >             </static-connectors>
> >          </cluster-connection>
> >       </cluster-connections>
> >
> >       <ha-policy>
> >          <replication>
> >             <colocated>
> >                <backup-request-retry-interval>5000</backup-request-retry-interval>
> >                <max-backups>3</max-backups>
> >                <request-backup>true</request-backup>
> >                <backup-port-offset>100</backup-port-offset>
> >                <excludes>
> >                   <connector-ref>invm-connector</connector-ref>
> >                   <connector-ref>netty-connector</connector-ref>
> >                </excludes>
> >                <master>
> >                   <check-for-live-server>true</check-for-live-server>
> >                </master>
> >                <slave>
> >                   <restart-backup>false</restart-backup>
> >                   <scale-down />
> >                </slave>
> >             </colocated>
> >          </replication>
> >       </ha-policy>
> >
> >       <cluster-user>ARTEMIS.CLUSTER.ADMIN.USER</cluster-user>
> >       <cluster-password>[the shared cluster password]</cluster-password>
> >
> >       <security-settings>
> >          <security-setting match="#">
> >             <permission type="createDurableQueue" roles="amq, other-role" />
> >             <permission type="deleteDurableQueue" roles="amq, other-role" />
> >             <permission type="createNonDurableQueue" roles="amq, other-role" />
> >             <permission type="createAddress" roles="amq, other-role" />
> >             <permission type="deleteNonDurableQueue" roles="amq, other-role" />
> >             <permission type="deleteAddress" roles="amq, other-role" />
> >             <permission type="consume" roles="amq, other-role" />
> >             <permission type="browse" roles="amq, other-role" />
> >             <permission type="send" roles="amq, other-role" />
> >             <permission type="manage" roles="amq" />
> >          </security-setting>
> >          <security-setting match="A.some.queue">
> >             <permission type="createNonDurableQueue" roles="amq, other-role" />
> >             <permission type="deleteNonDurableQueue" roles="amq, other-role" />
> >             <permission type="createDurableQueue" roles="amq, other-role" />
> >             <permission type="deleteDurableQueue" roles="amq, other-role" />
> >             <permission type="createAddress" roles="amq, other-role" />
> >             <permission type="deleteAddress" roles="amq, other-role" />
> >             <permission type="consume" roles="amq, other-role" />
> >             <permission type="browse" roles="amq, other-role" />
> >             <permission type="send" roles="amq, other-role" />
> >          </security-setting>
> >          <security-setting match="A.some.other.queue">
> >             <permission type="createNonDurableQueue" roles="amq, other-role" />
> >             <permission type="deleteNonDurableQueue" roles="amq, other-role" />
> >             <permission type="createDurableQueue" roles="amq, other-role" />
> >             <permission type="deleteDurableQueue" roles="amq, other-role" />
> >             <permission type="createAddress" roles="amq, other-role" />
> >             <permission type="deleteAddress" roles="amq, other-role" />
> >             <permission type="consume" roles="amq, other-role" />
> >             <permission type="browse" roles="amq, other-role" />
> >             <permission type="send" roles="amq, other-role" />
> >          </security-setting>
> >          ...
> >          ... etc.
> >          ...
> >       </security-settings>
> >
> >       <address-settings>
> >          <address-setting match="activemq.management#">
> >             <dead-letter-address>DLQ</dead-letter-address>
> >             <expiry-address>ExpiryQueue</expiry-address>
> >             <redelivery-delay>0</redelivery-delay>
> >             <max-size-bytes>-1</max-size-bytes>
> >             <message-counter-history-day-limit>10</message-counter-history-day-limit>
> >             <address-full-policy>PAGE</address-full-policy>
> >          </address-setting>
> >          <!-- default for catch all -->
> >          <address-setting match="#">
> >             <dead-letter-address>DLQ</dead-letter-address>
> >             <expiry-address>ExpiryQueue</expiry-address>
> >             <redelivery-delay>0</redelivery-delay>
> >             <max-size-bytes>-1</max-size-bytes>
> >             <message-counter-history-day-limit>10</message-counter-history-day-limit>
> >             <address-full-policy>PAGE</address-full-policy>
> >             <redistribution-delay>1000</redistribution-delay>
> >          </address-setting>
> >          <address-setting match="DLQ">
> >             <!-- 100 * 1024 * 1024 -> 100MB -->
> >             <max-size-bytes>104857600</max-size-bytes>
> >             <!-- 1000 * 60 * 60 -> 1h -->
> >             <expiry-delay>3600000</expiry-delay>
> >             <expiry-address />
> >          </address-setting>
> >          <address-setting match="A.some.queue">
> >             <redelivery-delay-multiplier>1.0</redelivery-delay-multiplier>
> >             <redelivery-delay>0</redelivery-delay>
> >             <max-redelivery-delay>10</max-redelivery-delay>
> >          </address-setting>
> >          <address-setting match="A.some.other.queue">
> >             <redelivery-delay-multiplier>1.0</redelivery-delay-multiplier>
> >             <redelivery-delay>0</redelivery-delay>
> >             <max-redelivery-delay>10</max-redelivery-delay>
> >             <max-delivery-attempts>1</max-delivery-attempts>
> >             <max-size-bytes>104857600</max-size-bytes>
> >          </address-setting>
> >          ...
> >          ... etc.
> >          ...
> >       </address-settings>
> >
> >       <addresses>
> >          <address name="DLQ">
> >             <anycast>
> >                <queue name="DLQ" />
> >             </anycast>
> >          </address>
> >          <address name="ExpiryQueue">
> >             <anycast>
> >                <queue name="ExpiryQueue" />
> >             </anycast>
> >          </address>
> >          <address name="A.some.queue">
> >             <anycast>
> >                <queue name="A.some.queue">
> >                   <durable>true</durable>
> >                </queue>
> >             </anycast>
> >          </address>
> >          <address name="A.some.other.queue">
> >             <anycast>
> >                <queue name="A.some.other.queue">
> >                   <durable>true</durable>
> >                </queue>
> >             </anycast>
> >          </address>
> >          ...
> >          ... etc.
> >          ...
> >       </addresses>
> >    </core>
> > </configuration>
> >
>
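For anyone else hitting the two configuration issues discussed in this thread, the fixes look roughly like this in broker.xml. This is a sketch: the prefix values follow ARTEMIS-1644 and the Artemis address-model documentation, and should be checked against the docs for your version rather than taken as a verified configuration.

```xml
<!-- Sketch only: illustrative values, not a verified configuration. -->
<acceptors>
   <!-- anycastPrefix/multicastPrefix let clients that still address
        jms.queue.* / jms.topic.* resolve correctly (see ARTEMIS-1644) -->
   <acceptor name="artemis">tcp://0.0.0.0:61616?anycastPrefix=jms.queue.;multicastPrefix=jms.topic.</acceptor>
</acceptors>

<cluster-connections>
   <cluster-connection name="cluster-name">
      <!-- Leave the address empty to match all addresses;
           '*' is not a supported wildcard here -->
      <address></address>
      <connector-ref>netty-connector</connector-ref>
   </cluster-connection>
</cluster-connections>
```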