Hello all, we have got somewhat closer to the root cause. The primary broker started showing increased response times for sending messages again and was "fixed" by a simple reload of the configuration via "touch broker.xml".
The replicating backup broker log contains lines like:

    2024-04-25 01:33:41,401 WARN  [org.apache.activemq.artemis.core.replication.ReplicationEndpoint] AMQ149005: Message of 501793 bytes is bigger than the max record size of 501760 bytes. You should try to move large application properties to the message body.
    org.apache.activemq.artemis.api.core.ActiveMQIOErrorException: AMQ149005: Message of 501793 bytes is bigger than the max record size of 501760 bytes. You should try to move large application properties to the message body.

The live primary broker log does not contain such lines about "large application properties". The application property sizes that caused the persistence trouble were 501793 and 501769 bytes, i.e. 33 and 9 bytes above the limit.

I believe that if you send a message to the live broker with application properties larger than 490K, it is rejected by the live broker and the error is logged on the live broker as well. But if you manage to send a message whose application properties are only a few bytes below 490K:

1. it passes the live broker and gets persisted,
2. replication adds some administrative information to it,
3. the application properties are now 490K plus something, and the error occurs on the replicating backup broker,
4. the client does not return from the send call until the timeout is reached, or never if it is poorly configured.

I tested this with one message having 526101 bytes of application properties (24341 bytes above the limit) and a second message having 501772 bytes of application properties (12 bytes above the limit, like our backup log entries); a sketch of the test client is included below. The first message was rejected immediately, and no entry showed up in the backup broker log; the live broker log contained AMQ149005 entries. The second message took 30 seconds until the timeout, and we got similar AMQ149005 log entries in the backup broker:

    2024-04-25 14:21:14,843 WARN  [org.apache.activemq.artemis.core.replication.ReplicationEndpoint] AMQ149005: Message of 501772 bytes is bigger than the max record size of 501760 bytes. You should try to move large application properties to the message body.
    org.apache.activemq.artemis.api.core.ActiveMQIOErrorException: AMQ149005: Message of 501772 bytes is bigger than the max record size of 501760 bytes. You should try to move large application properties to the message body.

The same entry was also in the live broker log, but preceded by an AMQ224016:

    2024-04-25 14:21:14,745 ERROR [org.apache.activemq.artemis.core.server] AMQ224016: Caught exception
    org.apache.activemq.artemis.api.core.ActiveMQIOErrorException: AMQ149005: Message of 501772 bytes is bigger than the max record size of 501760 bytes. You should try to move large application properties to the message body.

The main difference to the production issue is the absence of AMQ149005 in the live broker log, which I cannot explain. According to https://issues.apache.org/jira/browse/DISPATCH-2254, such messages with too-large properties may have a huge impact on brokers.

We need to identify the sender of these messages, but the only information we have is the time point and the size of the application properties. Is there any option to get more log output from the live or backup broker, for example additional message properties such as the queue name?
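For reference, a minimal sketch of the kind of test client I used is below. This is not our production code; the broker URL, queue name and property size are illustrative, and the byte size the broker reports for the application properties will differ from the raw character count because of encoding and property-name overhead, so the size needs tuning to land just above or below the 501760-byte limit.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

    // Sends one message whose string property is sized near the journal record limit.
    public class LargePropertyTest {
        public static void main(String[] args) throws Exception {
            ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://primary-broker:61616");
            try (Connection connection = cf.createConnection()) {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer =
                        session.createProducer(session.createQueue("test.large.properties"));
                TextMessage message = session.createTextMessage("payload");
                // Size chosen near the limit; the encoded size reported by the broker
                // will differ from this character count.
                char[] filler = new char[501_772];
                java.util.Arrays.fill(filler, 'x');
                message.setStringProperty("bigProperty", new String(filler));
                // Blocks until the broker confirms the send (or the call times out).
                producer.send(message);
            }
        }
    }

Sending with property sizes on either side of the limit reproduced the two behaviours described above: immediate rejection on the live broker versus a blocking send that only fails on the backup.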
Kind regards,
Arno

From: Arno Schuerhoff <arno.schuerh...@havi.com>
Sent: Tuesday, 23 April 2024 13:54
To: users@activemq.apache.org
Subject: Suspected slow replication

Hello all, we are using Artemis MQ in a fail-safe configuration with an active primary broker and a replicating backup. This has worked for years, and many thanks for this great application.

Some weeks ago we replaced the old broker pair with a fresh installation using Artemis MQ version 2.31.2. Starting from that point, we experienced slow responses from the primary broker when sending a message: clients needed up to 120 seconds to deliver a message, while the usual value is about 200 milliseconds. Last week we had some pressure from the business regarding low data throughput, so I decided to add some more logging to the primary broker and changed two lines in broker.xml:

    <property key="LOG_CONNECTION_EVENTS" value="false"/>  became  <property key="LOG_CONNECTION_EVENTS" value="true"/>
    <property key="LOG_SENDING_EVENTS" value="false"/>     became  <property key="LOG_SENDING_EVENTS" value="true"/>

The plugin org.apache.activemq.artemis.core.server.plugin.impl.LoggingActiveMQServerPlugin was already activated to get log messages for expired messages, via <property key="LOG_INTERNAL_EVENTS" value="true"/>.

After saving broker.xml, the broker reloaded the configuration and the problem of slow responses when sending a message disappeared. I have evaluated client logs from before and after that configuration change; we are back to 200 milliseconds and below, simply through a configuration reload that enables logging for two more events. Excerpt from the primary broker log:

    2024-04-18 20:15:32,069 INFO  [org.apache.activemq.artemis.core.server] AMQ221056: Reloading configuration: security
    2024-04-18 20:15:32,074 INFO  [org.apache.activemq.artemis.core.server] AMQ221056: Reloading configuration: address settings
    2024-04-18 20:15:32,077 INFO  [org.apache.activemq.artemis.core.server] AMQ221056: Reloading configuration: diverts
    2024-04-18 20:15:32,086 WARN  [org.apache.activemq.artemis.core.server] AMQ222092: Connection to the backup node failed, removing replication now
    org.apache.activemq.artemis.api.core.ActiveMQRemoteDisconnectException: null
        at org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl.connectionDestroyed(RemotingServiceImpl.java:598) ~[artemis-server-2.31.2.jar:2.31.2]
        at org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor$Listener.connectionDestroyed(NettyAcceptor.java:974) ~[artemis-server-2.31.2.jar:2.31.2]
        at org.apache.activemq.artemis.core.remoting.impl.netty.ActiveMQChannelHandler.lambda$channelInactive$0(ActiveMQChannelHandler.java:89) ~[artemis-core-client-2.31.2.jar:2.31.2]
        at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:57) ~[artemis-commons-2.31.2.jar:2.31.2]
        at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:32) ~[artemis-commons-2.31.2.jar:2.31.2]
        at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:68) ~[artemis-commons-2.31.2.jar:2.31.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.31.2.jar:2.31.2]
    2024-04-18 20:15:32,103 INFO  [org.apache.activemq.artemis.core.server.impl.ReplicationPrimaryActivation] Server [ActiveMQServerImpl::name=amq_prod_primary], incremented local activation sequence to: 7 for NodeId = peer.amq.prod+
    2024-04-18 20:15:32,109 INFO  [org.apache.activemq.artemis.core.server.impl.ReplicationPrimaryActivation] Server [ActiveMQServerImpl::name=amq_prod_primary], incremented coordinated activation sequence to: 7 for NodeId = peer.amq.prod+
    2024-04-18 20:15:38,072 INFO  [org.apache.activemq.artemis.core.server] AMQ221056: Reloading configuration: addresses

After reading the AMQ222092 entry about the failed connection to the backup node, I suspect we have had a problem with the replication connection all along. When a message is sent to a queue, the primary broker does not confirm receipt to the client until the message has been replicated to the backup, so slow replication would mean slow response times. How can I check whether we really had a replication problem?

The backup broker log for the same day starts with entries that are probably related to the reload on the primary:

    2024-04-18 20:15:32,082 ERROR [org.apache.activemq.artemis.core.client] AMQ214013: Failed to decode packet
    java.lang.IndexOutOfBoundsException: readerIndex(28) + length(276) exceeds writerIndex(300): PooledUnsafeDirectByteBuf(ridx: 28, widx: 300, cap: 300)
    …
    2024-04-18 20:15:32,082 ERROR [org.apache.activemq.artemis.core.client] AMQ214031: Failed to decode buffer, disconnect immediately.
    java.lang.IllegalStateException: java.lang.IndexOutOfBoundsException: readerIndex(28) + length(276) exceeds writerIndex(300): PooledUnsafeDirectByteBuf(ridx: 28, widx: 300, cap: 300)
    …

Any help enlightening me as to why the problem of slow responses disappeared after a configuration reload, or links to documentation I may have missed, is appreciated.

Kind regards,
Arno

Arno Schürhoff (he/him/his)
Solution Architect ESB, Enterprise Data Services
HAVI Logistics Business Services GmbH
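A side note on the blocking-send behaviour mentioned in both messages (clients waiting up to 120 seconds, or indefinitely if poorly configured): the wait can be bounded on the client side. The sketch below assumes the Artemis JMS/core client, where callTimeout and blockOnDurableSend are, to my understanding, standard connection-URI options; the broker address and the 30-second value are illustrative.

    import javax.jms.ConnectionFactory;
    import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

    // Connection factory whose blocking durable sends fail after a bounded wait
    // instead of hanging indefinitely when the broker or its replication stalls.
    public class BoundedSendConnectionFactory {
        public static ConnectionFactory create() {
            // blockOnDurableSend=true keeps the synchronous confirmation from the broker,
            // callTimeout=30000 caps the wait for that confirmation at 30 seconds.
            return new ActiveMQConnectionFactory(
                    "tcp://primary-broker:61616?blockOnDurableSend=true&callTimeout=30000");
        }
    }

With a bounded call timeout, a stalled replication shows up as a send failure on the client rather than an indefinite hang, which would also have made the original slow-response symptom more visible.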