Hi All,
We have an ActiveMQ Artemis instance with primary/backup replication. Sometimes
both servers fail to replicate after its restart. I have examined the logs of
both servers, and mentioned a strange warnings in the log of first server right
after it begins to stop:
1. First server is stopping, and at this time the second server is still alive
and becomes primary. But there is a warning that primary node has failed to
connect to the backup node. Is it ok?
2024-07-24 03:03:31,918 WARN [org.apache.activemq.artemis.core.server]
AMQ222092: Connection to the backup node failed, removing replication now
org.apache.activemq.artemis.api.core.ActiveMQRemoteDisconnectException: null
at
org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl.connectionDestroyed(RemotingServiceImpl.java:607)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor$Listener.connectionDestroyed(NettyAcceptor.java:974)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.remoting.impl.netty.ActiveMQChannelHandler.lambda$channelInactive$0(ActiveMQChannelHandler.java:89)
~[artemis-core-client-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:57)
~[artemis-commons-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:32)
~[artemis-commons-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:68)
~[artemis-commons-2.35.0.jar:2.35.0]
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[?:?]
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[?:?]
at
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
[artemis-commons-2.35.0.jar:2.35.0]
2. NullPointerException in MessageReferenceImpl.java:50 (this warning repeats
two times).
2024-07-24 03:03:31,919 WARN [org.apache.activemq.artemis.core.server]
AMQ222246: Unable to rollback on close
java.lang.NullPointerException: Cannot invoke
"org.apache.activemq.artemis.core.server.MessageReference.getSequence()"
because "o1" is null
at
org.apache.activemq.artemis.core.server.impl.MessageReferenceImpl$MessageReferenceComparatorSequence.compare(MessageReferenceImpl.java:50)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.MessageReferenceImpl$MessageReferenceComparatorSequence.compare(MessageReferenceImpl.java:42)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.utils.collections.LinkedListImpl.scanLeft(LinkedListImpl.java:286)
~[artemis-commons-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.utils.collections.LinkedListImpl.addSorted(LinkedListImpl.java:248)
~[artemis-commons-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.utils.collections.PriorityLinkedListImpl.addSorted(PriorityLinkedListImpl.java:93)
~[artemis-commons-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.QueueImpl.internalAddSorted(QueueImpl.java:3069)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.QueueImpl.addSorted(QueueImpl.java:1180)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.QueueImpl.addSorted(QueueImpl.java:1215)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.QueueImpl.postRollback(QueueImpl.java:4247)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.RefsOperation.afterRollback(RefsOperation.java:131)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.afterRollback(TransactionImpl.java:600)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.transaction.impl.TransactionImpl$4.done(TransactionImpl.java:436)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl.executeOnCompletion(OperationContextImpl.java:226)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl.executeOnCompletion(OperationContextImpl.java:168)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.afterCompleteOperations(AbstractJournalStorageManager.java:352)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.internalRollback(TransactionImpl.java:427)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.rollback(TransactionImpl.java:400)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.cancelAndRollback(ServerSessionImpl.java:2287)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.doRollback(ServerSessionImpl.java:2262)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.rollback(ServerSessionImpl.java:1375)
~[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.doClose(ServerSessionImpl.java:431)
[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.ServerSessionImpl$1.callDoClose(ServerSessionImpl.java:1752)
[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.ServerSessionImpl$1.done(ServerSessionImpl.java:1747)
[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl.executeOnCompletion(OperationContextImpl.java:226)
[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl.executeOnCompletion(OperationContextImpl.java:168)
[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.close(ServerSessionImpl.java:1739)
[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.close(ServerSessionImpl.java:1723)
[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.protocol.openwire.amq.AMQSession.close(AMQSession.java:559)
[artemis-openwire-protocol-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.protocol.openwire.OpenWireConnection.disconnect(OpenWireConnection.java:720)
[artemis-openwire-protocol-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.protocol.openwire.OpenWireConnection.disconnect(OpenWireConnection.java:755)
[artemis-openwire-protocol-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl.freeze(RemotingServiceImpl.java:375)
[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.ReplicationPrimaryActivation.freezeConnections(ReplicationPrimaryActivation.java:106)
[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.freezeConnections(ActiveMQServerImpl.java:1579)
[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1352)
[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1271)
[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1255)
[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1062)
[artemis-server-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.integration.FileBroker.stop(FileBroker.java:94)
[artemis-cli-2.35.0.jar:2.35.0]
at org.apache.activemq.artemis.cli.commands.Run.stop(Run.java:232)
[artemis-cli-2.35.0.jar:2.35.0]
at org.apache.activemq.artemis.cli.commands.Run$3.run(Run.java:223)
[artemis-cli-2.35.0.jar:2.35.0]
I'm not sure that it is related to the replication issues, but if it is a bug,
I can create a Jira.
Unfortunately I don't know how to reproduce this warning in isolated
environment.
3. During replication of the first node which was restarted as backup, some
errors a written to the log:
2024-07-24 03:05:01,343 ERROR
[org.apache.activemq.artemis.core.journal.impl.JournalImpl] Exception during
appendAddRecord:
java.lang.NullPointerException: Cannot invoke
"org.apache.activemq.artemis.core.journal.impl.JournalFile.getFile()" because
"this.currentFile" is null
at
org.apache.activemq.artemis.core.journal.impl.JournalImpl.switchFileIfNecessary(JournalImpl.java:3523)
~[artemis-journal-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendRecord(JournalImpl.java:3251)
~[artemis-journal-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.journal.impl.JournalImpl$1.run(JournalImpl.java:938)
~[artemis-journal-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:57)
~[artemis-commons-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:32)
~[artemis-commons-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:68)
~[artemis-commons-2.35.0.jar:2.35.0]
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[?:?]
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[?:?]
at
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
[artemis-commons-2.35.0.jar:2.35.0]
2024-07-24 03:05:01,360 WARN [org.apache.activemq.artemis.journal] AMQ142021:
Error on IO callback, Cannot invoke
"org.apache.activemq.artemis.core.journal.impl.JournalFile.getFile()" because
"this.currentFile" is null
2024-07-24 03:05:01,361 WARN [org.apache.activemq.artemis.journal] AMQ142021:
Error on IO callback, Cannot invoke
"org.apache.activemq.artemis.core.journal.impl.JournalFile.getFile()" because
"this.currentFile" is null
2024-07-24 03:05:01,361 ERROR
[org.apache.activemq.artemis.core.journal.impl.JournalImpl] Exception during
appendAddRecord:
java.lang.NullPointerException: Cannot invoke
"org.apache.activemq.artemis.core.journal.impl.JournalFile.getFile()" because
"this.currentFile" is null
at
org.apache.activemq.artemis.core.journal.impl.JournalImpl.switchFileIfNecessary(JournalImpl.java:3523)
~[artemis-journal-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendRecord(JournalImpl.java:3251)
~[artemis-journal-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.core.journal.impl.JournalImpl$1.run(JournalImpl.java:938)
~[artemis-journal-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:57)
~[artemis-commons-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:32)
~[artemis-commons-2.35.0.jar:2.35.0]
at
org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:68)
~[artemis-commons-2.35.0.jar:2.35.0]
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[?:?]
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[?:?]
at
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
[artemis-commons-2.35.0.jar:2.35.0]
2024-07-24 03:05:15,383 INFO [org.apache.activemq.artemis.core.server]
AMQ221024: Backup server ActiveMQServerImpl::name=amq-test is synchronized with
primary server, nodeID=327b04af-48c3-11ef-a9c3-0050568560e3.
2024-07-24 03:05:15,390 INFO [org.apache.activemq.artemis.core.server]
AMQ221031: backup announced
These errors happen very often on other servers when trying to replicate (with
success or failure).
I also did not reproduced it in isolated environment.
4. There is a warning which repeats several times:
2024-07-24 03:06:25,434 WARN
[org.apache.activemq.artemis.core.server.impl.ReplicationPrimaryActivation]
Timed out waiting for failback server activation with NodeID =
327b14af-48c3-11ef-a9c3-0050568560e3: and sequence > 5: after 4000ms
And found that this timeout is hard-coded in
ReplicationPrimaryActivation.java:69.
I need to examine why this timeout was reached. Test environment can work on
low-level resources and sometimes we may need to configure this timeout.
--
Best regards,
Aleksandr
-----------------------------------
This message and any attachment are confidential and may be privileged or
otherwise protected from disclosure. If you are not the intended recipient any
use, distribution, copying or disclosure is strictly prohibited. If you have
received this message in error, please notify the sender immediately either by
telephone or by e-mail and delete this message and any attachment from your
system. Correspondence via e-mail is for information purposes only. AO
Raiffeisenbank neither makes nor accepts legally binding statements by e-mail
unless otherwise agreed.
-----------------------------------
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
For further information, visit: https://activemq.apache.org/contact