Hello, I have a replicating master-slave -pair of Artemis 2.5.0. It seems to mainly work as it should, but it started logging replication failures after having run for a week. I have included my broker configuration in [1]. There are also exceptions on the client side after the replication problems surfaced [2]. Why would replication suddenly start failing without an apparent reason? So far I have encountered issues with every HA strategy I have tried.
I started both nodes at 2018-05-08 and yesterday, 2018-05-14 node a, the master, logs (multiple times): 09:08:47,318 WARN [org.apache.activemq.artemis.core.server] Error during message dispatch: java.lang.RuntimeException: ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=File 126743097.msg has a null channel] at org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl.getReadOnlyBodyBuffer(LargeServerMessageImpl.java:216) [artemis-server-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.core.protocol.openwire.OpenWireMessageConverter.toAMQMessage(OpenWireMessageConverter.java:516) [artemis-openwire-protocol-2.5.0.jar:] at org.apache.activemq.artemis.core.protocol.openwire.OpenWireMessageConverter.createMessageDispatch(OpenWireMessageConverter.java:492) [artemis-openwire-protocol-2.5.0.jar:] at org.apache.activemq.artemis.core.protocol.openwire.amq.AMQConsumer.handleDeliver(AMQConsumer.java:227) [artemis-openwire-protocol-2.5.0.jar:] at org.apache.activemq.artemis.core.protocol.openwire.amq.AMQSession.sendMessage(AMQSession.java:312) [artemis-openwire-protocol-2.5.0.jar:] at org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl.deliverStandardMessage(ServerConsumerImpl.java:1099) [artemis-server-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl.proceedDeliver(ServerConsumerImpl.java:462) [artemis-server-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.core.server.impl.QueueImpl.proceedDeliver(QueueImpl.java:2900) [artemis-server-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliver(QueueImpl.java:2374) [artemis-server-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.core.server.impl.QueueImpl.access$2000(QueueImpl.java:105) [artemis-server-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:3173) [artemis-server-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.5.0.jar:2.5.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_162] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_162] at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.5.0.jar:2.5.0] Caused by: ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=File 126743097.msg has a null channel] at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.read(NIOSequentialFile.java:163) [artemis-journal-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.read(NIOSequentialFile.java:155) [artemis-journal-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl.getReadOnlyBodyBuffer(LargeServerMessageImpl.java:213) [artemis-server-2.5.0.jar:2.5.0] ... 16 more And node b. the slave, logs (multiple times): 14:07:43,549 WARN [org.apache.activemq.artemis.journal] AMQ142011: Error on reading compacting for JournalFileImpl: (activemq-data-115.amq id = 141, recordID = 141) 14:07:43,572 ERROR [org.apache.activemq.artemis.journal] AMQ144003: Error compacting: java.lang.Exception: Error on reading compacting for JournalFileImpl: (activemq-data-115.amq id = 141, recordID = 141) at org.apache.activemq.artemis.core.journal.impl.JournalImpl.compact(JournalImpl.java:1569) [artemis-journal-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.core.journal.impl.JournalImpl$14.run(JournalImpl.java:2111) [artemis-journal-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.5.0.jar:2.5.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_152] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_152] at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.5.0.jar:2.5.0] Caused by: java.lang.Exception: Inconsistency during compacting: RollbackRecord ID = 126646250 for an already rolled back transaction during compacting at org.apache.activemq.artemis.core.journal.impl.JournalImpl.readJournalFile(JournalImpl.java:761) [artemis-journal-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.core.journal.impl.JournalImpl.compact(JournalImpl.java:1566) [artemis-journal-2.5.0.jar:2.5.0] ... 7 more Caused by: java.lang.IllegalStateException: Inconsistency during compacting: RollbackRecord ID = 126646250 for an already rolled back transaction during compacting at org.apache.activemq.artemis.core.journal.impl.JournalCompactor.onReadRollbackRecord(JournalCompactor.java:349) [artemis-journal-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.core.journal.impl.JournalImpl.readJournalFile(JournalImpl.java:733) [artemis-journal-2.5.0.jar:2.5.0] ... 8 more Then when I restart node b, the master is unable to start replication at first: 08:58:28,580 WARN [org.apache.activemq.artemis.journal] AMQ142011: Error on reading compacting for JournalFileImpl: (activemq-data-155.amq id = 155, recordID = 155) 08:58:28,607 ERROR [org.apache.activemq.artemis.journal] AMQ144003: Error compacting: java.lang.Exception: Error on reading compacting for JournalFileImpl: (activemq-data-155.amq id = 155, recordID = 155) at org.apache.activemq.artemis.core.journal.impl.JournalImpl.compact(JournalImpl.java:1569) [artemis-journal-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.core.journal.impl.JournalImpl$12.run(JournalImpl.java:1466) [artemis-journal-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.5.0.jar:2.5.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_162] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_162] at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.5.0.jar:2.5.0] Caused by: java.lang.Exception: Inconsistency during compacting: RollbackRecord ID = 133826263 for an already rolled back transaction during compacting at org.apache.activemq.artemis.core.journal.impl.JournalImpl.readJournalFile(JournalImpl.java:761) [artemis-journal-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.core.journal.impl.JournalImpl.compact(JournalImpl.java:1566) [artemis-journal-2.5.0.jar:2.5.0] ... 7 more Caused by: java.lang.IllegalStateException: Inconsistency during compacting: RollbackRecord ID = 133826263 for an already rolled back transaction during compacting at org.apache.activemq.artemis.core.journal.impl.JournalCompactor.onReadRollbackRecord(JournalCompactor.java:349) [artemis-journal-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.core.journal.impl.JournalImpl.readJournalFile(JournalImpl.java:733) [artemis-journal-2.5.0.jar:2.5.0] ... 8 more 08:58:28,608 WARN [org.apache.activemq.artemis.core.server] AMQ222013: Error when trying to start replication: java.lang.RuntimeException: Error during compact, look at the logs at org.apache.activemq.artemis.core.journal.impl.JournalImpl.scheduleCompactAndBlock(JournalImpl.java:1481) [artemis-journal-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.startReplication(JournalStorageManager.java:553) [artemis-server-2.5.0.jar:2.5.0] at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation$2.run(SharedNothingLiveActivation.java:178) [artemis-server-2.5.0.jar:2.5.0] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_162] The replication continues normally after I restart node b a second time. [1] Sample broker configurations https://github.com/ilkkavi/activemq-artemis/blob/scaledown-issue/issues/IssueExample/src/main/resources/replication/ [2] Sample of client side errors (some details omitted): 23:59:23.939 [container-245] WARN com.atomikos.jms.AbstractJmsProxy - Error delegating call to createQueue on JMS driver javax.jms.JMSException: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 45 at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:399) at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:318) at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.queueQuery(ActiveMQSessionContext.java:254) at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.queueQuery(ClientSessionImpl.java:344) at org.apache.activemq.artemis.jms.client.ActiveMQSession.lookupQueue(ActiveMQSession.java:1064) at org.apache.activemq.artemis.jms.client.ActiveMQSession.createQueue(ActiveMQSession.java:358) ... Caused by: org.apache.activemq.artemis.api.core.ActiveMQConnectionTimedOutException: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 45 ... 78 common frames omitted 23:59:23.939 [container-245] ERROR o.s.j.l.a.MessageListenerAdapter - Listener execution failed org.springframework.jms.listener.adapter.ListenerExecutionFailedException: Listener method 'handleMessage' threw exception; nested exception is org.springframework.jms.UncategorizedJmsException: Uncategorized exception occurred during JMS processing; nested exception is javax.jms.JMSException: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 45 at org.springframework.jms.listener.adapter.MessageListenerAdapter.invokeListenerMethod(MessageListenerAdapter.java:322) at org.springframework.jms.listener.adapter.MessageListenerAdapter.onMessage(MessageListenerAdapter.java:243) at java.lang.Thread.run(Thread.java:748) Caused by: org.springframework.jms.UncategorizedJmsException: Uncategorized exception occurred during JMS processing; nested exception is javax.jms.JMSException: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 45 at org.springframework.jms.core.JmsTemplate.convertAndSend(JmsTemplate.java:658) ... ... 12 common frames omitted Caused by: javax.jms.JMSException: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 45 at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:399) at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:318) at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.queueQuery(ActiveMQSessionContext.java:254) at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.queueQuery(ClientSessionImpl.java:344) at org.apache.activemq.artemis.jms.client.ActiveMQSession.lookupQueue(ActiveMQSession.java:1064) at org.apache.activemq.artemis.jms.client.ActiveMQSession.createQueue(ActiveMQSession.java:358) at sun.reflect.GeneratedMethodAccessor114.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.atomikos.jms.AtomikosJmsXaSessionProxy.invoke(AtomikosJmsXaSessionProxy.java:174) at com.sun.proxy.$Proxy185.createQueue(Unknown Source) at org.springframework.jms.support.destination.DynamicDestinationResolver.resolveQueue(DynamicDestinationResolver.java:84) at org.springframework.jms.support.destination.DynamicDestinationResolver.resolveDestinationName(DynamicDestinationResolver.java:58) at org.springframework.jms.support.destination.JmsDestinationAccessor.resolveDestinationName(JmsDestinationAccessor.java:114) at org.springframework.jms.core.JmsTemplate.access$200(JmsTemplate.java:90) at org.springframework.jms.core.JmsTemplate$4.doInJms(JmsTemplate.java:573) at org.springframework.jms.core.JmsTemplate.execute(JmsTemplate.java:484) ... 61 common frames omitted Caused by: org.apache.activemq.artemis.api.core.ActiveMQConnectionTimedOutException: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 45 ... 78 common frames omitted 23:59:58.166 [Atomikos:2596] WARN o.a.activemq.artemis.core.client - AMQ212052: Packet PACKET(SessionQueueQueryResponseMessage_V2)[type=-7, channelID=12, packetObject=SessionQueueQueryResponseMessage_V2, address=jms.queue.queue.name, name=queue.name, consumerCount=1, filterString=null, durable=true, exists=true, temporary=false, messageCount=0, autoCreationEnabled=true] was answered out of sequence due to a previous server timeout and it's being ignored java.lang.Exception: trace at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:384) at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:318) at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.xaEnd(ActiveMQSessionContext.java:383) at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.end(ClientSessionImpl.java:1173) at com.atomikos.datasource.xa.XAResourceTransaction.suspend(XAResourceTransaction.java:253) at com.atomikos.datasource.xa.XAResourceTransaction.rollback(XAResourceTransaction.java:448) at com.atomikos.icatch.imp.RollbackMessage.send(RollbackMessage.java:47) at com.atomikos.icatch.imp.RollbackMessage.send(RollbackMessage.java:20) at com.atomikos.icatch.imp.PropagationMessage.submit(PropagationMessage.java:67) at com.atomikos.icatch.imp.Propagator$PropagatorThread.run(Propagator.java:63) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)