Also, I am assuming you have already checked that the master is not having a large pause due to GC or something similar.
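For reference, here is a minimal broker.xml sketch of the network pinger described in the network-isolation doc linked below. The address and URL are placeholders (your default gateway or another always-reachable host is typical), and the TTL override is only an assumption worth considering if GC pauses turn out to be the cause; tune all values for your environment:

```xml
<configuration xmlns="urn:activemq">
  <core xmlns="urn:activemq:core">

    <!-- How often the network check runs, in ms -->
    <network-check-period>10000</network-check-period>

    <!-- How long each ping may take before counting as failed, in ms -->
    <network-check-timeout>1000</network-check-timeout>

    <!-- Comma-separated addresses to ping; placeholder, use e.g. your gateway -->
    <network-check-list>10.0.0.1</network-check-list>

    <!-- Optionally also (or instead) check an HTTP URL; placeholder -->
    <network-check-URL-list>http://www.apache.org</network-check-URL-list>

    <!-- Optional: raise the 60,000ms connection TTL seen in the logs
         if long GC pauses are suspected (value in ms) -->
    <connection-ttl-override>120000</connection-ttl-override>

  </core>
</configuration>
```

If a broker cannot reach any address in the check list, it shuts itself down rather than staying live, which is what prevents the split brain from persisting.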
Sent from my iPhone

> On 22 Sep 2017, at 19:43, Michael André Pearce <michael.andre.pea...@me.com> wrote:
>
> https://activemq.apache.org/artemis/docs/latest/network-isolation.html
>
> Sent from my iPhone
>
>> On 22 Sep 2017, at 19:41, Michael André Pearce <michael.andre.pea...@me.com> wrote:
>>
>> I am assuming you had a temporary network fault, meaning the slave and master could not talk.
>>
>> Have you configured the network pinger? If/when you have network issues possibly causing a split brain (master and slave cannot talk to each other), the nodes also ping another device on the network, with the idea that one of them would fail, thus helping avoid the split-brain scenario.
>>
>> Cheers
>> Mike
>>
>> Sent from my iPhone
>>
>>> On 22 Sep 2017, at 17:49, boris_snp <boris.godu...@spglobal.com> wrote:
>>>
>>> I have to restart my 2-broker cluster on a daily basis due to the following sequence of events:
>>> -----------------------------------------------------------------------------------------------
>>> master
>>> 04:51:14,501 AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /10.202.147.99:58739 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
>>> 04:51:14,510 AMQ222092: Connection to the backup node failed, removing replication now: ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ119014: Did not receive data from /10.202.147.99:58739 within the 60,000ms connection TTL. The connection will now be closed.]
>>> 04:51:24,517 AMQ212041: Timed out waiting for netty channel to close
>>> 04:51:24,517 AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /10.202.147.99:58738 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
>>> -----------------------------------------------------------------------------------------------
>>> slave
>>> 04:51:42,306 AMQ212037: Connection failure has been detected: AMQ119011: Did not receive data from server for org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection@1c54a4bc[local= /10.202.147.99:58738, remote=nj09mhf0681/10.202.147.99:41410] [code=CONNECTION_TIMEDOUT]
>>> 04:51:42,316 AMQ212037: Connection failure has been detected: AMQ119011: Did not receive data from server for org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection@65ace922[local= /10.202.147.99:58739, remote=nj09mhf0681/10.202.147.99:41410] [code=CONNECTION_TIMEDOUT]
>>> 04:51:46,955 AMQ221037: ActiveMQServerImpl::serverUUID=7ffa29a0-7c48-11e7-9784-e83935127b09 to become 'live'
>>> 04:51:59,360 AMQ221014: 40% loaded
>>> 04:52:01,854 AMQ221014: 81% loaded
>>> 04:52:03,037 AMQ222028: Could not find page cache for page PagePositionImpl [pageNr=8, messageNr=-1, recordID=8662153341] removing it from the journal
>>> 04:52:03,051 AMQ222028: Could not find page cache for page PagePositionImpl [pageNr=13, messageNr=-1, recordID=8662204094] removing it from the journal
>>> 04:52:03,208 AMQ221003: Deploying queue jms.queue.DLQ
>>> 04:52:03,281 AMQ221003: Deploying queue jms.queue.ExpiryQueue
>>> 04:52:03,827 AMQ212034: There are more than one servers on the network broadcasting the same node id.
>>> -----------------------------------------------------------------------------------------------
>>> master
>>> 04:52:03,827 AMQ212034: There are more than one servers on the network broadcasting the same node id.
>>> -----------------------------------------------------------------------------------------------
>>> slave
>>> 04:52:03,910 AMQ221007: Server is now live
>>> 04:52:04,003 AMQ221020: Started Acceptor at nj09mhf0681:41411 for protocols [CORE,MQTT,AMQP,STOMP,HORNETQ,OPENWIRE]
>>> 04:52:11,949 AMQ212034: There are more than one servers on the network broadcasting the same node id.
>>> -----------------------------------------------------------------------------------------------
>>> I understand that at some point the master (live at the time) loses the slave and closes its connection to it. The slave (then the backup) in turn detects that the master is not present and becomes live. Now both brokers are live and never recover to normal until a restart. How can I avoid this? I will appreciate any help.
>>> Thank you.
>>>
>>>
>>>
>>> --
>>> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html