https://activemq.apache.org/artemis/docs/latest/network-isolation.html
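
For reference, a minimal sketch of what the network check settings described on that page look like in broker.xml. The address 10.0.0.1 and the timing values are placeholders for illustration, not values taken from this thread, and I am assuming your broker version supports these elements; check the linked page for the full list of options.

```xml
<!-- Sketch only: these elements go inside the <core> element of broker.xml,
     on both the master and the slave. -->

<!-- 10.0.0.1 is a placeholder: use a device that both brokers can normally
     reach, for example the default gateway. -->
<network-check-list>10.0.0.1</network-check-list>

<!-- How often to run the check, in milliseconds (illustrative value). -->
<network-check-period>10000</network-check-period>

<!-- How long to wait for a reply, in milliseconds (illustrative value). -->
<network-check-timeout>1000</network-check-timeout>
```

As I understand the linked page, a broker that cannot reach any address in the list treats itself as isolated and stops until the network is reachable again, which is what should keep an isolated slave from going live next to a master that is still running.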
Sent from my iPhone

> On 22 Sep 2017, at 19:41, Michael André Pearce <michael.andre.pea...@me.com> wrote:
>
> I am assuming you had possibly a temp network fault meaning the slave and
> master could not talk.
>
> Have you configured network pinger? If / when you have network issues
> possibly causing a split brain (master and slave cannot talk to each other)
> then the nodes also ping another device on the network with the idea one
> would fail, and thus help avoid the issue of this split brain scenario.
>
>
> Cheers
> Mike
>
>
> Sent from my iPhone
>
>> On 22 Sep 2017, at 17:49, boris_snp <boris.godu...@spglobal.com> wrote:
>>
>> I have to restart my 2 broker cluster on a daily basis due to the following
>> sequence of events:
>> -----------------------------------------------------------------------------------------------
>> master
>> 04:51:14,501 AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /10.202.147.99:58739 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
>> 04:51:14,510 AMQ222092: Connection to the backup node failed, removing replication now: ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ119014: Did not receive data from /10.202.147.99:58739 within the 60,000ms connection TTL. The connection will now be closed.]
>> 04:51:24,517 AMQ212041: Timed out waiting for netty channel to close
>> 04:51:24,517 AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /10.202.147.99:58738 within the 60,000ms connection TTL. The connection will now be closed.
>> [code=CONNECTION_TIMEDOUT]
>> -----------------------------------------------------------------------------------------------
>> slave
>> 04:51:42,306 AMQ212037: Connection failure has been detected: AMQ119011: Did not receive data from server for org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection@1c54a4bc[local= /10.202.147.99:58738, remote=nj09mhf0681/10.202.147.99:41410] [code=CONNECTION_TIMEDOUT]
>> 04:51:42,316 AMQ212037: Connection failure has been detected: AMQ119011: Did not receive data from server for org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection@65ace922[local= /10.202.147.99:58739, remote=nj09mhf0681/10.202.147.99:41410] [code=CONNECTION_TIMEDOUT]
>> 04:51:46,955 AMQ221037: ActiveMQServerImpl::serverUUID=7ffa29a0-7c48-11e7-9784-e83935127b09 to become 'live'
>> 04:51:59,360 AMQ221014: 40% loaded
>> 04:52:01,854 AMQ221014: 81% loaded
>> 04:52:03,037 AMQ222028: Could not find page cache for page PagePositionImpl [pageNr=8, messageNr=-1, recordID=8662153341] removing it from the journal
>> 04:52:03,051 AMQ222028: Could not find page cache for page PagePositionImpl [pageNr=13, messageNr=-1, recordID=8662204094] removing it from the journal
>> 04:52:03,208 AMQ221003: Deploying queue jms.queue.DLQ
>> 04:52:03,281 AMQ221003: Deploying queue jms.queue.ExpiryQueue
>> 04:52:03,827 AMQ212034: There are more than one servers on the network broadcasting the same node id.
>> -----------------------------------------------------------------------------------------------
>> master
>> 04:52:03,827 AMQ212034: There are more than one servers on the network broadcasting the same node id.
>> -----------------------------------------------------------------------------------------------
>> slave
>> 04:52:03,910 AMQ221007: Server is now live
>> 04:52:04,003 AMQ221020: Started Acceptor at nj09mhf0681:41411 for protocols [CORE,MQTT,AMQP,STOMP,HORNETQ,OPENWIRE]
>> 04:52:11,949 AMQ212034: There are more than one servers on the network broadcasting the same node id.
>> -----------------------------------------------------------------------------------------------
>> I understand that at some point master (now live) loses slave and closes
>> connection to it.
>> Slave (backup now) in turn detects that master is not present and becomes
>> live. Now both brokers are live and never recover to normal until restart.
>> How can I avoid this? Will appreciate any help.
>> Thank you.
>>
>>
>>
>> --
>> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html