Also, I am assuming you have already checked that the master is not having a large pause due to GC or something similar.
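For reference, here is a minimal broker.xml sketch of the network pinger described in the network-isolation doc linked below. The address and URL are placeholders (your default gateway or another always-reachable host is typical), and the TTL override is only an assumption worth considering if GC pauses turn out to be the cause; tune all values for your environment:

```xml
<configuration xmlns="urn:activemq">
  <core xmlns="urn:activemq:core">

    <!-- How often the network check runs, in ms -->
    <network-check-period>10000</network-check-period>

    <!-- How long each ping may take before counting as failed, in ms -->
    <network-check-timeout>1000</network-check-timeout>

    <!-- Comma-separated addresses to ping; placeholder, use e.g. your gateway -->
    <network-check-list>10.0.0.1</network-check-list>

    <!-- Optionally also (or instead) check an HTTP URL; placeholder -->
    <network-check-URL-list>http://www.apache.org</network-check-URL-list>

    <!-- Optional: raise the 60,000ms connection TTL seen in the logs
         if long GC pauses are suspected (value in ms) -->
    <connection-ttl-override>120000</connection-ttl-override>

  </core>
</configuration>
```

If a broker cannot reach any address in the check list, it shuts itself down rather than staying live, which is what prevents the split brain from persisting.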
Sent from my iPhone

> On 22 Sep 2017, at 19:43, Michael André Pearce <michael.andre.pea...@me.com> wrote:
>
> https://activemq.apache.org/artemis/docs/latest/network-isolation.html
>
> Sent from my iPhone
>
>> On 22 Sep 2017, at 19:41, Michael André Pearce <michael.andre.pea...@me.com> wrote:
>>
>> I am assuming you had a temporary network fault, meaning the slave and master could not talk.
>>
>> Have you configured the network pinger? If/when you have network issues possibly causing a split brain (master and slave cannot talk to each other), the nodes also ping another device on the network, with the idea that one of them would fail, thus helping avoid the split-brain scenario.
>>
>> Cheers
>> Mike
>>
>> Sent from my iPhone
>>
>>> On 22 Sep 2017, at 17:49, boris_snp <boris.godu...@spglobal.com> wrote:
>>>
>>> I have to restart my 2-broker cluster on a daily basis due to the following sequence of events:
>>> -----------------------------------------------------------------------------------------------
>>> master
>>> 04:51:14,501 AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /10.202.147.99:58739 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
>>> 04:51:14,510 AMQ222092: Connection to the backup node failed, removing replication now: ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ119014: Did not receive data from /10.202.147.99:58739 within the 60,000ms connection TTL. The connection will now be closed.]
>>> 04:51:24,517 AMQ212041: Timed out waiting for netty channel to close
>>> 04:51:24,517 AMQ212037: Connection failure has been detected: AMQ119014: Did not receive data from /10.202.147.99:58738 within the 60,000ms connection TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]
>>> -----------------------------------------------------------------------------------------------
>>> slave
>>> 04:51:42,306 AMQ212037: Connection failure has been detected: AMQ119011: Did not receive data from server for org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection@1c54a4bc[local= /10.202.147.99:58738, remote=nj09mhf0681/10.202.147.99:41410] [code=CONNECTION_TIMEDOUT]
>>> 04:51:42,316 AMQ212037: Connection failure has been detected: AMQ119011: Did not receive data from server for org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection@65ace922[local= /10.202.147.99:58739, remote=nj09mhf0681/10.202.147.99:41410] [code=CONNECTION_TIMEDOUT]
>>> 04:51:46,955 AMQ221037: ActiveMQServerImpl::serverUUID=7ffa29a0-7c48-11e7-9784-e83935127b09 to become 'live'
>>> 04:51:59,360 AMQ221014: 40% loaded
>>> 04:52:01,854 AMQ221014: 81% loaded
>>> 04:52:03,037 AMQ222028: Could not find page cache for page PagePositionImpl [pageNr=8, messageNr=-1, recordID=8662153341] removing it from the journal
>>> 04:52:03,051 AMQ222028: Could not find page cache for page PagePositionImpl [pageNr=13, messageNr=-1, recordID=8662204094] removing it from the journal
>>> 04:52:03,208 AMQ221003: Deploying queue jms.queue.DLQ
>>> 04:52:03,281 AMQ221003: Deploying queue jms.queue.ExpiryQueue
>>> 04:52:03,827 AMQ212034: There are more than one servers on the network broadcasting the same node id.
>>> -----------------------------------------------------------------------------------------------
>>> master
>>> 04:52:03,827 AMQ212034: There are more than one servers on the network broadcasting the same node id.
>>> -----------------------------------------------------------------------------------------------
>>> slave
>>> 04:52:03,910 AMQ221007: Server is now live
>>> 04:52:04,003 AMQ221020: Started Acceptor at nj09mhf0681:41411 for protocols [CORE,MQTT,AMQP,STOMP,HORNETQ,OPENWIRE]
>>> 04:52:11,949 AMQ212034: There are more than one servers on the network broadcasting the same node id.
>>> -----------------------------------------------------------------------------------------------
>>> I understand that at some point the master (live at the time) loses the slave and closes its connection to it. The slave (then the backup) in turn detects that the master is not present and becomes live. Now both brokers are live and never recover to normal until a restart. How can I avoid this? I will appreciate any help.
>>> Thank you.
>>>
>>>
>>>
>>> --
>>> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html