I realize you're attempting to simulate a network outage, but from what I understand SIGSTOP isn't necessarily an accurate way to do that. A colleague who has done quite a bit of work in this area explained to me a while back that SIGSTOP behaves differently at the socket level from pulling a network cable out of a NIC, or even from killing the process. See [1] for more. I mention this because you might want to develop an alternate testing mechanism that more accurately simulates the network outage use-case.
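To illustrate the socket-level difference, here is a minimal Python sketch. It is not taken from the linked answer or from Artemis; the tiny echo server and the names `probe` and `run_demo` are made up for illustration, and it assumes a POSIX system where SIGSTOP/SIGCONT are available. It checks whether a client can still connect to, and get a reply from, a process before, during, and after a SIGSTOP.

```python
import os
import signal
import socket
import subprocess
import sys
import time

# Hypothetical stand-in for a broker: a trivial TCP echo server run as a child.
SERVER_SRC = r"""
import socket
srv = socket.socket()
srv.bind(("127.0.0.1", 0))      # let the OS pick a free port
srv.listen(5)
print(srv.getsockname()[1], flush=True)
while True:
    conn, _ = srv.accept()
    try:
        conn.sendall(conn.recv(1024))
    except OSError:
        pass                    # peer may already have gone away
    finally:
        conn.close()
"""


def probe(port, timeout=1.0):
    """Connect, send a ping, and return (connected, got_reply)."""
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as s:
            s.sendall(b"ping")
            try:
                return True, s.recv(1024) == b"ping"
            except socket.timeout:
                return True, False   # connected, but nobody answered
    except OSError:
        return False, False          # connection refused or unreachable


def run_demo():
    """Probe the child while it is running, stopped, and resumed."""
    proc = subprocess.Popen(
        [sys.executable, "-c", SERVER_SRC], stdout=subprocess.PIPE, text=True
    )
    try:
        port = int(proc.stdout.readline())
        running = probe(port)

        os.kill(proc.pid, signal.SIGSTOP)   # freeze the process
        time.sleep(0.2)
        stopped = probe(port)               # the kernel still owns the socket

        os.kill(proc.pid, signal.SIGCONT)   # thaw it again
        resumed = probe(port)
        return running, stopped, resumed
    finally:
        proc.kill()
        proc.wait()


if __name__ == "__main__":
    for label, result in zip(("running", "stopped", "resumed"), run_demo()):
        print(label, result)
```

On a typical Linux host I would expect the "stopped" probe to connect successfully but never get a reply, because the kernel keeps the listening socket open and completes the TCP handshake on behalf of the frozen process. With a pulled cable or a killed process the connect itself would time out or be refused, which is why the two cases can look quite different to a peer.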

Justin

[1] https://unix.stackexchange.com/questions/202104/what-happens-to-requests-to-a-service-that-is-stopped-with-sigstop

----- Original Message -----
From: "martk" <123mar...@web.de>
To: users@activemq.apache.org
Sent: Monday, May 8, 2017 2:33:58 AM
Subject: Artemis HA cluster with replication

Hello,

I am using ActiveMQ Artemis 1.5.4 and have configured a highly available cluster (master/slave broker) with replication (using static connectors; see the main configuration parts below).

Under normal conditions (network connection fails or process shutdown/kill), switching from master to slave and back (which I want to do by hand) works nearly all the time (sometimes the backup server is not in sync even though both had been running in parallel for quite a while).

Simulating a busy master server results in two active master brokers (both processing messages, but with no replication any more). To test/reproduce I did the following steps:

1. Master and slave started properly (master is live and slave is backup).
2. Master stopped by sending the SIGSTOP signal to the process.
3. After some time the slave recognizes the problem and becomes live.
4. Sending the SIGCONT signal to the master process results in both master and slave running as live.

This could then only be resolved with a manual shutdown of both, and probably a loss of messages.

I would like to ensure that only one broker is live at any time and that the other acts as backup (shared storage is not possible). Maybe this can be resolved by configuration; otherwise I think it is a bug, because both servers should always perform a continuous live-check.

--------------------
master-broker.xml

<?xml version='1.0'?>
<configuration xmlns="urn:activemq"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
   <core xmlns="urn:activemq:core">
      <name>master</name>
      <persistence-enabled>true</persistence-enabled>
      <ha-policy>
         <replication>
            <master>
               <check-for-live-server>true</check-for-live-server>
            </master>
         </replication>
      </ha-policy>
      <connectors>
         <connector name="netty-connector">tcp://MASTERIP:61616</connector>
         <connector name="netty-backup-connector-slave">tcp://SLAVEIP:61616</connector>
      </connectors>
      <acceptors>
         <acceptor name="netty-acceptor">tcp://MASTERIP:61616</acceptor>
      </acceptors>
      <cluster-connections>
         <cluster-connection name="cluster">
            <address>jms</address>
            <connector-ref>netty-connector</connector-ref>
            <retry-interval>500</retry-interval>
            <use-duplicate-detection>true</use-duplicate-detection>
            <static-connectors>
               <connector-ref>netty-backup-connector-slave</connector-ref>
            </static-connectors>
         </cluster-connection>
      </cluster-connections>
   </core>
</configuration>

--------------------
slave-broker.xml

<?xml version='1.0'?>
<configuration xmlns="urn:activemq"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
   <core xmlns="urn:activemq:core">
      <name>slave1</name>
      <persistence-enabled>true</persistence-enabled>
      <ha-policy>
         <replication>
            <slave>
               <allow-failback>false</allow-failback>
            </slave>
         </replication>
      </ha-policy>
      <connectors>
         <connector name="netty-live-connector">tcp://MASTERIP:61616</connector>
         <connector name="netty-connector">tcp://SLAVEIP:61616</connector>
      </connectors>
      <acceptors>
         <acceptor name="netty-acceptor">tcp://SLAVEIP:61616</acceptor>
      </acceptors>
      <cluster-connections>
         <cluster-connection name="cluster">
            <address>jms</address>
            <connector-ref>netty-connector</connector-ref>
            <retry-interval>500</retry-interval>
            <use-duplicate-detection>true</use-duplicate-detection>
            <static-connectors>
               <connector-ref>netty-live-connector</connector-ref>
            </static-connectors>
         </cluster-connection>
      </cluster-connections>
   </core>
</configuration>

--------------------

Regards,
Martin

--
View this message in context: http://activemq.2283324.n4.nabble.com/Artemis-HA-cluster-with-replication-tp4725734.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.