I realize you're attempting to simulate a network outage, but from what I 
understand SIGSTOP isn't necessarily an accurate way to do that. A colleague 
who has done quite a bit of work in this area explained to me a while back 
that SIGSTOP behaves differently at the socket level from pulling a network 
cable out of a NIC or even killing the process. See more here [1]. I mention 
this because you might want to develop an alternate testing mechanism that 
more accurately simulates the network outage use-case.
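
To make the socket-level difference concrete, here is a minimal sketch in 
Python. It is not Artemis code: a toy echo server stands in for the broker, 
the function name probe_stopped_server is made up for illustration, and it 
assumes a POSIX system such as Linux. The idea is that while a process is 
stopped with SIGSTOP the kernel still owns its listening socket, so a new TCP 
connection is accepted into the backlog and writes are buffered, but no reply 
comes back until SIGCONT. With a pulled cable or a killed process the peer 
would instead typically see timeouts or a refused/reset connection.

```python
import os
import signal
import socket


def probe_stopped_server(timeout=1.0):
    """Return (connected, replied_while_stopped, replied_after_cont)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(5)
    port = listener.getsockname()[1]

    pid = os.fork()
    if pid == 0:
        # Child: a trivial one-shot echo server standing in for the broker.
        try:
            conn, _ = listener.accept()
            conn.sendall(conn.recv(1024))
            conn.close()
        finally:
            os._exit(0)

    listener.close()  # the child keeps its own copy of the listening socket

    # Freeze the child and wait until the kernel reports it as stopped.
    os.kill(pid, signal.SIGSTOP)
    os.waitpid(pid, os.WUNTRACED)

    # The kernel completes the TCP handshake although the process is frozen.
    client = socket.create_connection(("127.0.0.1", port), timeout=timeout)
    connected = True
    client.sendall(b"ping")  # lands in the kernel receive buffer
    try:
        client.recv(1024)
        replied_while_stopped = True
    except socket.timeout:
        replied_while_stopped = False

    # Resume the child; it now accepts the queued connection and echoes.
    os.kill(pid, signal.SIGCONT)
    client.settimeout(5.0)
    replied_after_cont = client.recv(1024) == b"ping"
    client.close()
    os.waitpid(pid, 0)  # reap the child
    return connected, replied_while_stopped, replied_after_cont
```

On a typical Linux system I would expect probe_stopped_server() to return 
(True, False, True): the connection succeeds, nothing is answered while the 
process is stopped, and the buffered request is served once it continues.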


Justin

[1] https://unix.stackexchange.com/questions/202104/what-happens-to-requests-to-a-service-that-is-stopped-with-sigstop

----- Original Message -----
From: "martk" <123mar...@web.de>
To: users@activemq.apache.org
Sent: Monday, May 8, 2017 2:33:58 AM
Subject: Artemis HA cluster with replication

Hello,

I am using ActiveMQ Artemis 1.5.4 and configured a high available cluster
(master/slave broker) with replication (using static connectors; see main
configuration parts below).

Under normal conditions (network connection failure or process shutdown/kill)
the switch from master to slave and back (which I want to do by hand) works
nearly all the time (sometimes the backup server is not in sync although
both had been running in parallel for quite some time).

Simulating a busy master server results in two active master brokers
(both processing messages, but with no replication any more). To
test/reproduce this I did the following steps:

1. Master and slave properly started (master is live and slave is backup).
2. Master stopped by sending the SIGSTOP signal to the process. After some
time the slave recognizes the problem and becomes live.
3. Sending the SIGCONT signal to the master process results in both master
and slave running as live. This can then only be resolved with a manual
shutdown of both and probably a loss of messages.
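
The stop/continue part of these steps could be scripted roughly as follows. 
This is only a sketch, assuming a POSIX system and Python 3; freeze_process 
is a hypothetical helper name, and the PID of the master broker process has 
to be looked up separately.

```python
import os
import signal
import time


def freeze_process(pid, seconds):
    """Suspend `pid` with SIGSTOP for `seconds`, then resume it with SIGCONT."""
    os.kill(pid, signal.SIGSTOP)
    try:
        time.sleep(seconds)
    finally:
        # Always resume the process, even if the sleep is interrupted.
        os.kill(pid, signal.SIGCONT)
```

For example, freeze_process(master_pid, 60) would pause the master for a 
minute (master_pid being a placeholder); the pause has to be long enough for 
the slave to notice the problem and become live.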

I would like to ensure that only one broker is live at any time, with the
other acting as backup (shared storage is not possible).
Maybe this can be resolved by configuration; otherwise I think it is a bug,
because both servers should always perform a continuous live-check.


-------------------- master-broker.xml
<?xml version='1.0'?>
<configuration xmlns="urn:activemq"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
   <core xmlns="urn:activemq:core">

      <name>master</name>
     
      <persistence-enabled>true</persistence-enabled>

      <ha-policy>
         <replication>
            <master>
               <check-for-live-server>true</check-for-live-server>
            </master>
         </replication>
      </ha-policy>

      <connectors>
         <connector name="netty-connector">tcp://MASTERIP:61616</connector>
         <connector name="netty-backup-connector-slave">tcp://SLAVEIP:61616</connector>
      </connectors>

      <acceptors>
         <acceptor name="netty-acceptor">tcp://MASTERIP:61616</acceptor>
      </acceptors>

      <cluster-connections>
         <cluster-connection name="cluster">
            <address>jms</address>
            <connector-ref>netty-connector</connector-ref>
            <retry-interval>500</retry-interval>
            <use-duplicate-detection>true</use-duplicate-detection>
            <static-connectors>
               <connector-ref>netty-backup-connector-slave</connector-ref>
            </static-connectors>
         </cluster-connection>
      </cluster-connections>

   </core>
</configuration>


-------------------- slave-broker.xml
<?xml version='1.0'?>
<configuration xmlns="urn:activemq"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
   <core xmlns="urn:activemq:core">

      <name>slave1</name>
     
      <persistence-enabled>true</persistence-enabled>

      <ha-policy>
         <replication>
            <slave>
               <allow-failback>false</allow-failback>
            </slave>
         </replication>
      </ha-policy>

      <connectors>
         <connector name="netty-live-connector">tcp://MASTERIP:61616</connector>
         <connector name="netty-connector">tcp://SLAVEIP:61616</connector>
      </connectors>

      <acceptors>
         <acceptor name="netty-acceptor">tcp://SLAVEIP:61616</acceptor>
      </acceptors>

      <cluster-connections>
         <cluster-connection name="cluster">
            <address>jms</address>
            <connector-ref>netty-connector</connector-ref>
            <retry-interval>500</retry-interval>
            <use-duplicate-detection>true</use-duplicate-detection>
            <static-connectors>
               <connector-ref>netty-live-connector</connector-ref>
            </static-connectors>
         </cluster-connection>
      </cluster-connections>

   </core>
</configuration>


--------------------
Regards,
Martin



--
View this message in context: 
http://activemq.2283324.n4.nabble.com/Artemis-HA-cluster-with-replication-tp4725734.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.
