RE: [EXTERNAL] Re: Configuration check-for-live-server recommendation for backup server

Gunawan, Rahman (GSFC-703.H)[Halvik Corp] Fri, 08 Apr 2022 05:23:50 -0700

We only need one queue to be created, so we can't have 3 master/slave pairs 
listening to the same queue.  The network ping configuration (several UNIX 
servers are listed as ping targets) detected that the network was unhealthy, 
so Artemis went to sleep.
allow-failback on the slave was set to false to avoid a flip-flop problem.
The problem occurs with the following sequence:
1. Primary/master starts and becomes active.
2. Slave starts as backup.
3. Master is isolated from the network.
4. Slave becomes active.
5. Master recovers from network isolation.
6. Master wakes up, detects an active server, and announces itself as backup.
7. Slave is isolated from the network.
8. Master becomes active.
9. Slave recovers from network isolation.
10. Slave wakes up, but because allow-failback = false and there is no 
check-for-live-server configuration, the slave becomes active while the master 
is also active.
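
The ten steps can be replayed with a small toy model. This is only a sketch of 
the behaviour reported in this thread, not Artemis's actual activation logic: 
the Broker class, the isolate/recover helpers, and the checks_for_live flag are 
hypothetical names made up for illustration, and a "check for live" option on 
the slave is the proposal being discussed, not an existing setting.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Broker:
    name: str
    checks_for_live: bool   # hypothetical: look for a live peer on wake-up
    state: str = "stopped"  # "active", "backup", or "asleep"


def isolate(broker: Broker, peer: Broker) -> None:
    """The network check fails: the broker sleeps, a backup peer takes over."""
    broker.state = "asleep"
    if peer.state == "backup":
        peer.state = "active"


def recover(broker: Broker, peer: Broker) -> None:
    """The network check passes again: the broker wakes up."""
    if broker.checks_for_live and peer.state == "active":
        broker.state = "backup"   # defer to the peer that is already live
    else:
        broker.state = "active"   # no check performed: activate regardless


def run_sequence(slave_checks_for_live: bool) -> Tuple[str, str]:
    master = Broker("master", checks_for_live=True)
    slave = Broker("slave", checks_for_live=slave_checks_for_live)
    master.state = "active"   # step 1
    slave.state = "backup"    # step 2
    isolate(master, slave)    # steps 3-4
    recover(master, slave)    # steps 5-6
    isolate(slave, master)    # steps 7-8
    recover(slave, master)    # steps 9-10
    return master.state, slave.state


print(run_sequence(slave_checks_for_live=False))  # ('active', 'active')
print(run_sequence(slave_checks_for_live=True))   # ('active', 'backup')
```

In this model the first call ends with both brokers active (the split brain 
described above), while the second call, with the hypothetical check enabled on 
the slave, ends with the slave as backup.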


-----Original Message-----
From: Justin Bertram <jbert...@apache.org> 
Sent: Thursday, April 7, 2022 10:42 PM
To: users@activemq.apache.org
Subject: [EXTERNAL] Re: Configuration check-for-live-server recommendation for 
backup server

The check-for-live-server controls what happens when a master broker is 
*started*. If it's false then it will activate even if there is already another 
broker on the network with its ID, but if it's true then it will check first 
and if it finds another broker on the network with its ID then it will become a 
backup to that broker.

On the other hand, a replication slave *always* starts as a backup no matter 
what.

If you want to mitigate split brain in this case then you need a proper quorum. 
In order to get this you can either configure 3 master/slave pairs or you can 
integrate with ZooKeeper via the pluggable quorum vote replication 
configuration [1]. A single master/slave pair simply cannot avoid split brain 
in every possible situation.
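
As a rough illustration of the quorum point (a generic majority-vote sketch 
with hypothetical function names, not Artemis's or ZooKeeper's actual voting 
code): a side of a network partition may only act if it can reach a strict 
majority of all voters. With two voters, neither side of a 1/1 split has a 
majority; with three, exactly one side of a 2/1 split does.

```python
from typing import List, Tuple


def has_majority(reachable: int, total: int) -> bool:
    """True if `reachable` voters (counting oneself) are more than half of `total`."""
    return 2 * reachable > total


def sides_with_majority(partition: Tuple[int, ...]) -> List[int]:
    """Given the sizes of each side of a network split, return the sizes
    of the sides that still hold a strict majority of all voters."""
    total = sum(partition)
    return [size for size in partition if has_majority(size, total)]


print(sides_with_majority((1, 1)))  # [] : a lone pair cannot break the tie
print(sides_with_majority((2, 1)))  # [2] : only the larger side may go live
```

Because two disjoint sides can never both hold a strict majority, at most one 
side is ever allowed to activate under this rule.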


Justin

[1]
https://activemq.apache.org/components/artemis/documentation/latest/ha.html#pluggable-quorum-vote-replication-configurations

On Wed, Apr 6, 2022 at 10:06 AM Gunawan, Rahman (GSFC-703.H)[Halvik Corp] 
<rahman.guna...@nasa.gov.invalid> wrote:

> Hi,
> I would like to recommend adding the configuration "<check-for-live-server>"
> to the backup server.  I tested Artemis replication mode with the following
> configuration:
> Primary:
>      <ha-policy>
>         <replication>
>            <master>
>               <vote-on-replication-failure>true</vote-on-replication-failure>
>               <check-for-live-server>true</check-for-live-server>
>            </master>
>         </replication>
>      </ha-policy>
>
> Backup server:
>      <ha-policy>
>         <replication>
>            <slave>
>               <allow-failback>false</allow-failback>
>            </slave>
>         </replication>
>      </ha-policy>
>
> We also enabled ping on both the primary and backup servers.
>
> 1.      When the network card in the primary was disabled, after around 2
> minutes the backup server went live while the primary server was still
> isolated from the network.
>
> 2.      After the network card in the primary server was enabled, Artemis on
> the primary woke up, but it detected that a live server was already active,
> so it announced itself as backup.
>
> 3.      Then the network card in the backup server was disabled; after around
> 2 minutes the primary server went live while the backup server was still
> isolated from the network.
>
> 4.      After the network card in the backup server was enabled, the backup
> server woke up, but because there was no configuration to check for a live
> server, it went live while the primary server was also live (split brain).
>
> Is there any reason why the backup server doesn't have the configuration
> "<check-for-live-server>"?
> Thanks
>
> Regards,
> Rahman
>
