Hi,

we had an incident where our applications sent too much traffic to the Artemis 
broker, and the broker ran into Java heap OutOfMemoryError errors. I’m trying to 
understand why the backup broker never became primary after this happened.
We run an Artemis static cluster with two nodes (primary and backup) using shared 
storage. The configuration is pretty straightforward:
      <!-- primary broker.xml -->
      <ha-policy>
         <shared-store>
            <primary>
               <failover-on-shutdown>true</failover-on-shutdown>
            </primary>
         </shared-store>
      </ha-policy>

      <!-- backup broker.xml -->
      <ha-policy>
         <shared-store>
            <backup>
               <failover-on-shutdown>true</failover-on-shutdown>
            </backup>
         </shared-store>
      </ha-policy>

The backup always becomes primary when we reboot the primary during maintenance. 
We have also tested our HA configuration in various other ways, such as disabling 
network connections and killing the storage mount point, so I’m confident the 
configuration should be correct.

Primary logs during that time: https://p.defau.lt/?cNVdPPEMN2qM8XbLZomKdQ
Backup logs during that time: https://p.defau.lt/?nJvfQKdc4rrUGC9lL7_JCA

The OOM happened at ~2:51, and the primary stayed in this state until I 
restarted it at ~13:25.

Any pointers are much appreciated!

--
   Best Regards,

    Vilius Šumskas
    Rivile
    IT manager
