> Why was the primary server completely down when it was isolated from the network?

I can't really say since you've not really provided any details about this. However, I would guess that since the journal is on NFS and since you killed the broker's network then it encountered a critical IO error and shut itself down. This is the expected behavior.

> I configured <network-check-list>, enabled , <network-check-ping-command> and <network-check-ping6-command> so the primary server knew that the network was unhealthy as shown in below log...

I've not seen the network pinger enabled for a shared-store configuration as it was explicitly designed for the replicated (i.e. shared nothing) configuration to avoid split-brain. In the shared-store configuration the shared-store itself mitigates against split-brain (e.g. via file locks). I don't believe you need to configure the network pinger given your use of shared-store.


Justin

On Mon, Feb 28, 2022 at 11:34 AM Gunawan, Rahman (GSFC-703.H)[Halvik Corp] <rahman.guna...@nasa.gov.invalid> wrote:

> We'll take a look at the NFS configuration. Why was the primary server
> completely down when it was isolated from the network? I configured
> <network-check-list>, enabled , <network-check-ping-command> and
> <network-check-ping6-command> so the primary server knew that the network
> was unhealthy as shown in below log:
> [org.apache.activemq.artemis.logs] AMQ201001: Network is unhealthy,
> stopping service ActiveMQServerImpl
>
> However; when we enabled back the network card, the primary server was
> completely down. I had to start the primary server manually.
>
> Regards,
> Rahman
>
> -----Original Message-----
> From: Justin Bertram <jbert...@apache.org>
> Sent: Monday, February 28, 2022 10:15 AM
> To: users@activemq.apache.org
> Subject: Re: [EXTERNAL] Re: Artemis file locking not released
>
> The backup and the live do have a direct connection. This allows the
> backup to share its connection details with the live. The live then takes
> those details and passes them on to clients so that the clients will know
> where to connect in case the live fails.
>
> However, if this connection breaks it is *not* possible for the backup to
> simply "unlock" the journal and take over. The only entities which can
> unlock the journal is the live broker (who created the lock in the first
> place) or NFS itself (e.g. in the case of some kind of connectivity
> failure). If the lock is not being released when the live broker's NFS
> connectivity fails then I would suggest you have a problem with your NFS
> configuration.
>
>
> Justin
>
> On Mon, Feb 28, 2022 at 6:55 AM Gunawan, Rahman (GSFC-703.H)[Halvik Corp] <rahman.guna...@nasa.gov.invalid> wrote:
>
> > The backup server knew that the primary server had problem. Below is
> > from the log from the backup server:
> > ERROR [org.apache.activemq.artemis.core.client] AMQ214016: Failed to
> > create netty connection: java.net.UnknownHostException
> >
> > Thus, I'm thinking if the Artemis primary server lost connection to
> > NFS or network, the backup server can detect, unlock the file and take
> > over.
> > Please let me know if you have suggestions.
> > Thanks
> >
> > Regards,
> > Rahman
> >
> > -----Original Message-----
> > From: Clebert Suconic <clebert.suco...@gmail.com>
> > Sent: Saturday, February 26, 2022 9:27 AM
> > To: users@activemq.apache.org
> > Subject: [EXTERNAL] Re: Artemis file locking not released
> >
> > Could be some configuration on the remote file system attributes ?
> >
> > On Fri, Feb 25, 2022 at 12:03 PM Gunawan, Rahman (GSFC-703.H)[Halvik Corp] <rahman.guna...@nasa.gov.invalid> wrote:
> >
> > > I'm using Artemis 2.19.1. I'm using share file configuration and
> > > testing a scenario where the primary Artemis server is isolated from
> > > the network by disabling the network card. Because the primary
> > > server lost communication to NFS, the file is never unlock and the
> > > backup server is always waiting for the lock. When we enable the
> > > network card in primary server, the primary server is completely
> > > down. Below is the primary server log:
> > > "Reference Handler" Id=2 WAITING on java.lang.ref.Reference$Lock@64b6b3fc
> > >     at java.lang.Object.wait(Native Method)
> > >     - waiting on java.lang.ref.Reference$Lock@64b6b3fc
> > >     at java.lang.Object.wait(Object.java:502)
> > >     at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
> > >     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
> > >
> > > ===============================================================================
> > > End Thread dump
> > >
> > > Is this bugs in Artemis share file configuration?
> > >
> > > Regards,
> > > Rahman
> >
> > --
> > Clebert Suconic
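
For reference, a shared-store live broker without the network pinger, along the lines Justin suggests above, might be configured roughly as in the fragment below. This is only a sketch under stated assumptions, not the poster's actual broker.xml (the thread never shows it): the /mnt/artemis-shared paths are hypothetical NFS mount locations, and the <master> element name is the one used by the 2.x releases current at the time of this thread.

```xml
<!-- Sketch of a live broker's broker.xml for shared-store HA.
     The /mnt/artemis-shared paths are hypothetical; both brokers
     would point at the same NFS-backed directories. -->
<core xmlns="urn:activemq:core">

  <!-- All persistent data lives on the shared store. -->
  <paging-directory>/mnt/artemis-shared/paging</paging-directory>
  <bindings-directory>/mnt/artemis-shared/bindings</bindings-directory>
  <journal-directory>/mnt/artemis-shared/journal</journal-directory>
  <large-messages-directory>/mnt/artemis-shared/large-messages</large-messages-directory>

  <ha-policy>
    <shared-store>
      <master>
        <!-- Let the backup take over on a clean shutdown as well. -->
        <failover-on-shutdown>true</failover-on-shutdown>
      </master>
    </shared-store>
  </ha-policy>

  <!-- No <network-check-*> elements here: in shared-store mode the
       file lock on the journal is what guards against split-brain. -->

</core>
```

The backup's broker.xml would point at the same directories and use <slave> in place of <master> inside <shared-store>, optionally with <allow-failback>true</allow-failback>. Whether the backup can actually acquire the lock after the live broker loses its NFS connectivity still depends on the NFS client and server settings, which is the part of the setup both Justin and Clebert suggest checking.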