> Why was the primary server completely down when it was isolated from the
> network?

I can't say for sure since you haven't provided many details about this.
However, I would guess that because the journal is on NFS and you killed
the broker's network, the broker encountered a critical IO error and shut
itself down. This is the expected behavior.

> I configured <network-check-list>, enabled , <network-check-ping-command>
> and <network-check-ping6-command> so the primary server knew that the
> network was unhealthy as shown in below log...

I've not seen the network pinger enabled for a shared-store configuration
as it was explicitly designed for the replicated (i.e. shared-nothing)
configuration to avoid split-brain. In the shared-store configuration the
shared store itself mitigates split-brain (e.g. via file locks). I don't
believe you need to configure the network pinger given your use of
shared-store.
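
For illustration, a minimal sketch of a shared-store pair with no network
pinger configured might look like the following in each broker's broker.xml.
The directory paths are hypothetical placeholders for your NFS mount, and
the <master>/<slave> element names assume a 2.19.x broker, so check both
against your own setup.

```xml
<!-- Primary broker's broker.xml (inside <core>).
     The /mnt/artemis-shared paths are hypothetical; use your NFS mount. -->
<journal-directory>/mnt/artemis-shared/journal</journal-directory>
<bindings-directory>/mnt/artemis-shared/bindings</bindings-directory>
<paging-directory>/mnt/artemis-shared/paging</paging-directory>
<large-messages-directory>/mnt/artemis-shared/large-messages</large-messages-directory>

<ha-policy>
   <shared-store>
      <master>
         <!-- Let the backup take over on a clean shutdown as well. -->
         <failover-on-shutdown>true</failover-on-shutdown>
      </master>
   </shared-store>
</ha-policy>

<!-- Backup broker's broker.xml (a separate file): point it at the same
     shared directories as above, but use the slave policy. -->
<ha-policy>
   <shared-store>
      <slave>
         <!-- Hand control back when the primary returns. -->
         <allow-failback>true</allow-failback>
      </slave>
   </shared-store>
</ha-policy>
```

Note that neither file contains any <network-check-*> elements. In this
sketch the lock on the shared journal is the only thing that decides which
broker is live.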


Justin

On Mon, Feb 28, 2022 at 11:34 AM Gunawan, Rahman (GSFC-703.H)[Halvik Corp]
<rahman.guna...@nasa.gov.invalid> wrote:

> We'll take a look at the NFS configuration.  Why was the primary server
> completely down when it was isolated from the network?  I configured
> <network-check-list>, enabled , <network-check-ping-command> and
> <network-check-ping6-command> so the primary server knew that the network
> was unhealthy as shown in below log:
> [org.apache.activemq.artemis.logs] AMQ201001: Network is unhealthy,
> stopping service ActiveMQServerImpl
>
> However, when we re-enabled the network card, the primary server was
> completely down.  I had to start the primary server manually.
>
> Regards,
> Rahman
>
> -----Original Message-----
> From: Justin Bertram <jbert...@apache.org>
> Sent: Monday, February 28, 2022 10:15 AM
> To: users@activemq.apache.org
> Subject: Re: [EXTERNAL] Re: Artemis file locking not released
>
> The backup and the live do have a direct connection. This allows the
> backup to share its connection details with the live. The live then takes
> those details and passes them on to clients so that the clients will know
> where to connect in case the live fails.
>
> However, if this connection breaks it is *not* possible for the backup to
> simply "unlock" the journal and take over. The only entities which can
> unlock the journal are the live broker (which created the lock in the first
> place) or NFS itself (e.g. in the case of some kind of connectivity
> failure). If the lock is not being released when the live broker's NFS
> connectivity fails then I would suggest you have a problem with your NFS
> configuration.
>
>
> Justin
>
> On Mon, Feb 28, 2022 at 6:55 AM Gunawan, Rahman (GSFC-703.H)[Halvik Corp] <
> rahman.guna...@nasa.gov.invalid> wrote:
>
> > The backup server knew that the primary server had a problem.  Below is
> > the log from the backup server:
> > ERROR [org.apache.activemq.artemis.core.client] AMQ214016: Failed to
> > create netty connection: java.net.UnknownHostException
> >
> > Thus, I'm thinking that if the Artemis primary server loses its
> > connection to NFS or the network, the backup server can detect this,
> > unlock the file, and take over.
> > Please let me know if you have suggestions.
> > Thanks
> >
> > Regards,
> > Rahman
> >
> > -----Original Message-----
> > From: Clebert Suconic <clebert.suco...@gmail.com>
> > Sent: Saturday, February 26, 2022 9:27 AM
> > To: users@activemq.apache.org
> > Subject: [EXTERNAL] Re: Artemis file locking not released
> >
> > Could it be some configuration of the remote file system attributes?
> >
> > On Fri, Feb 25, 2022 at 12:03 PM Gunawan, Rahman (GSFC-703.H)[Halvik
> > Corp] <rahman.guna...@nasa.gov.invalid> wrote:
> >
> > > I'm using Artemis 2.19.1 with a shared-file configuration and
> > > testing a scenario where the primary Artemis server is isolated from
> > > the network by disabling the network card.  Because the primary
> > > server lost communication with NFS, the file is never unlocked and the
> > > backup server waits for the lock indefinitely.  When we re-enable the
> > > network card on the primary server, the primary server is completely
> > > down.  Below is the primary server log:
> > > > "Reference Handler" Id=2 WAITING on java.lang.ref.Reference$Lock@64b6b3fc
> > > >         at java.lang.Object.wait(Native Method)
> > > >         -  waiting on java.lang.ref.Reference$Lock@64b6b3fc
> > > >         at java.lang.Object.wait(Object.java:502)
> > > >         at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
> > > >         at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
> > >
> > >
> > >
> > > ====================================================================
> > > ==
> > > =========
> > > End Thread dump
> > >
> > > > Is this a bug in the Artemis shared-file configuration?
> > >
> > > Regards,
> > > Rahman
> > >
> > --
> > Clebert Suconic
> >
>
