Justin,

AWS is using NFSv4.2 (and I also tested 4.1).  The mount options I am using
on AWS, which work fine, are:

1.1.1.2:/data on /data type nfs4 
(rw,relatime,sync,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,soft,noac,proto=tcp,timeo=50,retrans=3,sec=sys,clientaddr=1.1.1.3,lookupcache=none,local_lock=none,addr=1.1.1.2)

The NFS mount options in the environment where I am having the problem:

vers=4.1,defaults,lazytime,noatime,nodiratime,rsize=1048576,wsize=1048576,sync,intr,noac
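
For comparison, a mount in the problem environment using the options from the
Artemis HA documentation (minus intr, which newer kernels ignore) would look
something like this; the server address and mount point are placeholders:

sudo mount -t nfs -o vers=4.1,soft,sync,noac,lookupcache=none,timeo=50,retrans=3 <nfs-server>:/data /data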

Is there a way to set the lock check interval of
org.apache.activemq.artemis.core.server.impl.FileLockNodeManager?  Judging
from the debug logs and the source code, it seems to check the lock several
times a second.
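
For reference, here is a minimal sketch of the kind of monitor loop the debug
logs suggest; this is my own illustration rather than the actual Artemis
source, and the path, class name, and 2-second interval are assumptions:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;

public class LockMonitorSketch {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path lockFile = Paths.get("/data/server.lock"); // placeholder path from the logs
        FileTime acquiredAt = Files.getLastModifiedTime(lockFile);
        long intervalMillis = 2000; // assumed interval; I could not find a documented setting
        while (true) {
            Thread.sleep(intervalMillis);
            // Mirrors the "triple check by comparing timestamp" debug message: a
            // newer modification time than our acquisition time means we lost the lock.
            FileTime modified = Files.getLastModifiedTime(lockFile);
            if (modified.compareTo(acquiredAt) > 0) {
                System.out.println("lock file modified after acquisition: lost the lock");
                break; // the broker shuts itself down at this point to avoid split-brain
            }
        }
    }
}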

This is some configuration issue, an NFS bug, or a network-related issue, as
you stated.

Regards,

William Crowell

From: Justin Bertram <jbert...@apache.org>
Date: Thursday, May 1, 2025 at 2:27 PM
To: users@activemq.apache.org <users@activemq.apache.org>
Subject: Re: Apache Artemis 2.40.0: Strange File Locking Behavior On NFSv4
> Everything works fine on AWS, and I cannot reproduce the issue.

This part is clear as you explained it previously. What isn't clear is
whether AWS is using NFS and, if so, what mount options it is using and how
those mount options differ from what you're using in the environment with
the problem.

It's also not clear exactly what mount options you've tried in the
environment with the problem.

> The issue is that the backup is taking away the lock from the primary
when it is not supposed to.  It’s like the backup host gets impatient and
just grabs the lock.

I'm not sure this is an accurate way to describe what's happening. Here's
what's happening behind the scenes with the broker...

The primary and the backup rely on an exclusive file lock from the
underlying filesystem implementation to prevent running at the same time.
When the primary activates it acquires this lock, and when the backup
activates it attempts to acquire it as well. Since the primary already holds
it, the backup can't acquire it. The backup then pauses activation and polls
the lock, attempting to acquire it. Once it acquires the lock it completes
its activation process, which involves updating the timestamp of the lock
file. Meanwhile the primary, which originally acquired the lock, monitors
the integrity of the lock to ensure it hasn't lost it. When the backup
acquires the lock and updates the timestamp of the lock file, the primary
notices, realizes that it has lost the lock, and shuts itself down to avoid
split-brain.
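
To make that concrete, here is a rough sketch of the backup's acquire-and-touch
step using plain NIO file locks, which is what the sun.nio.ch.FileLockImpl
entries in your logs point to; the path, class name, and 1-second retry here
are placeholders of mine, not the broker's actual values:

import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

public class BackupAcquireSketch {
    public static void main(String[] args) throws Exception {
        File lockFile = new File("/data/server.lock"); // placeholder path
        try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw")) {
            FileLock lock = null;
            while (lock == null) {
                lock = raf.getChannel().tryLock(); // null while another process holds it
                if (lock == null) {
                    Thread.sleep(1000); // poll until the primary's lock goes away
                }
            }
            // Updating the lock file's timestamp is what the primary's monitor notices.
            lockFile.setLastModified(System.currentTimeMillis());
            System.out.println("acquired primary node lock");
        }
    }
}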

The issue here isn't that the backup got impatient and stole the lock from
the primary. The issue is that the filesystem allowed the backup to acquire
the lock, and the reason for that is not clear. Perhaps there was some kind
of network failure between the primary and the NFS mount that caused the
lock to fail. Perhaps there's an NFS misconfiguration or even an NFS bug
that caused it. Perhaps the issue is in a virtualization layer (assuming
that applies). Perhaps it's because all the relevant clocks aren't
synchronized.

> Sent you the logs directly.

Thanks for that. I reviewed the logs and the broker appears to be working
as designed. There's no smoking gun. For some reason the filesystem simply
allows the backup to acquire the lock after a few minutes of denying it.

At this point I don't see any evidence that the broker could be configured
any differently to prevent this issue.


Justin

On Thu, May 1, 2025 at 12:04 PM William Crowell
<wcrow...@perforce.com.invalid> wrote:

> Justin,
>
> Everything works fine on AWS, and I cannot reproduce the issue.  I am
> working with someone who has on-premise hosts.
>
> The issue is that the backup is taking away the lock from the primary when
> it is not supposed to.  It’s like the backup host gets impatient and just
> grabs the lock.  Sent you the logs directly.
>
> All options in that link are valid except for intr, which was removed.  I
> am verifying the Linux kernel build version, but it should be very recent.
>
> Regards,
>
> William Crowell
>
> From: Justin Bertram <jbert...@apache.org>
> Date: Thursday, May 1, 2025 at 12:57 PM
> To: users@activemq.apache.org <users@activemq.apache.org>
> Subject: Re: Apache Artemis 2.40.0: Strange File Locking Behavior On NFSv4
> > Of the options listed in the link regarding NFS, the one that will work is
> sync.
>
> I'm not following you here. Which exact options are you referring to?
>
> > intr was removed and is ignored in later versions of the Linux kernel.
>
> That's correct. It remains in the recommended settings because we don't
> know what version of the Linux kernel folks are using.
>
> > I did try mounting the two brokers using these options (on
> AWS)...I was not able to reproduce the issue on AWS.
>
> I'm lost here. Can you clarify exactly what is and isn't working on AWS and
> in your problematic environment?
>
>
> Justin
>
> On Thu, May 1, 2025 at 11:36 AM William Crowell
> <wcrow...@perforce.com.invalid> wrote:
>
> > Justin,
> >
> > Yes, already ahead of you on this.
> >
> > We are also checking if the clocks are in sync between the brokers and the
> > NFS share.  I know you can have some clock drift between the servers, but
> > it cannot be very large.
> >
> > Of the options listed in the link regarding NFS, the one that will work is
> > sync.  intr was removed and is ignored in later versions of the Linux
> > kernel.
> >
> > I did try mounting the two brokers using these options (on AWS):
> >
> >
> > sudo mount -o
> > vers=4.1,soft,sync,intr,noac,lookupcache=none,timeo=50,retrans=3  -t nfs
> > 1.1.1.4:/data /data
> >
> > I was not able to reproduce the issue on AWS.
> >
> > If you want the full logs, I can upload those to a gist or something.  I
> > am working on this.
> >
> > This is what I am seeing on the other system having the problem.  Here is
> > the primary:
> >
> > …
> > 2025-05-01 15:07:15,441 DEBUG
> > [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Lock
> > appears to be valid; double check by reading status
> > 2025-05-01 15:07:15,441 DEBUG
> > [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager]
> getting
> > state...
> > 2025-05-01 15:07:15,441 DEBUG
> > [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] trying
> > to lock position: 0
> > 2025-05-01 15:07:15,441 DEBUG
> > [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] locked
> > position: 0
> > 2025-05-01 15:07:15,441 DEBUG
> > [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] lock:
> > sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid]
> > 2025-05-01 15:07:15,442 DEBUG
> > [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager]
> state: L
> > (THIS MEANS ACTIVE)
> > 2025-05-01 15:07:15,442 DEBUG
> > [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Lock
> > appears to be valid; triple check by comparing timestamp
> > 2025-05-01 15:07:15,444 DEBUG
> > [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Lock
> > file /data/server.lock originally locked at 2025-05-01T15:06:16.787+0000
> > was modified at 2025-05-01T15:07:05.184+0000
> > 2025-05-01 15:07:15,445 WARN
> > [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Lost
> the
> > lock according to the monitor, notifying listeners
> > 2025-05-01 15:07:15,445 ERROR [org.apache.activemq.artemis.core.server]
> > AMQ222010: Critical IO Error, shutting down the server. file=Lost
> > NodeManager lock, message=NULL
> > java.io.IOException: lost lock
> > …
> >
> > On the backup, it tries once a second to get the lock and can't, which is
> > good, but something finally happens at 3:07:05 PM, and it mistakenly
> > thinks it can get the lock:
> >
> > …
> > 2025-05-01 15:07:05,183 DEBUG
> > [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] locked
> > position: 0
> > 2025-05-01 15:07:05,183 DEBUG
> > [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] lock:
> > sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid]
> > 2025-05-01 15:07:05,183 DEBUG
> > [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager]
> state: L
> > (THIS MEANS ACTIVE)
> > 2025-05-01 15:07:05,184 DEBUG
> > [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager]
> acquired
> > primary node lock state = L
> > 2025-05-01 15:07:05,186 DEBUG
> > [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager]
> touched
> > /data/server.lock; new time: 1746112025184
> > …
> >
> > Regards,
> >
> > William Crowell
> >
> > From: Justin Bertram <jbert...@apache.org>
> > Date: Thursday, May 1, 2025 at 11:44 AM
> > To: users@activemq.apache.org <users@activemq.apache.org>
> > Subject: Re: Apache Artemis 2.40.0: Strange File Locking Behavior On
> NFSv4
> > Would you mind turning on DEBUG logging for
> > org.apache.activemq.artemis.core.server.impl.FileLockNodeManager [1],
> > reproducing, and uploading the full logs someplace accessible?
> >
> > Regarding your NFS mount options...I was perusing the NFS man page [2]
> and
> > saw this note about noatime & nodiratime:
> >
> >     In particular, the atime/noatime, diratime/nodiratime,
> > relatime/norelatime, and strictatime/nostrictatime mount options have no
> > effect on NFS mounts.
> >
> > I could not find any reference to lazytime, and I also noticed that you
> > weren't using all the recommendations from the ActiveMQ Artemis
> > documentation. I wonder if you might try something like this:
> >
> >    vers=4.1,soft,sync,intr,noac,lookupcache=none,timeo=50,retrans=3
> >
> > Is NFS being used in your AWS use-case? If so, what mount options are
> being
> > used?
> >
> > To be clear, I'm not an NFS expert by any means. Usually this stuff just
> > works with the recommended settings.
> >
> >
> > Justin
> >
> > [1]
> > https://activemq.apache.org/components/artemis/documentation/latest/logging.html#configuring-a-specific-level-for-a-logger
> >
> > [2]
> > https://www.man7.org/linux/man-pages/man5/nfs.5.html
> >
> > On Thu, May 1, 2025 at 9:02 AM William Crowell
> > <wcrow...@perforce.com.invalid> wrote:
> >
> > > Configuration and logs…
> > >
> > > The relevant broker.xml configuration from the primary (broker 1) is:
> > >
> > > …
> > >       <connectors>
> > >          <connector name="broker1">tcp://1.1.1.2:61616</connector>
> > >          <connector name="broker2">tcp://1.1.1.3:61616</connector>
> > >       </connectors>
> > >
> > >       <cluster-connections>
> > >          <cluster-connection name="my-cluster">
> > >             <connector-ref>broker1</connector-ref>
> > >             <static-connectors>
> > >                <connector-ref>broker2</connector-ref>
> > >             </static-connectors>
> > >          </cluster-connection>
> > >       </cluster-connections>
> > >
> > >       <ha-policy>
> > >          <shared-store>
> > >             <primary>
> > >                <failover-on-shutdown>true</failover-on-shutdown>
> > >             </primary>
> > >          </shared-store>
> > >       </ha-policy>
> > > …
> > >
> > > The relevant broker.xml configuration from the backup (broker 2) is:
> > >
> > > …
> > >       <connectors>
> > >          <connector name="broker1">tcp://1.1.1.2:61616</connector>
> > >          <connector name="broker2">tcp://1.1.1.3:61616</connector>
> > >       </connectors>
> > >
> > >       <cluster-connections>
> > >          <cluster-connection name="my-cluster">
> > >             <connector-ref>broker2</connector-ref>
> > >             <static-connectors>
> > >                <connector-ref>broker1</connector-ref>
> > >             </static-connectors>
> > >          </cluster-connection>
> > >       </cluster-connections>
> > >
> > >       <ha-policy>
> > >          <shared-store>
> > >             <backup>
> > >                <allow-failback>false</allow-failback>
> > >             </backup>
> > >          </shared-store>
> > >       </ha-policy>
> > > …
> > >
> > > Startup on the primary:
> > >
> > > …
> > > 2025-05-01 12:51:23,567 INFO  [org.apache.activemq.artemis.core.server]
> > > AMQ221006: Waiting to obtain primary lock
> > > …
> > > 2025-05-01 12:51:23,747 INFO  [org.apache.activemq.artemis.core.server]
> > > AMQ221034: Waiting indefinitely to obtain primary lock
> > > 2025-05-01 12:51:23,748 INFO  [org.apache.activemq.artemis.core.server]
> > > AMQ221035: Primary Server Obtained primary lock
> > > …
> > > 2025-05-01 12:51:24,289 INFO  [org.apache.activemq.artemis.core.server]
> > > AMQ221007: Server is now active
> > > …
> > >
> > > Backup is started and somehow becomes primary:
> > >
> > > …
> > > 2025-05-01 12:51:48,473 INFO  [org.apache.activemq.artemis.core.server]
> > > AMQ221032: Waiting to become backup node
> > > 2025-05-01 12:51:48,474 INFO  [org.apache.activemq.artemis.core.server]
> > > AMQ221033: ** got backup lock
> > > …
> > > 2025-05-01 12:51:48,659 INFO  [org.apache.activemq.artemis.core.server]
> > > AMQ221109: Apache ActiveMQ Artemis Backup Server version 2.40.0
> > > [339308e1-25f3-11f0-996a-0200ec1b9c8e] started; waiting for primary to
> > fail
> > > before activating
> > > 2025-05-01 12:51:48,809 INFO  [org.apache.activemq.artemis.core.server]
> > > AMQ221031: backup announced
> > > …
> > > 2025-05-01 12:52:06,129 INFO  [org.apache.activemq.artemis.core.server]
> > > AMQ221010: Backup Server is now active
> > > …
> > > 2025-05-01 12:52:07,158 INFO  [org.apache.activemq.artemis.core.client]
> > > AMQ214036: Connection closure to 1.1.1.2/1.1.1.2:61616 has been
> > detected:
> > > AMQ219015: The connection was disconnected because of server shutdown
> > > [code=DISCONNECTED]
> > > …
> > >
> > > Primary loses the lock unexpectedly and shuts down:
> > >
> > > …
> > > 2025-05-01 12:52:16,352 WARN
> > > [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Lost
> > the
> > > lock according to the monitor, notifying listeners
> > > 2025-05-01 12:52:16,353 ERROR [org.apache.activemq.artemis.core.server]
> > > AMQ222010: Critical IO Error, shutting down the server. file=Lost
> > > NodeManager lock, message=NULL
> > > java.io.IOException: lost lock
> > >         at
> > >
> >
> org.apache.activemq.artemis.core.server.impl.SharedStorePrimaryActivation.lambda$registerActiveLockListener$0(SharedStorePrimaryActivation.java:124)
> > > ~[artemis-server-2.40.0.jar:2.40.0]
> > >         at
> > >
> >
> org.apache.activemq.artemis.core.server.NodeManager.lambda$notifyLostLock$0(NodeManager.java:167)
> > > ~[artemis-server-2.40.0.jar:2.40.0]
> > > …
> > > 2025-05-01 12:52:16,528 INFO  [org.apache.activemq.artemis.core.server]
> > > AMQ221002: Apache ActiveMQ Artemis Message Broker version 2.40.0
> > > [339308e1-25f3-11f0-996a-0200ec1b9c8e] stopped, uptime 52.978 seconds
> > > …
> > >
> > > Regards,
> > >
> > > William Crowell
> > >
> > > From: William Crowell <wcrow...@perforce.com.INVALID>
> > > Date: Thursday, May 1, 2025 at 9:20 AM
> > > To: users@activemq.apache.org <users@activemq.apache.org>
> > > Subject: Apache Artemis 2.40.0: Strange File Locking Behavior On NFSv4
> > > Good morning,
> > >
> > > Disclaimer: This is not a bug, but a configuration issue.
> > >
> > > We are using Apache Artemis 2.40.0 on Rocky Linux 9.  We are configuring
> > > a primary/backup pair on separate hosts and putting the data directory
> > > on an NFSv4 mount, and we are experiencing problems with the locking
> > > mechanism.  I do know that NFS is not recommended for production use,
> > > but that is what we are limited to.
> > >
> > > We are following this documentation:
> > >
> > > https://activemq.apache.org/components/artemis/documentation/latest/ha.html#nfs-mount-recommendations
> > >
> > > What is happening is that the primary loses the lock and goes down after
> > > the backup node was started.  The mount options we are using on both
> > > brokers are:
> > >
> > > vers=4.1,defaults,lazytime,noatime,nodiratime,rsize=1048576,wsize=1048576,sync,intr,noac
> > >
> > > We then tried to start up the nodes sequentially.  The primary lost the
> > > lock and went down shortly after the backup node was started.
> > >
> > > We also tested file locking on brokers 1 and 2:
> > >
> > > Broker 1:
> > >
> > > $ date; flock -x /data/test.lock  -c "sleep 30"; echo $?; date
> > > Thu May  1 12:42:52 PM GMT 2025
> > > 0
> > > Thu May  1 12:43:22 PM GMT 2025
> > >
> > > Broker 2:
> > >
> > > $ date; flock -n /data/test.lock -c "echo lock acquired"; echo $?; date
> > > Thu May  1 12:42:46 PM GMT 2025
> > > 1
> > > Thu May  1 12:42:47 PM GMT 2025
> > >
> > > This means that broker 2 was unable to acquire the lock because broker 1
> > > already had it, which is not consistent with the behavior of the Apache
> > > Artemis brokers.  I also tested this on AWS, and the failover works fine
> > > as expected.
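> > >
> > > One caveat with this test: the flock utility uses the flock(2) call,
> > > while Java NIO file locks map to POSIX record locks (fcntl) on Linux,
> > > and the two can be handled differently over NFS (note the local_lock
> > > mount option).  A closer analogue to what the broker does would be a
> > > small Java check like the sketch below; this is my own illustration,
> > > and the class name and test path are placeholders:
> > >
> > > import java.io.RandomAccessFile;
> > > import java.nio.channels.FileLock;
> > >
> > > public class NfsLockCheck {
> > >     public static void main(String[] args) throws Exception {
> > >         try (RandomAccessFile raf = new RandomAccessFile("/data/test.lock", "rw")) {
> > >             // tryLock() returns null when another process or host holds the lock
> > >             FileLock lock = raf.getChannel().tryLock();
> > >             System.out.println(lock == null ? "lock NOT acquired" : "lock acquired");
> > >             if (lock != null) {
> > >                 Thread.sleep(30_000); // hold it, like the flock -x "sleep 30" test
> > >                 lock.release();
> > >             }
> > >         }
> > >     }
> > > }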
> > >
> > > What am I missing here?
> > >
> > > Regards,
> > >
> > > William Crowell
> > >