> Is there a way to set the lock check interval of org.apache.activemq.artemis.core.server.impl.FileLockNodeManager?
No.

> It seems like it checks several times a second according to the debug
> logs and the source code.

It's hard-coded to run every 2 seconds. See here [1]. Where do you see in the source code that it will check "several times a second"?


Justin

[1] https://github.com/apache/activemq-artemis/blob/bfb408f5905ad9ef5f0375d466894c276b09cca5/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/FileLockNodeManager.java#L60

On Thu, May 1, 2025 at 2:26 PM William Crowell <wcrow...@perforce.com.invalid> wrote:

> Justin,
>
> AWS is using NFSv4.2 (and I also tested 4.1). The mount options I am using on AWS, which work fine:
>
> 1.1.1.2:/data on /data type nfs4 (rw,relatime,sync,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,soft,noac,proto=tcp,timeo=50,retrans=3,sec=sys,clientaddr=1.1.1.3,lookupcache=none,local_lock=none,addr=1.1.1.2)
>
> The NFS mount options in the environment where I am having the problem:
>
> vers=4.1,defaults,lazytime,noatime,nodiratime,rsize=1048576,wsize=1048576,sync,intr,noac
>
> Is there a way to set the lock check interval of org.apache.activemq.artemis.core.server.impl.FileLockNodeManager? It seems like it checks several times a second according to the debug logs and the source code.
>
> This is some configuration issue, a bug with NFS, or a network-related issue, as you stated.
>
> Regards,
>
> William Crowell
>
> From: Justin Bertram <jbert...@apache.org>
> Date: Thursday, May 1, 2025 at 2:27 PM
> To: users@activemq.apache.org <users@activemq.apache.org>
> Subject: Re: Apache Artemis 2.40.0: Strange File Locking Behavior On NFSv4
>
> > Everything works fine on AWS, and I cannot reproduce the issue.
>
> This part is clear, as you explained it previously. What isn't clear is whether AWS is using NFS and, if so, what mount options it is using and how those mount options differ from what you're using in the environment with the problem.
>
> It's also not clear exactly what mount options you've tried in the environment with the problem.
>
> > The issue is that the backup is taking away the lock from the primary when it is not supposed to. It's like the backup host gets impatient and just grabs the lock.
>
> I'm not sure this is an accurate way to describe what's happening. Here's what's happening behind the scenes with the broker...
>
> The primary and the backup rely on an exclusive file lock from the underlying filesystem implementation to prevent running at the same time. When the primary activates it acquires this lock, and when the backup activates it attempts to acquire it as well. Since the primary already has it, the backup can't acquire it. The backup then pauses activation and polls the lock, attempting to acquire it. Once it acquires the lock it completes its activation process, which involves updating the timestamp of the lock file. Meanwhile, the primary, which originally acquired the lock, monitors the integrity of the lock to ensure it hasn't lost it. When the backup acquires the lock and updates the timestamp of the lock file, the primary notices, realizes that it has lost the lock, and shuts itself down to avoid split-brain.
>
> The issue here isn't that the backup got impatient and stole the lock from the primary. The issue is that the filesystem allowed the backup to acquire the lock, and the reason for that is not clear.
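
To make that sequence concrete, here is a rough sketch of the acquire-then-poll pattern using java.nio file locks (the same API the broker uses). This is illustrative only, not the actual FileLockNodeManager code, which is linked in [1]; the class name and lock-file path are placeholders, and the broker's real poll interval is the hard-coded 2 seconds:

    import java.io.RandomAccessFile;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileLock;

    // Illustrative sketch of the shared-store locking pattern described above.
    public class LockSketch {
        public static void main(String[] args) throws Exception {
            try (RandomAccessFile raf = new RandomAccessFile("/data/server.lock", "rw");
                 FileChannel channel = raf.getChannel()) {
                // Non-blocking attempt; returns null if another process holds the lock.
                FileLock lock = channel.tryLock();
                while (lock == null) {
                    // Backup behavior: pause activation and poll until the lock frees up.
                    Thread.sleep(2000);
                    lock = channel.tryLock();
                }
                System.out.println("lock acquired; completing activation...");
                // The real broker also writes a state byte and touches the lock file
                // here, which is what the other node's lock monitor detects.
                lock.release();
            }
        }
    }

If NFS is behaving, tryLock() on the second host should keep returning null for as long as the first host holds the lock.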
> Perhaps there was some kind of network failure between the primary and the NFS mount that caused the lock to fail. Perhaps there's an NFS misconfiguration or even an NFS bug that caused it. Perhaps the issue is in a virtualization layer (assuming that applies). Perhaps it's because all the relevant clocks aren't synchronized.
>
> > Sent you the logs directly.
>
> Thanks for that. I reviewed the logs, and the broker appears to be working as designed. There's no smoking gun. For some reason the filesystem simply allows the backup to acquire the lock after a few minutes of denying it.
>
> At this point I don't see any evidence that the broker could be configured any differently to prevent this issue.
>
>
> Justin
>
> On Thu, May 1, 2025 at 12:04 PM William Crowell <wcrow...@perforce.com.invalid> wrote:
>
> > Justin,
> >
> > Everything works fine on AWS, and I cannot reproduce the issue. I am working with someone who has on-premise hosts.
> >
> > The issue is that the backup is taking away the lock from the primary when it is not supposed to. It's like the backup host gets impatient and just grabs the lock. Sent you the logs directly.
> >
> > All options are valid in that link except for intr, which was removed. I am verifying the Linux kernel build version, but it should be very recent.
> >
> > Regards,
> >
> > William Crowell
> >
> > From: Justin Bertram <jbert...@apache.org>
> > Date: Thursday, May 1, 2025 at 12:57 PM
> > To: users@activemq.apache.org <users@activemq.apache.org>
> > Subject: Re: Apache Artemis 2.40.0: Strange File Locking Behavior On NFSv4
> >
> > > Of the options listed in the link regarding NFS, the one that will work is sync.
> >
> > I'm not following you here. Which exact options are you referring to?
> >
> > > intr was removed and ignored in later versions of the Linux kernel.
> >
> > That's correct. It remains in the recommended settings because we don't know what version of the Linux kernel folks are using.
> >
> > > I did try mounting on the 2 brokers using these options (on AWS)...I was not able to reproduce the issue on AWS.
> >
> > I'm lost here. Can you clarify exactly what is and isn't working on AWS and in your problematic environment?
> >
> >
> > Justin
> >
> > On Thu, May 1, 2025 at 11:36 AM William Crowell <wcrow...@perforce.com.invalid> wrote:
> >
> > > Justin,
> > >
> > > Yes, already ahead of you on this.
> > >
> > > We are also checking if the clocks are in sync between the brokers and the NFS share. I know you can have some drift in clock between the servers, but it cannot be very large.
> > >
> > > Of the options listed in the link regarding NFS, the one that will work is sync. intr was removed and ignored in later versions of the Linux kernel.
> > >
> > > I did try mounting on the 2 brokers using these options (on AWS):
> > >
> > > sudo mount -o vers=4.1,soft,sync,intr,noac,lookupcache=none,timeo=50,retrans=3 -t nfs 1.1.1.4:/data /data
> > >
> > > I was not able to reproduce the issue on AWS.
> > >
> > > If you want the full logs, then I can upload those to gist or something. Working on this.
> > >
> > > This is what I am seeing on the other system having the problem. Here is the primary:
> > >
> > > …
> > > 2025-05-01 15:07:15,441 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Lock appears to be valid; double check by reading status
> > > 2025-05-01 15:07:15,441 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] getting state...
> > > 2025-05-01 15:07:15,441 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] trying to lock position: 0
> > > 2025-05-01 15:07:15,441 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] locked position: 0
> > > 2025-05-01 15:07:15,441 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] lock: sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid]
> > > 2025-05-01 15:07:15,442 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] state: L (THIS MEANS ACTIVE)
> > > 2025-05-01 15:07:15,442 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Lock appears to be valid; triple check by comparing timestamp
> > > 2025-05-01 15:07:15,444 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Lock file /data/server.lock originally locked at 2025-05-01T15:06:16.787+0000 was modified at 2025-05-01T15:07:05.184+0000
> > > 2025-05-01 15:07:15,445 WARN [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Lost the lock according to the monitor, notifying listeners
> > > 2025-05-01 15:07:15,445 ERROR [org.apache.activemq.artemis.core.server] AMQ222010: Critical IO Error, shutting down the server. file=Lost NodeManager lock, message=NULL
> > > java.io.IOException: lost lock
> > > …
> > >
> > > On the backup it tries once a second to get the lock and can't, which is good, but something finally happens at 3:07:05 PM, and it mistakenly thinks it can get the lock:
> > >
> > > …
> > > 2025-05-01 15:07:05,183 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] locked position: 0
> > > 2025-05-01 15:07:05,183 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] lock: sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid]
> > > 2025-05-01 15:07:05,183 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] state: L (THIS MEANS ACTIVE)
> > > 2025-05-01 15:07:05,184 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] acquired primary node lock state = L
> > > 2025-05-01 15:07:05,186 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] touched /data/server.lock; new time: 1746112025184
> > > …
> > >
> > > Regards,
> > >
> > > William Crowell
> > >
> > > From: Justin Bertram <jbert...@apache.org>
> > > Date: Thursday, May 1, 2025 at 11:44 AM
> > > To: users@activemq.apache.org <users@activemq.apache.org>
> > > Subject: Re: Apache Artemis 2.40.0: Strange File Locking Behavior On NFSv4
> > >
> > > Would you mind turning on DEBUG logging for org.apache.activemq.artemis.core.server.impl.FileLockNodeManager [1], reproducing, and uploading the full logs someplace accessible?
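
For reference, recent Artemis releases configure logging via Log4j2 in <broker-instance>/etc/log4j2.properties, so enabling this would look something like the snippet below; the "file_lock" logger reference name is arbitrary, and [1] describes the documented procedure:

    # Added to <broker-instance>/etc/log4j2.properties
    logger.file_lock.name=org.apache.activemq.artemis.core.server.impl.FileLockNodeManager
    logger.file_lock.level=DEBUG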
> > > Regarding your NFS mount options...I was perusing the NFS man page [2] and saw this note about noatime & nodiratime:
> > >
> > > In particular, the atime/noatime, diratime/nodiratime, relatime/norelatime, and strictatime/nostrictatime mount options have no effect on NFS mounts.
> > >
> > > I could not find any reference to lazytime, and I also noticed that you weren't using all the recommendations from the ActiveMQ Artemis documentation. I wonder if you might try something like this:
> > >
> > > vers=4.1,soft,sync,intr,noac,lookupcache=none,timeo=50,retrans=3
> > >
> > > Is NFS being used in your AWS use-case? If so, what mount options are being used?
> > >
> > > To be clear, I'm not an NFS expert by any means. Usually this stuff just works with the recommended settings.
> > >
> > >
> > > Justin
> > >
> > > [1] https://activemq.apache.org/components/artemis/documentation/latest/logging.html#configuring-a-specific-level-for-a-logger
> > > [2] https://www.man7.org/linux/man-pages/man5/nfs.5.html
> > >
> > > On Thu, May 1, 2025 at 9:02 AM William Crowell <wcrow...@perforce.com.invalid> wrote:
> > >
> > > > Configuration and logs…
> > > >
> > > > The relevant broker.xml configuration from the primary (broker 1) is:
> > > >
> > > > …
> > > > <connectors>
> > > >    <connector name="broker1">tcp://1.1.1.2:61616</connector>
> > > >    <connector name="broker2">tcp://1.1.1.3:61616</connector>
> > > > </connectors>
> > > >
> > > > <cluster-connections>
> > > >    <cluster-connection name="my-cluster">
> > > >       <connector-ref>broker1</connector-ref>
> > > >       <static-connectors>
> > > >          <connector-ref>broker2</connector-ref>
> > > >       </static-connectors>
> > > >    </cluster-connection>
> > > > </cluster-connections>
> > > >
> > > > <ha-policy>
> > > >    <shared-store>
> > > >       <primary>
> > > >          <failover-on-shutdown>true</failover-on-shutdown>
> > > >       </primary>
> > > >    </shared-store>
> > > > </ha-policy>
> > > > …
> > > >
> > > > The relevant broker.xml configuration from the backup (broker 2) is:
> > > >
> > > > …
> > > > <connectors>
> > > >    <connector name="broker1">tcp://1.1.1.2:61616</connector>
> > > >    <connector name="broker2">tcp://1.1.1.3:61616</connector>
> > > > </connectors>
> > > >
> > > > <cluster-connections>
> > > >    <cluster-connection name="my-cluster">
> > > >       <connector-ref>broker2</connector-ref>
> > > >       <static-connectors>
> > > >          <connector-ref>broker1</connector-ref>
> > > >       </static-connectors>
> > > >    </cluster-connection>
> > > > </cluster-connections>
> > > >
> > > > <ha-policy>
> > > >    <shared-store>
> > > >       <backup>
> > > >          <allow-failback>false</allow-failback>
> > > >       </backup>
> > > >    </shared-store>
> > > > </ha-policy>
> > > > …
> > > >
> > > > Startup on the primary:
> > > >
> > > > …
> > > > 2025-05-01 12:51:23,567 INFO [org.apache.activemq.artemis.core.server] AMQ221006: Waiting to obtain primary lock
> > > > …
> > > > 2025-05-01 12:51:23,747 INFO [org.apache.activemq.artemis.core.server] AMQ221034: Waiting indefinitely to obtain primary lock
> > > > 2025-05-01 12:51:23,748 INFO [org.apache.activemq.artemis.core.server] AMQ221035: Primary Server Obtained primary lock
> > > > …
> > > > 2025-05-01 12:51:24,289 INFO [org.apache.activemq.artemis.core.server] AMQ221007: Server is now active
> > > > …
> > > >
> > > > Backup is started and somehow becomes primary:
> > > >
> > > > …
> > > > 2025-05-01 12:51:48,473 INFO [org.apache.activemq.artemis.core.server] AMQ221032: Waiting to become backup node
> > > > 2025-05-01 12:51:48,474 INFO [org.apache.activemq.artemis.core.server] AMQ221033: ** got backup lock
> > > > …
> > > > 2025-05-01 12:51:48,659 INFO [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 2.40.0 [339308e1-25f3-11f0-996a-0200ec1b9c8e] started; waiting for primary to fail before activating
> > > > 2025-05-01 12:51:48,809 INFO [org.apache.activemq.artemis.core.server] AMQ221031: backup announced
> > > > …
> > > > 2025-05-01 12:52:06,129 INFO [org.apache.activemq.artemis.core.server] AMQ221010: Backup Server is now active
> > > > …
> > > > 2025-05-01 12:52:07,158 INFO [org.apache.activemq.artemis.core.client] AMQ214036: Connection closure to 1.1.1.2/1.1.1.2:61616 has been detected: AMQ219015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
> > > > …
> > > >
> > > > Primary loses the lock unexpectedly and shuts down:
> > > >
> > > > …
> > > > 2025-05-01 12:52:16,352 WARN [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Lost the lock according to the monitor, notifying listeners
> > > > 2025-05-01 12:52:16,353 ERROR [org.apache.activemq.artemis.core.server] AMQ222010: Critical IO Error, shutting down the server. file=Lost NodeManager lock, message=NULL
> > > > java.io.IOException: lost lock
> > > >    at org.apache.activemq.artemis.core.server.impl.SharedStorePrimaryActivation.lambda$registerActiveLockListener$0(SharedStorePrimaryActivation.java:124) ~[artemis-server-2.40.0.jar:2.40.0]
> > > >    at org.apache.activemq.artemis.core.server.NodeManager.lambda$notifyLostLock$0(NodeManager.java:167) ~[artemis-server-2.40.0.jar:2.40.0]
> > > > …
> > > > 2025-05-01 12:52:16,528 INFO [org.apache.activemq.artemis.core.server] AMQ221002: Apache ActiveMQ Artemis Message Broker version 2.40.0 [339308e1-25f3-11f0-996a-0200ec1b9c8e] stopped, uptime 52.978 seconds
> > > > …
> > > >
> > > > Regards,
> > > >
> > > > William Crowell
> > > >
> > > > From: William Crowell <wcrow...@perforce.com.INVALID>
> > > > Date: Thursday, May 1, 2025 at 9:20 AM
> > > > To: users@activemq.apache.org <users@activemq.apache.org>
> > > > Subject: Apache Artemis 2.40.0: Strange File Locking Behavior On NFSv4
> > > >
> > > > Good morning,
> > > >
> > > > Disclaimer: This is not a bug, but a configuration issue.
> > > >
> > > > We are using Apache Artemis 2.40.0 on Rocky Linux 9. We are configuring a primary/backup pair on separate hosts and putting the data directory on an NFSv4 mount, and we are experiencing problems with the locking mechanism. I do know that NFS is not recommended for production use, but that is what we are limited to.
> > > >
> > > > We are following this documentation:
> > > >
> > > > https://activemq.apache.org/components/artemis/documentation/latest/ha.html#nfs-mount-recommendations
> > > >
> > > > What is happening is that the primary loses the lock and goes down after the backup node was started.
> > > > The mount options on both brokers we are using are:
> > > >
> > > > vers=4.1,defaults,lazytime,noatime,nodiratime,rsize=1048576,wsize=1048576,sync,intr,noac
> > > >
> > > > We then tried to start up the nodes sequentially. The primary lost the lock and went down shortly after the backup node was started.
> > > >
> > > > We also tested file locking on brokers 1 and 2:
> > > >
> > > > Broker 1:
> > > >
> > > > $ date; flock -x /data/test.lock -c "sleep 30"; echo $?; date
> > > > Thu May 1 12:42:52 PM GMT 2025
> > > > 0
> > > > Thu May 1 12:43:22 PM GMT 2025
> > > >
> > > > Broker 2:
> > > >
> > > > $ date; flock -n /data/test.lock -c "echo lock acquired"; echo $?; date
> > > > Thu May 1 12:42:46 PM GMT 2025
> > > > 1
> > > > Thu May 1 12:42:47 PM GMT 2025
> > > >
> > > > This means that broker 2 was unable to acquire the lock because broker 1 already had it, which is not consistent with the behavior of the Apache Artemis brokers. I also tested this on AWS, and the failover works as expected.
> > > >
> > > > What am I missing here?
> > > >
> > > > Regards,
> > > >
> > > > William Crowell
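
A note on the flock test above: flock(1) takes a flock()-style lock, while the broker locks the file through java.nio's FileChannel, which on Linux is generally implemented with POSIX record locks, so the shell test may not exercise exactly the same NFS locking path the broker does. A closer analogue, as a rough sketch (the LockTest name and the /data/test.lock path are just placeholders):

    import java.io.RandomAccessFile;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileLock;

    // Run with argument "hold" on broker 1 to hold the lock for 30 seconds,
    // then run with no arguments on broker 2 to attempt the same lock.
    public class LockTest {
        public static void main(String[] args) throws Exception {
            try (RandomAccessFile raf = new RandomAccessFile("/data/test.lock", "rw");
                 FileChannel ch = raf.getChannel()) {
                if (args.length > 0 && args[0].equals("hold")) {
                    FileLock lock = ch.lock(); // blocks until acquired
                    System.out.println("holding lock for 30 seconds");
                    Thread.sleep(30_000);
                    lock.release();
                } else {
                    FileLock lock = ch.tryLock(); // non-blocking attempt
                    System.out.println(lock == null ? "lock NOT acquired" : "lock acquired");
                    if (lock != null) {
                        lock.release();
                    }
                }
            }
        }
    }

If broker 2 reports "lock acquired" while broker 1 is still holding, that would point at the NFS setup rather than at Artemis.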