An NFS problem was the first thing I thought of when I saw out-of-order log lines, especially since you've had that problem before. And this outage lasted for over two minutes (which doesn't count as "slow" in my book; that's "unavailable" or "down" to me), which is pretty crazy; hopefully your ops team has looked into how that happened and taken steps to ensure it doesn't happen again.
An NFS outage does justify a failover to the backup broker; to understand why, think about what prevents failover during normal operation. The master broker holds a file system lock on a DB lock file, and the slave broker tries repeatedly to acquire the same lock. As long as it can't, it knows the master broker is up and it can't become the master; at the point where the lock disappears because the master broker can't access NFS, the slave becomes active (at least, if it can access NFS; if not, then it doesn't know that it could become active and it can't read the messages from disk anyway). This is exactly what you would want to happen.

The real problem here is the one in your last paragraph: when the slave acquires the lock because the master can't access NFS, the master isn't detecting that and becoming the slave. I'd suggest you try to recreate this failure (in a dev environment) by causing the master broker to be unable to access NFS and confirming that the original master remains active even after the slave becomes the master. Assuming that happens, submit a JIRA bug report to describe the problem. Make sure you provide lots of details about your NFS setup (include version numbers, file system type, etc.) and about the OSes of the machines the brokers run on, since the behavior might vary based on some of those things and you want to make sure that whoever investigates this can reproduce it. But make sure you can reproduce it first.

Tim

Hi,

I got the logs in exactly this order, and after further checking the system I found that NFS (where we put KahaDB and the broker logs) was slow during that time. I can understand logs being delayed or I/O operations being slow during that time, but that does not justify why the failover broker also opened its transport connector. The main concern here is that the (master-slave shared-storage) topology is broken, which should not happen in any case.
If I/O operations are not happening, the master broker should stop and let the failover broker serve the clients, but here the master didn't stop and both brokers opened the connector.

Thanks,
Anuj
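
For anyone following the thread, the lock-based failover described in the reply above can be sketched in a few lines of Java. This is only an illustration of the general idea using the standard `java.nio` file locking API, not ActiveMQ's actual locker code; the class name `SharedLockSketch`, the helper `tryAcquire`, and the temporary lock file are all hypothetical. Both "brokers" live in one JVM here, so a held lock shows up as an `OverlappingFileLockException`; between two separate processes `tryLock()` would return `null` instead.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SharedLockSketch {

    /** One non-blocking attempt at the exclusive lock; null means someone else holds it. */
    static FileLock tryAcquire(FileChannel channel) throws IOException {
        try {
            return channel.tryLock(); // null if another process holds the lock
        } catch (OverlappingFileLockException e) {
            return null;              // the lock is held elsewhere in this JVM
        }
    }

    /**
     * Returns {slave got the lock while the master held it,
     *          slave got the lock after the master lost it}.
     */
    static boolean[] demo() throws IOException {
        Path lockFile = Files.createTempFile("broker", ".lock"); // stand-in for the DB lock file
        try (FileChannel master = FileChannel.open(lockFile, StandardOpenOption.WRITE);
             FileChannel slave = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {

            // Normal operation: the master holds the lock, so the slave's attempt fails.
            FileLock masterLock = tryAcquire(master);
            boolean slaveWhileMasterUp = tryAcquire(slave) != null;

            // The master loses the lock (simulated here by an explicit release).
            masterLock.release();

            // The slave's next attempt succeeds, and it would now become active.
            FileLock slaveLock = tryAcquire(slave);
            boolean slaveAfterRelease = slaveLock != null;
            if (slaveLock != null) {
                slaveLock.release();
            }
            return new boolean[] { slaveWhileMasterUp, slaveAfterRelease };
        } finally {
            Files.deleteIfExists(lockFile);
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] result = demo();
        System.out.println("slave acquired lock while master was up: " + result[0]);
        System.out.println("slave acquired lock after master lost it: " + result[1]);
    }
}
```

The sketch covers only the slave's side of the protocol. The behavior reported in this thread is the reverse case: a master that has lost the lock needs to notice that and stop serving clients, which the sketch does not attempt to model.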