Re: ActiveMQ master-slave topology issue[BUG]

Tim Bain Thu, 30 Apr 2015 05:43:12 -0700

An NFS problem was the first thing I thought of when I saw out-of-order log
lines, especially since you've had that problem before.  And this outage
lasted for over two minutes (which doesn't count as "slow" in my book;
that's "unavailable" or "down" to me), which is pretty crazy; hopefully
your ops team has looked into how that happened and taken steps to ensure
it doesn't happen again.

An NFS outage does justify a failover to the backup broker; to understand
why, think about what prevents failover during normal operation. The
master broker holds a file system lock on a DB lock file, and the slave
broker tries repeatedly to acquire the same lock. As long as it can't, it
knows the master broker is up and it can't become the master; at the point
where the lock disappears because the master broker can't access NFS, the
slave becomes active (at least, if it can access NFS; if not, then it
doesn't know that it could become active and it can't read the messages
from disk anyway). This is exactly what you would want to happen.
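To make the lock mechanics concrete, here is a rough Java sketch of the idea using plain java.nio file locks. It is not ActiveMQ's actual locker code; the class name, method names, and retry interval are all made up for illustration, and it treats a lock held elsewhere in the same JVM as "held by the other broker" only so the demo can run in one process.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not the ActiveMQ implementation.
public class SharedFileLockSketch {

    /** Tries once to take an exclusive lock; returns null if it is already held. */
    static FileLock tryAcquire(Path lockFile) throws IOException {
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = null;
        try {
            lock = channel.tryLock(); // null when another process holds the lock
        } catch (OverlappingFileLockException e) {
            // The lock is held elsewhere in this JVM; treat it as "held".
        } finally {
            if (lock == null) {
                channel.close(); // do not leak the channel on failure
            }
        }
        return lock;
    }

    /** What the slave does: keep retrying until the lock becomes available. */
    static FileLock waitForLock(Path lockFile, long sleepMillis)
            throws IOException, InterruptedException {
        while (true) {
            FileLock lock = tryAcquire(lockFile);
            if (lock != null) {
                return lock; // lock obtained: this broker may now become master
            }
            Thread.sleep(sleepMillis); // master still holds it; try again later
        }
    }

    /** Closing the channel releases the lock, as happens when the master goes away. */
    static void release(FileLock lock) throws IOException {
        lock.channel().close();
    }

    public static void main(String[] args) throws Exception {
        Path lockFile = Files.createTempFile("broker", ".lock");
        FileLock master = tryAcquire(lockFile);
        System.out.println("master holds lock: " + (master != null));
        System.out.println("slave blocked: " + (tryAcquire(lockFile) == null));
        release(master); // simulate the master losing the lock
        FileLock slave = waitForLock(lockFile, 100);
        System.out.println("slave took over: " + slave.isValid());
        release(slave);
    }
}
```

Note that nothing in this sketch tells the original holder that its lock has gone away; the holder would have to check for that itself, which is the gap described in the next paragraph.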

The real problem here is the one in your last paragraph: when the slave
acquires the lock because the master can't access NFS, the master isn't
detecting that and becoming the slave. I'd suggest you try to recreate
this failure (in a dev environment) by causing the master broker to be
unable to access NFS and confirming that the original master remains active
even after the slave takes over as master. Assuming that happens, submit a JIRA
bug report to describe the problem. Make sure you provide lots of details
about your NFS setup (include version numbers, file system type, etc.) and
about the O/Ses of the machines the brokers run on, since the behavior
might vary based on some of those things and you want to make sure that
whoever investigates this can reproduce it. But make sure you can
reproduce it first.

Tim
Hi,

I got the logs in exactly this order, and after checking the system further
I found that NFS (where we keep the KahaDB data and broker logs) was slow
during that time.

I can understand delayed logs or slow I/O operations during that time, but
that does not justify the failover broker also opening its transport
connector.

The main concern here is that the master-slave shared-storage topology is
broken, which should not happen in any case. If I/O operations are not
happening, the master broker should stop and let the failover broker serve
the clients, but here the master didn't stop and both brokers opened their
connectors.

Thanks,
Anuj

--
View this message in context:
http://activemq.2283324.n4.nabble.com/ActiveMQ-master-slave-topology-issue-BUG-tp4695677p4695731.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.
