Our problem with using Oracle is this: if the Active (Hot) instance becomes disconnected, and the changes we made to Oracle to time out the connection and release the lock on the database succeed, a secondary (standby) instance begins processing and all is well. But when the previous instance returns to the network, we are finding that it creates a new session with Oracle and begins processing in parallel without attempting to gain a lock on the DB. Now we have two instances of ActiveMQ running.
Any advice on the best method? I see there have been some problems with persistence store corruption with NFS as well: http://old.nabble.com/Failover-and-Fail-BACK-td28198179.html#a28222719 Is ActiveMQ not ready for production enterprise networks, or is there a better method of implementing H.A.?

For Oracle, the master instance of ActiveMQ obtains a lock on the database using a "select for update" SQL statement. It appears that when you pull the plug, the data store does not detect the stale connection quickly enough for your requirements. You can shorten the time needed to detect the stale connection by tuning the TCP keepalive parameters (OS specific) to meet your uptime requirements. When using Oracle, setting 'ENABLE=BROKEN' in tnsnames.ora will enable use of the keepalive packets. Oracle also allows you to ping the client at regular intervals set by sqlnet.expire_time (in minutes!).

As always, do your testing in an environment that mimics your production environment first. You may have to use trial and error to find the right settings for your OS and data store.
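
To get a rough feel for what the keepalive tuning buys you, here is a small Python sketch that estimates how long an idle, dead TCP connection can linger before the OS gives up on it. It assumes a Linux-style keepalive model (idle time before the first probe, interval between probes, number of unanswered probes); the parameter names and the "tuned" values are only illustrative, and other operating systems use different knobs. The function names are made up for this example. Treat the result as an estimate to check against your own testing, not a guarantee.

```python
def keepalive_detection_seconds(idle_s, interval_s, probes):
    """Approximate worst-case seconds before an idle, dead peer is dropped.

    Assumes the Linux-style model: wait idle_s with no traffic, then send
    `probes` keepalive probes spaced interval_s apart before giving up.
    """
    return idle_s + interval_s * probes


def expire_time_seconds(expire_time_minutes):
    """sqlnet.expire_time is given in minutes; convert to seconds."""
    return expire_time_minutes * 60


# Linux defaults: tcp_keepalive_time=7200, tcp_keepalive_intvl=75,
# tcp_keepalive_probes=9.
default_s = keepalive_detection_seconds(7200, 75, 9)

# Hypothetical tuned values, chosen only for illustration.
tuned_s = keepalive_detection_seconds(60, 10, 3)

print("default detection time:", default_s, "seconds")
print("tuned detection time:", tuned_s, "seconds")
print("sqlnet.expire_time=1 probes every", expire_time_seconds(1), "seconds")
```

With the Linux defaults the stale connection (and therefore the lock) can hang around for more than two hours, which is why a standby may appear never to take over until these values are lowered.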