Re: change strategy for determining failure of primary in JDBC-backed setup

Alex Hooper Thu, 24 May 2012 06:57:17 -0700

Gary Tully uttered:

You would need to write some code, but the locker implementation can
be easily overridden.
The interface is: org.apache.activemq.store.jdbc.DatabaseLocker


It acquires the lock in start which typically blocks till it can get a
lock and there are periodic calls to keepalive once the lock is
obtained.

and it is set via xml config on the JDBCPersistenceAdapter via
org.apache.activemq.store.jdbc.JDBCPersistenceAdapter#setDatabaseLocker


Ah, excellent; I'd not managed to tease that information out of the docs.

A lease type strategy may make sense, where a read to determine if
there is an existing owner is followed by a poll when the lease is
expired or an update to start a new lease if none exists. The owner of
the lease needs to renew before it expires and that interval needs to
be configurable to allow timely reclamation.

In the event that the connection drops, if it is recreated before the
lease expires, the master/slave state is retained. If the lease has
expired, a master and slave will contend for the lock to be the new
master.
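
The lease rules described above can be sketched roughly as follows. This is only an illustration of the take/renew/expire logic, not ActiveMQ code: LeaseRow, acquireOrRenew and currentOwner are hypothetical names, the "table" is an in-memory stand-in for a single database row, and the caller passes the current time in explicitly. A real locker would implement org.apache.activemq.store.jdbc.DatabaseLocker, run the same check as an UPDATE against the database, and would probably want to take the time from the database rather than from each broker's clock.

```java
// Hypothetical sketch: none of these names come from ActiveMQ.
// In-memory stand-in for a one-row lease table; a real implementation would
// perform the same compare-and-update as a single SQL statement.
final class LeaseRow {
    private String owner;          // null means no lease has been taken yet
    private long expiresAtMillis;  // lease is valid while now < expiresAtMillis

    /**
     * Take the lease if it is free or expired, or renew it if brokerId
     * already owns it. Returns true if brokerId holds the lease afterwards.
     */
    synchronized boolean acquireOrRenew(String brokerId, long nowMillis, long leaseMillis) {
        boolean free = owner == null || nowMillis >= expiresAtMillis;
        if (free || owner.equals(brokerId)) {
            owner = brokerId;
            expiresAtMillis = nowMillis + leaseMillis;
            return true;
        }
        return false; // someone else holds a live lease; caller should poll again later
    }

    /** Returns the owner of a still-valid lease, or null if there is none. */
    synchronized String currentOwner(long nowMillis) {
        return (owner != null && nowMillis < expiresAtMillis) ? owner : null;
    }
}

final class LeaseDemo {
    public static void main(String[] args) {
        LeaseRow row = new LeaseRow();
        long lease = 10_000; // assumed lease length in milliseconds

        System.out.println(row.acquireOrRenew("master", 0, lease));     // master starts a lease
        System.out.println(row.acquireOrRenew("slave", 5_000, lease));  // slave polls, lease still live
        System.out.println(row.acquireOrRenew("master", 8_000, lease)); // master renews before expiry
        System.out.println(row.acquireOrRenew("slave", 18_000, lease)); // master stopped renewing
    }
}
```

In this sketch a master that loses its connection but reconnects and renews before the lease runs out keeps its role, while one that misses the renewal window has to contend with the slave like any other candidate. The renewal interval would need to be comfortably shorter than the lease length.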


Yes, that might make sense, thanks. Will need further pondering...


In your setup, it is odd that the dropped connection does not cause
the lock keepAlive to fail and the broker to terminate. It should,
unless there are tcp level options that need to kick in to see the
half close. Or some connection pool config could pick up on the
failure; there are some validation options on the commons DBCP pool
that could help there.

I had already identified the commons pool options that might help and have configured thusly:


  <bean id="oracle-ds"
  class="org.apache.commons.dbcp.BasicDataSource"
  destroy-method="close">
    <property name="driverClassName"
    value="oracle.jdbc.driver.OracleDriver" />
    <property name="url" value="jdbc:oracle:thin:@oracle:1521:bmj01" />
    <property name="username" value="activemq" />
    <property name="password" value="activemq" />
    <property name="maxActive" value="200" />
    <property name="maxIdle" value="5" />
    <property name="testWhileIdle" value="true" />
    <property name="timeBetweenEvictionRunsMillis" value="30000" />
    <property name="validationQuery" value="SELECT 1 FROM DUAL" />
    <property name="poolPreparedStatements" value="true" />
  </bean>


I have yet to try 'removeAbandoned' as that doesn't seem to be appropriate.

Interestingly, netstat on the slave activemq box shows an ESTABLISHED TCP connection to the oracle server, but the oracle server shows no socket in any state connected to the slave activemq. Which kind of explains why activemq isn't noticing the connection drop. So... maybe the 'removeAbandoned' option is appropriate, as the connection is not getting cleared by the dbcp checks because the connection that has been used to issue the "SELECT * FROM ACTIVEMQ_LOCK FOR UPDATE" is deemed as being active and thus never checked.

More fundamentally, of course, I need to work out what's going wrong at the TCP level and sort that.


[snip]


Hopefully the above will help, but start with determining why in your
current setup, the lock keepalive is not triggering for you when the
connection is dropped, because that is a little odd, unless you have
the org.apache.activemq.store.jdbc.JDBCPersistenceAdapter#setLockKeepAlivePeriod
= 0.


Is that option configurable in the XML config?

Anyway, thanks, Gary, for a detailed and pertinent response. This has given me a few things to try.


Cheers,

Alex.



On 24 May 2012 11:45, Alex Hooper <ahoo...@bmjgroup.com> wrote:

Hi,

We are running activemq 5.5.1 in an active/passive failover configuration
with JDBC Persistence to an Oracle backend. The default strategy for
determining whether the current master has failed is for the secondary
server to attempt to get a lock on the database table, waiting indefinitely
for the lock to be granted.

This is not working (at least in our context) as, after a relatively short
time in operation (a handful of hours at most) the connection to Oracle is
dropped. Activemq doesn't notice this, so the secondary sits there happily
waiting for a lock it can now never get and, in the event of a failure,
won't serve any clients as it is not a master.

Is there some way to change the decision mechanism to, eg, a polling
strategy? Or can anyone suggest another resolution to this problem?

Alex.
--
Alex Hooper
Operations Team Leader, BMJ Group, BMA House, London WC1H 9JR
Tel: +44 (0) 20 7383 6049
http://group.bmj.com/


