High CPU load with network connector, failover transport

Tim Robbins Thu, 19 Feb 2015 17:16:24 -0800

Hi,

We’ve noticed a regression in ActiveMQ 5.10.1 vs. 5.10.0 with a configuration 
similar to the following:


Broker 1:
networkConnector with static:(failover:(tcp://broker2)?randomize=false&maxReconnectAttempts=0)

Broker 2:
networkConnector with static:(failover:(tcp://broker1)?randomize=false&maxReconnectAttempts=0)

When one of the brokers is restarted, the other broker uses ~400% CPU. The 
cause is the FailoverTransport reconnectTask spinning; nothing ever stops 
the task.

Reverting the fix made for AMQ-5315 reintroduces the NullPointerException, 
but failover is then handled properly without spinning:
https://git1-us-west.apache.org/repos/asf/activemq/repo?p=activemq.git;a=commitdiff;h=c391321d1b5b59542d847717654b0d4dba54cf2f

Reverting that change works because the NullPointerException is caught and 
passed to serviceLocalException(), which calls 
ServiceSupport.dispose(getControllingService()). With the fix made in 
AMQ-5315, that dispose() call is never made.

I think that, rather than reverting the AMQ-5315 commit, it would be fine to 
just call dispose() before fireBridgeFailed() in the case where we can’t 
retrieve the broker info.
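
To illustrate the ordering I have in mind, here is a minimal sketch. It is 
not the actual ActiveMQ source: FakeBridge, FakeTransport, 
startWithNullCheckOnly() and startWithDispose() are hypothetical stand-ins, 
and the reconnect task is modelled as a simple flag. It only shows why 
calling dispose() before fireBridgeFailed() would stop the task.

```java
import java.util.ArrayList;
import java.util.List;

public class BridgeDisposeSketch {

    /** Hypothetical stand-in for the failover transport and its reconnect task. */
    static class FakeTransport {
        boolean reconnectTaskRunning = true;

        void stop() {
            reconnectTaskRunning = false;
        }
    }

    /** Hypothetical stand-in for the network bridge; not the real ActiveMQ class. */
    static class FakeBridge {
        final FakeTransport transport = new FakeTransport();
        final List<String> events = new ArrayList<>();
        boolean disposed = false;

        /** Stops the transport once, which is what ends the reconnect task here. */
        void dispose() {
            if (!disposed) {
                disposed = true;
                transport.stop();
                events.add("dispose");
            }
        }

        void fireBridgeFailed() {
            events.add("bridgeFailed");
        }

        /** Assumed shape of the current behaviour: null check, but no dispose(). */
        void startWithNullCheckOnly(Object remoteBrokerInfo) {
            if (remoteBrokerInfo == null) {
                fireBridgeFailed();
                return;
            }
            events.add("started");
        }

        /** Proposed ordering: dispose() first, then report the failure. */
        void startWithDispose(Object remoteBrokerInfo) {
            if (remoteBrokerInfo == null) {
                dispose();
                fireBridgeFailed();
                return;
            }
            events.add("started");
        }
    }

    public static void main(String[] args) {
        FakeBridge current = new FakeBridge();
        current.startWithNullCheckOnly(null);
        System.out.println("null check only, task still running: "
                + current.transport.reconnectTaskRunning);

        FakeBridge proposed = new FakeBridge();
        proposed.startWithDispose(null);
        System.out.println("dispose first, task still running: "
                + proposed.transport.reconnectTaskRunning);
    }
}
```

In the sketch, the null-check-only path leaves the reconnect flag set, while 
the dispose-first path clears it and still reports the bridge failure.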

This does seem like a fairly serious problem. As far as I’m aware this is a 
common use case: anyone using the masterslave transport, or the failover 
transport with the maxReconnectAttempts=0 setting required for bridges, would 
be exposed to it.

Regards,

Tim
