I added a fix in the Jira AMQ-2774 thread.
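
The idea discussed in the quoted messages below (release remoteBrokerNameKnownLatch from stop(), then re-check the started flag after await()) can be sketched in isolation. This is only a minimal illustration, not the patch attached to AMQ-2774 and not the real DemandForwardingBridgeSupport: the class LatchBridgeSketch, the method onRemoteBrokerInfo() and the remoteStartTriggered flag are hypothetical names used for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a network bridge; NOT the real ActiveMQ class. */
final class LatchBridgeSketch {
    private final CountDownLatch remoteBrokerNameKnownLatch = new CountDownLatch(1);
    private final AtomicBoolean started = new AtomicBoolean(false);
    // Stands in for triggerRemoteStartBridge() and the listener callback.
    private final AtomicBoolean remoteStartTriggered = new AtomicBoolean(false);

    /** Blocks until the remote broker name is known or stop() releases the latch. */
    void start() throws InterruptedException {
        if (!started.compareAndSet(false, true)) {
            return; // already started
        }
        remoteBrokerNameKnownLatch.await();
        // The latch may have been released by stop(); skip the rest in that case.
        if (started.get()) {
            remoteStartTriggered.set(true);
        }
    }

    /** Hypothetical hook: called when the remote broker info finally arrives. */
    void onRemoteBrokerInfo() {
        remoteBrokerNameKnownLatch.countDown();
    }

    /** Marks the bridge as stopped and releases any thread parked in start(). */
    void stop() {
        started.set(false);
        remoteBrokerNameKnownLatch.countDown();
    }

    boolean isRemoteStartTriggered() {
        return remoteStartTriggered.get();
    }

    public static void main(String[] args) throws Exception {
        LatchBridgeSketch bridge = new LatchBridgeSketch();
        Thread starter = new Thread(() -> {
            try {
                bridge.start();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        starter.start();
        // Wait until the starter thread is parked on the latch.
        while (starter.getState() != Thread.State.WAITING) {
            Thread.sleep(10);
        }
        bridge.stop(); // simulate the connection going away before the reply arrives
        starter.join();
        System.out.println("remote start triggered: " + bridge.isRemoteStartTriggered());
    }
}
```

In this sketch stop() clears the flag before counting down, so a thread released by stop() sees started == false and does not run the tail of start(). Whether the same ordering is safe on both sides of a real duplex connection is exactly the open question in the thread.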

Eric-AWL


Eric-AWL wrote:
> 
> We switched from a duplex configuration to a non-duplex one and it seemed
> to work better. But today, during a network problem (the link going
> alternately up and down), our process did not survive.
> 
> We have:
> - a thread dump showing 85 StartLocalBridge threads waiting on the same
> latch in the DemandForwardingBridgeSupport.startLocalBridge() method:
> 
>  protected void startLocalBridge() throws Exception {
>         if (localBridgeStarted.compareAndSet(false, true)) {
>             synchronized (this) {
>                 if (LOG.isTraceEnabled()) {
>                     LOG.trace(configuration.getBrokerName() + " starting local Bridge, localBroker=" + localBroker);
>                 }
>                 remoteBrokerNameKnownLatch.await();
>                 ...
> }
> 
> - 960 connections in CLOSE_WAIT
> - a file descriptor limit being hit
> 
> Would the transport.closeAsync=false flag help here?
> 
> Eric-AWL
> 
> 
> 
> Gary Tully wrote:
>> 
>> Hi, as you can see, this is a complicated area of the code. The best
>> approach is to try and produce a test case for your scenario. Take a
>> look at the test: BrokerQueueNetworkWithDisconnectTest in
>> activemq-core. This can simulate network failures and can use
>> multicast (bridgeAllBrokers). Getting a reproducible test case is the
>> best way to validate your changes and protect them into the future.
>> 
>> The only other alternative is to keep adding your suggestions to the
>> jira issue (https://issues.apache.org/activemq/browse/AMQ-2774) and
>> with a bit of luck I (or someone else) will have a chance to look at
>> it before 5.4.
>> 
>> 
>> On 6 July 2010 12:37, Eric-AWL <eric.vinc...@atosorigin.com> wrote:
>>>
>>> I wonder whether it could have undesirable effects on both sides of the
>>> duplex connection.
>>>
>>> Perhaps we should test the started AtomicBoolean in the start() method,
>>> after the corresponding await, and skip the rest of the start() method
>>> if it is false?
>>>
>>>            if (configuration.isDuplex() && duplexInitiatingConnection == null) {
>>>                // initiator side of duplex network
>>>                remoteBrokerNameKnownLatch.await();
>>>            }
>>>
>>> HERE ??? (if started.get()) { ???
>>>
>>>            try {
>>>                triggerRemoteStartBridge();
>>>            } catch (IOException e) {
>>>                LOG.warn("Caught exception from remote start", e);
>>>            }
>>>            NetworkBridgeListener l = this.networkBridgeListener;
>>>            if (l != null) {
>>>                l.onStart(this);
>>>            }
>>>
>>> This is the first big problem I have had with a complex ActiveMQ
>>> configuration. It only happens when the network is faulty (which is not
>>> very often), and I don't know the ActiveMQ source code very well.
>>>
>>> Who could help me identify the potential effects of this change before I
>>> try to make it? (I can't do that on my production system without some
>>> tests and expert validation.)
>>>
>>> Eric-AWL
>>>
>>>
>>> Gary Tully wrote:
>>>>
>>>> That seems reasonable. Want to submit a patch against trunk?
>>>>
>>>> On 6 July 2010 12:10, Eric-AWL <eric.vinc...@atosorigin.com> wrote:
>>>>>
>>>>> What would happen if we added
>>>>>
>>>>>            if (configuration.isDuplex() && duplexInitiatingConnection == null) {
>>>>>                // initiator side of duplex network
>>>>>                remoteBrokerNameKnownLatch.countDown();
>>>>>            }
>>>>>
>>>>> to the stop() method of the DemandForwardingBridgeSupport class?
>>>>>
>>>>> Eric-AWL
>>>>>
>>>>>
>>>>> Eric-AWL wrote:
>>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I'm sure I have identified a latch problem in the multicast network
>>>>>> discovery mechanism on a duplex connection.
>>>>>>
>>>>>> The multicast notifier thread is blocked. Here is the trace:
>>>>>>
>>>>>> "Notifier-MulticastDiscoveryAgent-listener:DiscoveryNetworkConnector:NOCSupervisorP5-ADMIN-OUT-IN:BrokerService[SIBBusModule-NOCP5-tpnocp08s-bus]" daemon prio=10 tid=0x0000000044ff2400 nid=0x1389 waiting on condition [0x0000000044c26000..0x0000000044c26b90]
>>>>>>    java.lang.Thread.State: WAITING (parking)
>>>>>>       at sun.misc.Unsafe.park(Native Method)
>>>>>>       - parking to wait for  <0x00002aaab3dd66f0> (a java.util.concurrent.CountDownLatch$Sync)
>>>>>>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>>>>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>>>>>>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
>>>>>>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
>>>>>>       at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
>>>>>>       at org.apache.activemq.network.DemandForwardingBridgeSupport.start(DemandForwardingBridgeSupport.java:231)
>>>>>>       at org.apache.activemq.network.DiscoveryNetworkConnector.onServiceAdd(DiscoveryNetworkConnector.java:114)
>>>>>>       at org.apache.activemq.transport.discovery.multicast.MulticastDiscoveryAgent$2.run(MulticastDiscoveryAgent.java:484)
>>>>>>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>>       at java.lang.Thread.run(Thread.java:619)
>>>>>>
>>>>>> The problem appears when the network between the two components goes
>>>>>> rapidly and repeatedly up and down. The bridge is created in one
>>>>>> direction, but the answer cannot be received.
>>>>>>
>>>>>> The thread stays blocked on the CountDownLatch. Even when multicast
>>>>>> frames are received, the component cannot establish a new network
>>>>>> connection.
>>>>>>
>>>>>> Here is the corresponding ActiveMQ trace.
>>>>>>
>>>>>> When it is OK:
>>>>>> 2010-06-22 22:56:24,500 [-tpnocp08s-bus]] INFO  DiscoveryNetworkConnector - Establishing network connection from vm://SIBBusModule-NOCP5-tpnocp08s-bus to tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>>>>>> 2010-06-22 22:56:26,083 [nocp08s-bus#160] INFO  DemandForwardingBridge - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#160 and tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus) has been established.
>>>>>>
>>>>>> 2010-06-22 22:57:34,807 [-tpnocp08s-bus]] INFO  DemandForwardingBridge - SIBBusModule-NOCP5-tpnocp08s-bus bridge to SIBBusSupervisor-tpnocp11v-bus stopped
>>>>>>
>>>>>> 2010-06-22 22:57:34,811 [-tpnocp08s-bus]] INFO  DiscoveryNetworkConnector - Establishing network connection from vm://SIBBusModule-NOCP5-tpnocp08s-bus to tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>>>>>> 2010-06-22 22:57:39,064 [nocp08s-bus#162] INFO  DemandForwardingBridge - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#162 and tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus) has been established.
>>>>>>
>>>>>> 2010-06-22 22:58:42,578 [-tpnocp08s-bus]] INFO  DemandForwardingBridge - SIBBusModule-NOCP5-tpnocp08s-bus bridge to SIBBusSupervisor-tpnocp11v-bus stopped
>>>>>>
>>>>>> When it is KO : "Unknown"
>>>>>>
>>>>>> 2010-06-22 22:58:42,648 [-tpnocp08s-bus]] INFO  DiscoveryNetworkConnector - Establishing network connection from vm://SIBBusModule-NOCP5-tpnocp08s-bus to tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false
>>>>>> 2010-06-22 22:59:18,031 [18.126.30:14101] WARN  DemandForwardingBridge - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#164 and tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101 shutdown due to a remote error: java.net.SocketException: Connection reset
>>>>>> 2010-06-22 22:59:18,033 [NetworkBridge  ] INFO  DemandForwardingBridge - SIBBusModule-NOCP5-tpnocp08s-bus bridge to Unknown stopped
>>>>>>
>>>>>>
>>>>>> Here is the corresponding ActiveMQ trace from the other side:
>>>>>>
>>>>>> activemq-server.log:2010-06-22 22:55:44,295 [26.190.27:40517] INFO TransportConnection            - Created Duplex Bridge back to SIBBusModule-NOCP5-tpnocp08s-bus
>>>>>>
>>>>>> activemq-server.log:2010-06-22 22:56:24,438 [26.190.27:40517] INFO DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge to SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>>>>>
>>>>>> activemq-server.log:2010-06-22 22:56:26,135 [26.190.27:40518] INFO TransportConnection            - Created Duplex Bridge back to SIBBusModule-NOCP5-tpnocp08s-bus
>>>>>> activemq-server.log:2010-06-22 22:56:26,135 [ocp11v-bus#1770] INFO DemandForwardingBridge         - Network connection between vm://SIBBusSupervisor-tpnocp11v-bus#1770 and tcp:///10.26.190.27:40518(SIBBusModule-NOCP5-tpnocp08s-bus) has been established.
>>>>>>
>>>>>> activemq-server.log:2010-06-22 22:57:34,818 [26.190.27:40518] INFO DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge to SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>>>>>
>>>>>> activemq-server.log:2010-06-22 22:57:39,153 [26.190.27:40519] INFO TransportConnection            - Created Duplex Bridge back to SIBBusModule-NOCP5-tpnocp08s-bus
>>>>>> activemq-server.log:2010-06-22 22:57:39,153 [ocp11v-bus#1806] INFO DemandForwardingBridge         - Network connection between vm://SIBBusSupervisor-tpnocp11v-bus#1806 and tcp:///10.26.190.27:40519(SIBBusModule-NOCP5-tpnocp08s-bus) has been established.
>>>>>>
>>>>>> activemq-server.log:2010-06-22 22:58:44,328 [26.190.27:40519] INFO DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge to SIBBusModule-NOCP5-tpnocp08s-bus stopped
>>>>>>
>>>>>>
>>>>>> Eric-AWL
>>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://old.nabble.com/MultiCast-Discovery-and-refusal-of-connection-tp28827529p29084235.html
>>>>> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> http://blog.garytully.com
>>>>
>>>> Open Source Integration
>>>> http://fusesource.com
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/MultiCast-Discovery-and-refusal-of-connection-tp28827529p29084410.html
>>> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>>>
>>>
>> 
>> 
>> 
>> -- 
>> http://blog.garytully.com
>> 
>> Open Source Integration
>> http://fusesource.com
>> 
>> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/MultiCast-Discovery-and-refusal-of-connection-tp28827529p29236977.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.
