We put a no-Duplex Configuration instead of a Duplex Configuration and it seemed to work better.... But today during a network problem (alternatively on/off) our process doesn't resist ....
We have - a thread dump which shows 85 StartLocalBridge Threads waiting for the same latch into the DemandForwardingBridgeSupport.StartLocalBridge method : protected void startLocalBridge() throws Exception { if (localBridgeStarted.compareAndSet(false, true)) { synchronized (this) { if (LOG.isTraceEnabled()) { LOG.trace(configuration.getBrokerName() + " starting local Bridge, localBroker=" + localBroker); } remoteBrokerNameKnownLatch.await(); ... } - 960 CLOSE_WAIT - a file descriptor limit Will the transport.closeAsync=false flag be helpful here ? Eric-AWL Gary Tully wrote: > > Hi, as you can see, this is a complicated area of the code. The best > approach is to try and produce a test case for your scenario. Take a > look at the test: BrokerQueueNetworkWithDisconnectTest in > activemq-core. This can simulate network failures and can use > multicast (bridgeAllBrokers). Getting a reproducible test case is the > best way to validate your changes and protect them into the future. > > The only other alternative is to keep adding your suggestions to the > jira issue (https://issues.apache.org/activemq/browse/AMQ-2774) and > with a bit of luck I (or some one else) will have a change to look at > it before 5.4 . > > > On 6 July 2010 12:37, Eric-AWL <eric.vinc...@atosorigin.com> wrote: >> >> I wonder if it could not have some undesirable effects on both side of >> the >> duplex connection .... >> >> perhaps we should test the started AtomicBoolean, in the start() method >> after the corresponding "await" and shouldn't execute the end of the >> start >> method ? >> >> if (configuration.isDuplex() && duplexInitiatingConnection == >> null) { >> // initiator side of duplex network >> remoteBrokerNameKnownLatch.await(); >> } >> >> HERE ??? (if started.get()) { ??? >> >> try { >> triggerRemoteStartBridge(); >> } catch (IOException e) { >> LOG.warn("Caught exception from remote start", e); >> } >> NetworkBridgeListener l = this.networkBridgeListener; >> if (l != null) { >> l.onStart(this); >> } >> >> It's the first big problem I have with ActiveMQ complex configuration, it >> happens when network is faulty (that happens not very often), and I don't >> know ActiveMQ source code very well .... >> >> Who could help me to identify potential effects of this change, before I >> try >> to modify it ? (I can't do that on my production system without some >> tests >> and expert validation) >> >> Eric-AWL >> >> >> Gary Tully wrote: >>> >>> that seems reasonable. want to submit a patch against trunk? >>> >>> On 6 July 2010 12:10, Eric-AWL <eric.vinc...@atosorigin.com> wrote: >>>> >>>> What could happen if we add >>>> >>>> if (configuration.isDuplex() && duplexInitiatingConnection == >>>> null) >>>> { >>>> // initiator side of duplex network >>>> remoteBrokerNameKnownLatch.countDown(); >>>> } >>>> >>>> into the stop() method of DemandForwardingBridgeSupport class ? >>>> >>>> Eric-AWL >>>> >>>> >>>> Eric-AWL wrote: >>>>> >>>>> Hi >>>>> >>>>> I'm sure that I identified a Latch problem in Multicast Network >>>>> Discovery >>>>> mechanism on Duplex connection >>>>> >>>>> The multicast notifier thread is blocked. here the trace >>>>> >>>>> "Notifier-MulticastDiscoveryAgent-listener:DiscoveryNetworkConnector:NOCSupervisorP5-ADMIN-OUT-IN:BrokerService[SIBBusModule-NOCP5-tpnocp08s-bus]" >>>>> daemon prio=10 tid=0x0000000044ff2400 nid=0x1389 waiting on condition >>>>> [0x0000000044c26000..0x0000000044c26b90] >>>>> java.lang.Thread.State: WAITING (parking) >>>>> at sun.misc.Unsafe.park(Native Method) >>>>> - parking to wait for <0x00002aaab3dd66f0> (a >>>>> java.util.concurrent.CountDownLatch$Sync) >>>>> at >>>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) >>>>> at >>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747) >>>>> at >>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905) >>>>> at >>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217) >>>>> at >>>>> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207) >>>>> at >>>>> org.apache.activemq.network.DemandForwardingBridgeSupport.start(DemandForwardingBridgeSupport.java:231) >>>>> at >>>>> org.apache.activemq.network.DiscoveryNetworkConnector.onServiceAdd(DiscoveryNetworkConnector.java:114) >>>>> at >>>>> org.apache.activemq.transport.discovery.multicast.MulticastDiscoveryAgent$2.run(MulticastDiscoveryAgent.java:484) >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >>>>> at java.lang.Thread.run(Thread.java:619) >>>>> >>>>> The problem appears when the network is quickly and alternatively >>>>> on/off >>>>> between the two components. >>>>> The bridge is created in one direction, but the answer can not be >>>>> received. >>>>> >>>>> The thread is blocked on the CountDownLatch. Even if multicast frames >>>>> are >>>>> received, the component can not establish a new network connection. >>>>> >>>>> Here are an corresponding activemq trace >>>>> >>>>> When it is OK : >>>>> 2010-06-22 22:56:24,500 [-tpnocp08s-bus]] INFO >>>>> DiscoveryNetworkConnector >>>>> - Establishing network connection from >>>>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to >>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false >>>>> 2010-06-22 22:56:26,083 [nocp08s-bus#160] INFO DemandForwardingBridge >>>>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#160 >>>>> and >>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus) >>>>> has been established. >>>>> >>>>> 2010-06-22 22:57:34,807 [-tpnocp08s-bus]] INFO DemandForwardingBridge >>>>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to >>>>> SIBBusSupervisor-tpnocp11v-bus stopped >>>>> >>>>> 2010-06-22 22:57:34,811 [-tpnocp08s-bus]] INFO >>>>> DiscoveryNetworkConnector >>>>> - Establishing network connection from >>>>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to >>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false >>>>> 2010-06-22 22:57:39,064 [nocp08s-bus#162] INFO DemandForwardingBridge >>>>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#162 >>>>> and >>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus) >>>>> has been established. >>>>> >>>>> 2010-06-22 22:58:42,578 [-tpnocp08s-bus]] INFO DemandForwardingBridge >>>>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to >>>>> SIBBusSupervisor-tpnocp11v-bus stopped >>>>> >>>>> When it is KO : "Unknown" >>>>> >>>>> 2010-06-22 22:58:42,648 [-tpnocp08s-bus]] INFO >>>>> DiscoveryNetworkConnector >>>>> - Establishing network connection from >>>>> vm://SIBBusModule-NOCP5-tpnocp08s-bus to >>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false >>>>> 2010-06-22 22:59:18,031 [18.126.30:14101] WARN DemandForwardingBridge >>>>> - Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#164 >>>>> and >>>>> tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101 shutdown due >>>>> to >>>>> a >>>>> remote error: java.net.SocketException: Connection reset >>>>> 2010-06-22 22:59:18,033 [NetworkBridge ] INFO DemandForwardingBridge >>>>> - SIBBusModule-NOCP5-tpnocp08s-bus bridge to Unknown stopped >>>>> >>>>> >>>>> Here is the other side corresponding activemq trace >>>>> >>>>> activemq-server.log:2010-06-22 22:55:44,295 [26.190.27:40517] INFO >>>>> TransportConnection - Created Duplex Bridge back to >>>>> SIBBusModule-NOCP5-tpnocp08s-bus >>>>> >>>>> activemq-server.log:2010-06-22 22:56:24,438 [26.190.27:40517] INFO >>>>> DemandForwardingBridge - SIBBusSupervisor-tpnocp11v-bus bridge >>>>> to >>>>> SIBBusModule-NOCP5-tpnocp08s-bus stopped >>>>> >>>>> activemq-server.log:2010-06-22 22:56:26,135 [26.190.27:40518] INFO >>>>> TransportConnection - Created Duplex Bridge back to >>>>> SIBBusModule-NOCP5-tpnocp08s-bus >>>>> activemq-server.log:2010-06-22 22:56:26,135 [ocp11v-bus#1770] INFO >>>>> DemandForwardingBridge - Network connection between >>>>> vm://SIBBusSupervisor-tpnocp11v-bus#1770 and >>>>> tcp:///10.26.190.27:40518(SIBBusModule-NOCP5-tpnocp08s-bus) has been >>>>> established. >>>>> >>>>> activemq-server.log:2010-06-22 22:57:34,818 [26.190.27:40518] INFO >>>>> DemandForwardingBridge - SIBBusSupervisor-tpnocp11v-bus bridge >>>>> to >>>>> SIBBusModule-NOCP5-tpnocp08s-bus stopped >>>>> >>>>> activemq-server.log:2010-06-22 22:57:39,153 [26.190.27:40519] INFO >>>>> TransportConnection - Created Duplex Bridge back to >>>>> SIBBusModule-NOCP5-tpnocp08s-bus >>>>> activemq-server.log:2010-06-22 22:57:39,153 [ocp11v-bus#1806] INFO >>>>> DemandForwardingBridge - Network connection between >>>>> vm://SIBBusSupervisor-tpnocp11v-bus#1806 and >>>>> tcp:///10.26.190.27:40519(SIBBusModule-NOCP5-tpnocp08s-bus) has been >>>>> established. >>>>> >>>>> activemq-server.log:2010-06-22 22:58:44,328 [26.190.27:40519] INFO >>>>> DemandForwardingBridge - SIBBusSupervisor-tpnocp11v-bus bridge >>>>> to >>>>> SIBBusModule-NOCP5-tpnocp08s-bus stopped >>>>> >>>>> >>>>> Eric-AWL >>>>> >>>> >>>> -- >>>> View this message in context: >>>> http://old.nabble.com/MultiCast-Discovery-and-refusal-of-connection-tp28827529p29084235.html >>>> Sent from the ActiveMQ - User mailing list archive at Nabble.com. >>>> >>>> >>> >>> >>> >>> -- >>> http://blog.garytully.com >>> >>> Open Source Integration >>> http://fusesource.com >>> >>> >> >> -- >> View this message in context: >> http://old.nabble.com/MultiCast-Discovery-and-refusal-of-connection-tp28827529p29084410.html >> Sent from the ActiveMQ - User mailing list archive at Nabble.com. >> >> > > > > -- > http://blog.garytully.com > > Open Source Integration > http://fusesource.com > > -- View this message in context: http://old.nabble.com/MultiCast-Discovery-and-refusal-of-connection-tp28827529p29098137.html Sent from the ActiveMQ - User mailing list archive at Nabble.com.