Stephen submitted https://issues.apache.org/jira/browse/AMQ-6095 to capture
this bug.

On Fri, Dec 18, 2015 at 8:13 AM, glstephen <glstep...@gmail.com> wrote:

> I have encountered an issue with ActiveMQ where the entire cluster will
> fail when the master ZooKeeper node goes offline.
>
> We have a 3-node ActiveMQ cluster setup in our development environment.
> Each
> node has ActiveMQ 5.12.0 and ZooKeeper 3.4.6 (note: we have done some
> testing with ZooKeeper 3.4.7, but this failed to resolve the issue; time
> constraints have so far prevented us from testing ActiveMQ 5.13).
>
> What we have found is that when we stop the master ZooKeeper process (via
> the "end process tree" command in Task Manager), the remaining two
> ZooKeeper nodes continue to function normally. Sometimes the ActiveMQ
> cluster is able to handle this, but sometimes it is not.
>
> When the cluster fails, we typically see this in the ActiveMQ log:
>
> 2015-12-18 09:08:45,157 | WARN  | Too many cluster members are connected.
> Expected at most 3 members but there are 4 connected. |
> org.apache.activemq.leveldb.replicated.MasterElector | WrapperSimpleAppMain-EventThread
> ...
> ...
> 2015-12-18 09:27:09,722 | WARN  | Session 0x351b43b4a560016 for server null,
> unexpected error, closing socket connection and attempting reconnect |
> org.apache.zookeeper.ClientCnxn | WrapperSimpleAppMain-SendThread(192.168.0.10:2181)
> java.net.ConnectException: Connection refused: no further information
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)[:1.7.0_79]
>         at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)[:1.7.0_79]
>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)[zookeeper-3.4.6.jar:3.4.6-1569965]
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)[zookeeper-3.4.6.jar:3.4.6-1569965]
>
> We were immediately concerned by the fact that (A) ActiveMQ seems to think
> there are four members in the cluster when it is only configured with three,
> and (B) when the exception is raised, the server appears to be null. We then
> increased ActiveMQ's logging level to DEBUG in order to display the list of
> members:
>
> 2015-12-18 09:33:04,236 | DEBUG | ZooKeeper group changed: Map(localhost ->
> ListBuffer((0000000156,{"id":"localhost","container":null,"address":null,"position":-1,"weight":5,"elected":null}),
> (0000000157,{"id":"localhost","container":null,"address":null,"position":-1,"weight":1,"elected":null}),
> (0000000158,{"id":"localhost","container":null,"address":"tcp://192.168.0.11:61619","position":-1,"weight":10,"elected":null}),
> (0000000159,{"id":"localhost","container":null,"address":null,"position":-1,"weight":10,"elected":null})))
> | org.apache.activemq.leveldb.replicated.MasterElector | ActiveMQ BrokerService[localhost] Task-14
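
[Editor's note: the four entries in that log line can be decoded mechanically. The sketch below parses the JSON blobs copied verbatim from the DEBUG output; the reading that a null `address` marks a non-master member, and that the duplicated weight-10 entries are one broker re-registering before its stale ephemeral znode expired, are assumptions, not confirmed ActiveMQ semantics.]

```python
import json

# Member entries copied from the DEBUG log above (ZooKeeper sequence id -> JSON blob).
raw_members = {
    "0000000156": '{"id":"localhost","container":null,"address":null,"position":-1,"weight":5,"elected":null}',
    "0000000157": '{"id":"localhost","container":null,"address":null,"position":-1,"weight":1,"elected":null}',
    "0000000158": '{"id":"localhost","container":null,"address":"tcp://192.168.0.11:61619","position":-1,"weight":10,"elected":null}',
    "0000000159": '{"id":"localhost","container":null,"address":null,"position":-1,"weight":10,"elected":null}',
}

members = {seq: json.loads(blob) for seq, blob in raw_members.items()}

# ActiveMQ appears to count every znode under zkPath as a member, so a
# stale entry left behind by a dead session would inflate the count.
print(len(members))  # 4 entries, even though replicas="3"

# Weight 10 appears twice (0000000158 and 0000000159), consistent with
# one broker having registered a second znode while its old one lingered.
print(sorted(m["weight"] for m in members.values()))  # [1, 5, 10, 10]
```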
>
> Can anyone suggest why this may be happening and/or suggest a way to
> resolve
> this? Our configurations are shown below:
>
> *ZooKeeper:*
> tickTime=2000
> dataDir=C:\\zookeeper-3.4.7\\data
> clientPort=2181
> initLimit=5
> syncLimit=2
> server.1=192.168.0.10:2888:3888
> server.2=192.168.0.11:2888:3888
> server.3=192.168.0.12:2888:3888
>
> *ActiveMQ (server.1):*
> <persistenceAdapter>
>     <replicatedLevelDB
>         directory="activemq-data"
>         replicas="3"
>         bind="tcp://0.0.0.0:61619"
>         zkAddress="192.168.0.11:2181,192.168.0.10:2181,192.168.0.12:2181"
>         zkPath="/activemq/leveldb-stores"
>         hostname="192.168.0.10"
>         weight="5"/>
>     <!-- server.2 has a weight of 10, server.3 has a weight of 1 -->
> </persistenceAdapter>
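
[Editor's note: with those weights, a healthy election should favour server.2. The toy sketch below illustrates the idea of weight-based master selection only; it is not the actual MasterElector logic, and the member list and hostnames are taken from the configs above.]

```python
# One entry per broker, with the weight each node configures.
members = [
    {"hostname": "192.168.0.10", "weight": 5},   # server.1
    {"hostname": "192.168.0.11", "weight": 10},  # server.2
    {"hostname": "192.168.0.12", "weight": 1},   # server.3
]

def pick_master(candidates):
    # Highest weight wins; ties would need a secondary criterion
    # (the real elector uses ZooKeeper state, which this sketch skips).
    return max(candidates, key=lambda m: m["weight"])

print(pick_master(members)["hostname"])  # 192.168.0.11 (server.2, weight 10)
```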
>
>
>
> --
> View this message in context:
> http://activemq.2283324.n4.nabble.com/ActiveMQ-cluster-fails-with-server-null-when-the-Zookeeper-master-node-goes-offline-tp4705165.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>
