Re: ActiveMQ cluster fails with "server null" when the Zookeeper master node goes offline

Tim Bain Fri, 18 Dec 2015 14:21:12 -0800

Scratch that, I posted this to the wrong LevelDB-related thread authored by
someone named Stephen today.


But Stephen, I'd suggest you submit a JIRA bug if you don't get any
response here on the mailing list; unfortunately, few if any LevelDB experts
monitor the list, so help with LevelDB is hard to come by.

Tim

On Fri, Dec 18, 2015 at 11:19 AM, Tim Bain <tb...@alumni.duke.edu> wrote:

> Stephen submitted https://issues.apache.org/jira/browse/AMQ-6095 to
> capture this bug.
>
> On Fri, Dec 18, 2015 at 8:13 AM, glstephen <glstep...@gmail.com> wrote:
>
>> I have encountered an issue with ActiveMQ where the entire cluster fails
>> when the master ZooKeeper node goes offline.
>>
>> We have a 3-node ActiveMQ cluster set up in our development environment.
>> Each node has ActiveMQ 5.12.0 and ZooKeeper 3.4.6. (Note: we have done some
>> testing with ZooKeeper 3.4.7, but this failed to resolve the issue. Time
>> constraints have so far prevented us from testing ActiveMQ 5.13.)
>>
>> What we have found is that when we stop the master ZooKeeper process (via
>> the "end process tree" command in Task Manager), the remaining two ZooKeeper
>> nodes continue to function as normal. Sometimes the ActiveMQ cluster is able
>> to handle this, but sometimes it is not.
>>
>> When the cluster fails, we typically see this in the ActiveMQ log:
>>
>> 2015-12-18 09:08:45,157 | WARN  | Too many cluster members are connected. Expected at most 3 members but there are 4 connected. | org.apache.activemq.leveldb.replicated.MasterElector | WrapperSimpleAppMain-EventThread
>> ...
>> ...
>> 2015-12-18 09:27:09,722 | WARN  | Session 0x351b43b4a560016 for server null, unexpected error, closing socket connection and attempting reconnect | org.apache.zookeeper.ClientCnxn | WrapperSimpleAppMain-SendThread(192.168.0.10:2181)
>> java.net.ConnectException: Connection refused: no further information
>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)[:1.7.0_79]
>>         at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)[:1.7.0_79]
>>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)[zookeeper-3.4.6.jar:3.4.6-1569965]
>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)[zookeeper-3.4.6.jar:3.4.6-1569965]
>>
>> We were immediately concerned by the fact that (A) ActiveMQ seems to think
>> there are four members in the cluster when it is only configured with three,
>> and (B) when the exception is raised, the server appears to be null. We then
>> increased ActiveMQ's logging level to DEBUG in order to display the list of
>> members:
>>
>> 2015-12-18 09:33:04,236 | DEBUG | ZooKeeper group changed: Map(localhost -> ListBuffer((0000000156,{"id":"localhost","container":null,"address":null,"position":-1,"weight":5,"elected":null}), (0000000157,{"id":"localhost","container":null,"address":null,"position":-1,"weight":1,"elected":null}), (0000000158,{"id":"localhost","container":null,"address":"tcp://192.168.0.11:61619","position":-1,"weight":10,"elected":null}), (0000000159,{"id":"localhost","container":null,"address":null,"position":-1,"weight":10,"elected":null}))) | org.apache.activemq.leveldb.replicated.MasterElector | ActiveMQ BrokerService[localhost] Task-14
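
As a rough way to check the member list in a DEBUG line like the one above against the configured replicas value, the entries can be parsed and counted. This is only a minimal sketch and not part of ActiveMQ: the names parse_members and summarize are made up for illustration, and the regular expression assumes the "(sequence,{json})" layout seen in this one log line, with the mail-wrapped line rejoined.

```python
import json
import re

# Matches one "(sequence,{json})" entry. Assumes the JSON bodies contain no
# nested braces, which holds for the entries quoted in this thread.
MEMBER_RE = re.compile(r'\((\d+),(\{[^{}]*\})\)')

# The member list from the DEBUG line quoted above, rejoined into one string.
SAMPLE_LINE = (
    'ZooKeeper group changed: Map(localhost -> ListBuffer('
    '(0000000156,{"id":"localhost","container":null,"address":null,'
    '"position":-1,"weight":5,"elected":null}), '
    '(0000000157,{"id":"localhost","container":null,"address":null,'
    '"position":-1,"weight":1,"elected":null}), '
    '(0000000158,{"id":"localhost","container":null,'
    '"address":"tcp://192.168.0.11:61619",'
    '"position":-1,"weight":10,"elected":null}), '
    '(0000000159,{"id":"localhost","container":null,"address":null,'
    '"position":-1,"weight":10,"elected":null})))'
)


def parse_members(log_line):
    """Return a list of (sequence, member_dict) tuples found in the log line."""
    return [(seq, json.loads(body)) for seq, body in MEMBER_RE.findall(log_line)]


def summarize(log_line, replicas):
    """Count the members and list the sequence ids that have no address."""
    members = parse_members(log_line)
    return {
        "count": len(members),
        "over_limit": len(members) > replicas,
        "without_address": [seq for seq, m in members if m.get("address") is None],
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_LINE, replicas=3))
```

For the line quoted above, this reports four entries against replicas="3", three of which (0000000156, 0000000157 and 0000000159) carry a null address.
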
>>
>> Can anyone suggest why this may be happening, or a way to resolve it?
>> Our configurations are shown below:
>>
>> *ZooKeeper:*
>> tickTime=2000
>> dataDir=C:\\zookeeper-3.4.7\\data
>> clientPort=2181
>> initLimit=5
>> syncLimit=2
>> server.1=192.168.0.10:2888:3888
>> server.2=192.168.0.11:2888:3888
>> server.3=192.168.0.12:2888:3888
>>
>> *ActiveMQ (server.1):*
>> <persistenceAdapter>
>>     <replicatedLevelDB
>>         directory="activemq-data"
>>         replicas="3"
>>         bind="tcp://0.0.0.0:61619"
>>         zkAddress="192.168.0.11:2181,192.168.0.10:2181,192.168.0.12:2181"
>>         zkPath="/activemq/leveldb-stores"
>>         hostname="192.168.0.10"
>>         weight="5"/>
>>     <!-- server.2 has a weight of 10, server.3 has a weight of 1 -->
>> </persistenceAdapter>
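
To confirm which ZooKeeper servers are still answering after the master is stopped, each address in zkAddress can be probed with ZooKeeper's four-letter "srvr" command, whose reply includes a "Mode:" line (leader, follower or standalone). The following is a minimal sketch, assuming the servers accept four-letter commands on the client port; the helper names parse_mode, query_mode and ensemble_modes are hypothetical, and this is not something the original poster reported running.

```python
import socket


def parse_mode(srvr_output):
    """Extract the value of the 'Mode:' line from a srvr reply, or None."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def query_mode(host, port=2181, timeout=3.0):
    """Send 'srvr' to one ZooKeeper server; return its mode, or None if unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"srvr")
            chunks = []
            # The server closes the connection after replying, so read until EOF.
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
    except OSError:
        # Covers refused connections and timeouts.
        return None
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))


def ensemble_modes(zk_address):
    """Map each 'host:port' entry of a zkAddress-style string to its mode."""
    modes = {}
    for entry in zk_address.split(","):
        entry = entry.strip()
        host, _, port = entry.partition(":")
        modes[entry] = query_mode(host, int(port or 2181))
    return modes
```

Calling ensemble_modes("192.168.0.11:2181,192.168.0.10:2181,192.168.0.12:2181") returns one entry per address, with None for any server that refuses the connection or times out.
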
>>
>>
>>
>> --
>> View this message in context:
>> http://activemq.2283324.n4.nabble.com/ActiveMQ-cluster-fails-with-server-null-when-the-Zookeeper-master-node-goes-offline-tp4705165.html
>> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
>>
>
>
