Scratch that; I posted this to the wrong LevelDB-related thread authored by someone named Stephen today.
But Stephen, I'd suggest you submit a JIRA bug if you don't get any response here on the mailing list. Unfortunately, few if any LevelDB experts monitor the list, so help with LevelDB is hard to come by.

Tim

On Fri, Dec 18, 2015 at 11:19 AM, Tim Bain <tb...@alumni.duke.edu> wrote:
> Stephen submitted https://issues.apache.org/jira/browse/AMQ-6095 to
> capture this bug.
>
> On Fri, Dec 18, 2015 at 8:13 AM, glstephen <glstep...@gmail.com> wrote:
>
>> I have encountered an issue with ActiveMQ where the entire cluster fails
>> when the master ZooKeeper node goes offline.
>>
>> We have a 3-node ActiveMQ cluster set up in our development environment.
>> Each node has ActiveMQ 5.12.0 and ZooKeeper 3.4.6. (Note: we have done some
>> testing with ZooKeeper 3.4.7, but this did not resolve the issue. Time
>> constraints have so far prevented us from testing ActiveMQ 5.13.)
>>
>> What we have found is that when we stop the master ZooKeeper process (via
>> the "end process tree" command in Task Manager), the remaining two
>> ZooKeeper nodes continue to function as normal. Sometimes the ActiveMQ
>> cluster is able to handle this, but sometimes it is not.
>>
>> When the cluster fails, we typically see this in the ActiveMQ log:
>>
>> 2015-12-18 09:08:45,157 | WARN | Too many cluster members are connected. Expected at most 3 members but there are 4 connected. | org.apache.activemq.leveldb.replicated.MasterElector | WrapperSimpleAppMain-EventThread
>> ...
>> ...
>> 2015-12-18 09:27:09,722 | WARN | Session 0x351b43b4a560016 for server null, unexpected error, closing socket connection and attempting reconnect | org.apache.zookeeper.ClientCnxn | WrapperSimpleAppMain-SendThread(192.168.0.10:2181)
>> java.net.ConnectException: Connection refused: no further information
>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)[:1.7.0_79]
>>     at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)[:1.7.0_79]
>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)[zookeeper-3.4.6.jar:3.4.6-1569965]
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)[zookeeper-3.4.6.jar:3.4.6-1569965]
>>
>> We were immediately concerned by the fact that (A) ActiveMQ seems to think
>> there are four members in the cluster when it is only configured with three,
>> and (B) when the exception is raised, the server appears to be null. We then
>> increased ActiveMQ's logging level to DEBUG in order to display the list of
>> members:
>>
>> 2015-12-18 09:33:04,236 | DEBUG | ZooKeeper group changed: Map(localhost -> ListBuffer(
>>   (0000000156,{"id":"localhost","container":null,"address":null,"position":-1,"weight":5,"elected":null}),
>>   (0000000157,{"id":"localhost","container":null,"address":null,"position":-1,"weight":1,"elected":null}),
>>   (0000000158,{"id":"localhost","container":null,"address":"tcp://192.168.0.11:61619","position":-1,"weight":10,"elected":null}),
>>   (0000000159,{"id":"localhost","container":null,"address":null,"position":-1,"weight":10,"elected":null})))
>>   | org.apache.activemq.leveldb.replicated.MasterElector | ActiveMQ BrokerService[localhost] Task-14
>>
>> Can anyone suggest why this may be happening and/or suggest a way to
>> resolve this?
>> Our configurations are shown below:
>>
>> *ZooKeeper:*
>> tickTime=2000
>> dataDir=C:\\zookeeper-3.4.7\\data
>> clientPort=2181
>> initLimit=5
>> syncLimit=2
>> server.1=192.168.0.10:2888:3888
>> server.2=192.168.0.11:2888:3888
>> server.3=192.168.0.12:2888:3888
>>
>> *ActiveMQ (server.1):*
>> <persistenceAdapter>
>>     <replicatedLevelDB
>>         directory="activemq-data"
>>         replicas="3"
>>         bind="tcp://0.0.0.0:61619"
>>         zkAddress="192.168.0.11:2181,192.168.0.10:2181,192.168.0.12:2181"
>>         zkPath="/activemq/leveldb-stores"
>>         hostname="192.168.0.10"
>>         weight="5"/>
>>     <!-- server.2 has a weight of 10, server.3 has a weight of 1 -->
>> </persistenceAdapter>
>>
>> --
>> View this message in context:
>> http://activemq.2283324.n4.nabble.com/ActiveMQ-cluster-fails-with-server-null-when-the-Zookeeper-master-node-goes-offline-tp4705165.html
>> Sent from the ActiveMQ - User mailing list archive at Nabble.com.
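A rough sanity check of the posted numbers, offered as a sketch rather than a diagnosis. Both the three-server ZooKeeper ensemble and a `replicatedLevelDB` group with `replicas="3"` need a majority (2 of 3) to keep working, so losing one ZooKeeper node should by itself be survivable. ZooKeeper's default session-timeout bounds are 2x and 20x `tickTime`, i.e. 4000 ms to 40000 ms with `tickTime=2000`. The four entries in the DEBUG line all report `"id":"localhost"`, so `weight` is the only field that tells them apart, and weight 10 appears twice even though only server.2 is configured with it. One possible reading, which is an assumption and not a confirmed cause, is that server.2 registered a second time while the znode from its previous ZooKeeper session had not yet expired. The class `GroupCheck` and its methods are hypothetical helpers that only do this arithmetic on values copied from the post; they are not part of ActiveMQ or ZooKeeper.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: quorum and duplicate-member arithmetic for the posted config. */
public class GroupCheck {

    /** Majority size for a group of n voters (ZooKeeper ensemble or replicas="n"). */
    static int majority(int n) {
        return n / 2 + 1;
    }

    /**
     * Counts group entries per weight. Hypothetical heuristic: it assumes each
     * broker uses a distinct weight, as in the posted config (5, 10, 1).
     */
    static Map<Integer, Integer> countByWeight(List<Integer> weights) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int w : weights) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int zkServers = 3;      // server.1 .. server.3 in zoo.cfg
        int replicas = 3;       // replicas="3" on <replicatedLevelDB>
        long tickTimeMs = 2000; // tickTime=2000 in zoo.cfg

        System.out.println("ZooKeeper quorum: " + majority(zkServers) + " of " + zkServers);
        System.out.println("LevelDB quorum:   " + majority(replicas) + " of " + replicas);
        // ZooKeeper's default minSessionTimeout and maxSessionTimeout are 2x and 20x tickTime.
        System.out.println("Session timeout bounds (ms): " + 2 * tickTimeMs + ".." + 20 * tickTimeMs);

        // Weights of entries 0000000156..0000000159 from the DEBUG "ZooKeeper group changed" line.
        List<Integer> observed = Arrays.asList(5, 1, 10, 10);
        System.out.println("Entries: " + observed.size() + " (expected at most " + replicas + ")");
        countByWeight(observed).forEach((weight, count) -> {
            if (count > 1) {
                System.out.println("weight " + weight + " registered " + count + " times");
            }
        });
    }
}
```

Run as-is, it reports a quorum of 2 for both groups and flags weight 10 as registered twice, which is the same "4 connected, at most 3 expected" mismatch the MasterElector warning describes.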