Hi, I've recently set up an ActiveMQ installation using ZooKeeper as the cluster manager.

Our setup is:

  12 contributors writing to two Virtual Topics
  3 queues (each with 12 consumers)
  Approx. 120 messages per second

The activemq.xml is fairly close to the default; here are a few changes:

  <broker xmlns="http://activemq.apache.org/schema/core" brokerName="localhost" dataDirectory="${activemq.data}" schedulePeriodForDestinationPurge="10000">

(to remove old queues)

  <policyEntry queue=">" gcInactiveDestinations="true" inactiveTimoutBeforeGC="30000"/>

  <replicatedLevelDB
    directory="/opt/activemq/activemq-data"
    replicas="3"
    bind="tcp://0.0.0.0:0"
    zkAddress="amqserver1:2181,amqserver2:2181,amqserver3:2181"
    zkPassword="password"
    zkPath="/activemq/leveldb-stores"
    hostname="amqserver1"
  />

zoo.cfg has:

  # tickTime was increased because we saw issues in testing at the default
  # value: network congestion could trigger a failover event.
  tickTime=8000
  initLimit=10
  syncLimit=5
  dataDir=/opt/zookeeper/data
  clientPort=2181
  server.1=amqserver1:2888:3888
  server.2=amqserver2:2888:3888
  server.3=amqserver3:2888:3888

The setup worked fine for about a week, but then the system started failing over from one cluster member to another, repeatedly, every few minutes. After we shut down some of the consumers (leaving just one Virtual Queue), the system regained stability. CPU load and I/O activity were not high before or during the failover events.

2015-12-12 01:33:25,784 | INFO | Attaching... Downloaded 3887.83/3887.85 kb and 3/4 files | org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | hawtdispatch-DEFAULT-1
2015-12-12 01:33:25,786 | INFO | Attaching... Downloaded 3887.85/3887.85 kb and 4/4 files | org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | hawtdispatch-DEFAULT-1
2015-12-12 01:33:25,788 | INFO | Attached | org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | hawtdispatch-DEFAULT-1
2015-12-12 02:07:32,737 | INFO | Not enough cluster members have reported their update positions yet. | org.apache.activemq.leveldb.replicated.MasterElector | main-EventThread
2015-12-12 02:07:32,823 | INFO | Slave stopped | org.apache.activemq.leveldb.replicated.MasterElector | ActiveMQ BrokerService[localhost] Task-4
2015-12-12 02:07:32,825 | INFO | Not enough cluster members have reported their update positions yet. | org.apache.activemq.leveldb.replicated.MasterElector | ActiveMQ BrokerService[localhost] Task-4
2015-12-12 02:07:32,832 | INFO | Not enough cluster members have reported their update positions yet. | org.apache.activemq.leveldb.replicated.MasterElector | main-EventThread
2015-12-12 02:07:32,871 | INFO | Promoted to master | org.apache.activemq.leveldb.replicated.MasterElector | main-EventThread
2015-12-12 02:07:32,909 | INFO | Using the pure java LevelDB implementation. | org.apache.activemq.leveldb.LevelDBClient | ActiveMQ BrokerService[localhost] Task-4
2015-12-12 02:07:36,060 | INFO | Master started: tcp://amqserver2:55655 | org.apache.activemq.leveldb.replicated.MasterElector | ActiveMQ BrokerService[localhost] Task-5
2015-12-12 02:07:36,423 | INFO | Slave has connected: 675aa794-3d5c-48f4-83ff-602777b8a53b | org.apache.activemq.leveldb.replicated.MasterLevelDBStore | hawtdispatch-DEFAULT-2
2015-12-12 02:07:36,486 | INFO | Slave has connected: 9bc02001-fc26-458a-8385-ac73f3ace8f0 | org.apache.activemq.leveldb.replicated.MasterLevelDBStore | hawtdispatch-DEFAULT-2
2015-12-12 02:07:36,932 | INFO | Slave has now caught up: 675aa794-3d5c-48f4-83ff-602777b8a53b | org.apache.activemq.leveldb.replicated.MasterLevelDBStore | hawtdispatch-DEFAULT-2
2015-12-12 02:07:37,022 | INFO | Slave has now caught up: 9bc02001-fc26-458a-8385-ac73f3ace8f0 | org.apache.activemq.leveldb.replicated.MasterLevelDBStore | hawtdispatch-DEFAULT-2
2015-12-12 02:07:37,096 | INFO | Installing Discarding Dead Letter Queue broker plugin[dropAll=true; dropTemporaryTopics=true; dropTemporaryQueues=true; dropOnly=null; reportInterval=1000] | org.apache.activemq.plugin.DiscardingDLQBrokerPlugin | main
2015-12-12 02:07:37,310 | INFO | Apache ActiveMQ 5.12.0 (localhost, ID:amqserver2.emea.kuoni.int-44654-1449886056958-0:1) is starting | org.apache.activemq.broker.BrokerService | main
2015-12-12 02:07:37,331 | INFO | Listening for connections at: tcp://amqserver2.emea.kuoni.int:61616?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2015-12-12 02:07:37,332 | INFO | Connector openwire started | org.apache.activemq.broker.TransportConnector | main
2015-12-12 02:07:37,335 | INFO | Listening for connections at: amqp://amqserver2.emea.kuoni.int:5672?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2015-12-12 02:07:37,336 | INFO | Connector amqp started | org.apache.activemq.broker.TransportConnector | main
2015-12-12 02:07:37,340 | INFO | Listening for connections at: stomp://amqserver2.emea.kuoni.int:61613?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2015-12-12 02:07:37,341 | INFO | Connector stomp started | org.apache.activemq.broker.TransportConnector | main
2015-12-12 02:07:37,344 | INFO | Listening for connections at: mqtt://amqserver2.emea.kuoni.int:1883?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.TransportServerThreadSupport | main
2015-12-12 02:07:37,345 | INFO | Connector mqtt started | org.apache.activemq.broker.TransportConnector | main
2015-12-12 02:07:37,432 | INFO | Listening for connections at ws://amqserver2.emea.kuoni.int:61614?maximumConnections=1000&wireFormat.maxFrameSize=104857600 | org.apache.activemq.transport.ws.WSTransportServer | main
2015-12-12 02:07:37,438 | INFO | Connector ws started | org.apache.activemq.broker.TransportConnector | main
2015-12-12 02:07:37,439 | INFO | Apache ActiveMQ 5.12.0 (localhost, ID:amqserver2.emea.kuoni.int-44654-1449886056958-0:1) started | org.apache.activemq.broker.BrokerService | main
2015-12-12 02:07:37,440 | INFO | For help or more information please see: http://activemq.apache.org | org.apache.activemq.broker.BrokerService | main
2015-12-12 02:07:37,774 | INFO | ActiveMQ WebConsole available at http://0.0.0.0:8161/ | org.apache.activemq.web.WebConsoleStarter | main
2015-12-12 02:07:37,775 | INFO | ActiveMQ Jolokia REST API available at http://0.0.0.0:8161/api/jolokia/ | org.apache.activemq.web.WebConsoleStarter | main
2015-12-12 02:07:37,814 | INFO | Initializing Spring FrameworkServlet 'dispatcher' | /admin | main
2015-12-12 02:07:38,054 | INFO | jolokia-agent: No access restrictor found at classpath:/jolokia-access.xml, access to all MBeans is allowed | /api | main
2015-12-12 02:08:20,184 | INFO | Stopping BrokerService[localhost] due to exception, java.io.IOException | org.apache.activemq.util.DefaultIOExceptionHandler | LevelDB IOException handler.
java.io.IOException
    at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:39)[activemq-client-5.12.0.jar:5.12.0]
    at org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:552)[activemq-leveldb-store-5.12.0.jar:5.12.0]
    at org.apache.activemq.leveldb.LevelDBClient.might_fail_using_index(LevelDBClient.scala:1044)[activemq-leveldb-store-5.12.0.jar:5.12.0]
    at org.apache.activemq.leveldb.LevelDBClient.store(LevelDBClient.scala:1390)[activemq-leveldb-store-5.12.0.jar:5.12.0]
    at org.apache.activemq.leveldb.DBManager$$anonfun$drainFlushes$1.apply$mcV$sp(DBManager.scala:627)[activemq-leveldb-store-5.12.0.jar:5.12.0]
    at org.fusesource.hawtdispatch.package$$anon$4.run(hawtdispatch.scala:330)[hawtdispatch-scala-2.11-1.21.jar:1.21]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_51]
    at java.lang.Thread.run(Thread.java:744)[:1.7.0_51]
2015-12-12 02:08:20,189 | INFO | Apache ActiveMQ 5.12.0 (localhost, ID:amqserver2.emea.kuoni.int-44654-1449886056958-0:1) is shutting down | org.apache.activemq.broker.BrokerService | IOExceptionHandler: stopping BrokerService[localhost]
2015-12-12 02:08:20,238 | WARN | Transport Connection to: tcp://10.241.163.73:52172 failed: java.io.IOException: Unexpected error occurred: org.apache.activemq.broker.BrokerStoppedException: Broker BrokerService[localhost] is being stopped | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///10.241.163.73:52172@61616

Does this look like a bug? Or a misconfiguration somewhere? Or are the servers under-resourced?

All three AMQ servers are specced the same: VMware, CentOS 6, 2 x vCPU, 8GB RAM.

Thanks,
Damian

--
View this message in context: http://activemq.2283324.n4.nabble.com/ActiveMQ-Zookeeper-cluster-ping-ponging-tp4704920.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.
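
As a side note on the zoo.cfg values quoted in this message, here is a rough sketch of the timeouts they imply. It assumes ZooKeeper's documented defaults for session bounds (minSessionTimeout = 2 x tickTime, maxSessionTimeout = 20 x tickTime, neither overridden in the zoo.cfg shown). The 2000 ms client request is also an assumption: it is meant to stand in for the replicatedLevelDB zkSessionTimeout, which is not set in the posted config. The function names are made up for illustration. This is arithmetic only and not a diagnosis of the failovers.

```python
# Sketch: timeouts implied by the posted zoo.cfg. Function names are
# hypothetical; the formulas follow ZooKeeper's documented defaults.

def zk_timings(tick_ms, init_limit, sync_limit):
    """Derive quorum and session timeout bounds (in ms) from zoo.cfg values."""
    return {
        # Time a follower is given to connect and sync to the leader at startup.
        "init_timeout_ms": tick_ms * init_limit,
        # How far a follower may lag behind the leader before it is dropped.
        "sync_timeout_ms": tick_ms * sync_limit,
        # Default session bounds when min/maxSessionTimeout are not set.
        "min_session_timeout_ms": 2 * tick_ms,
        "max_session_timeout_ms": 20 * tick_ms,
    }


def effective_session_timeout(requested_ms, timings):
    """The server clamps a client's requested session timeout to its bounds."""
    low = timings["min_session_timeout_ms"]
    high = timings["max_session_timeout_ms"]
    return max(low, min(requested_ms, high))


# Values taken from the zoo.cfg in the post.
TIMINGS = zk_timings(tick_ms=8000, init_limit=10, sync_limit=5)

# ASSUMPTION: the broker asks for a 2000 ms session (zkSessionTimeout is not
# set in the posted activemq.xml, so the real request may differ).
EFFECTIVE_MS = effective_session_timeout(2000, TIMINGS)

if __name__ == "__main__":
    for name, value in sorted(TIMINGS.items()):
        print(f"{name} = {value}")
    print(f"effective session timeout = {EFFECTIVE_MS}")
```

If those assumptions hold, tickTime=8000 raises the smallest session timeout the ensemble will grant to 16 seconds, so a broker that loses its ZooKeeper connection would not be declared gone any sooner than that.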