I've had a dev cluster running for a little while now, and twice I've seen interruptions where the cluster didn't recover and didn't elect a new master.
I had hoped AMQ-5082 fixed that issue, but it looks like there might be additional problems. How many of you are running replicated LevelDB, and what issues, if any, have you been seeing with failover or leader election? I've got a tcpdump running to capture the ZooKeeper traffic, and I'm hoping that will give me a clue as to what is going wrong. But if anybody else is seeing these issues, telling me what you see could also help me debug this...

Here's what I captured from ZooKeeper. I'll also note that even *deleting* the nodes didn't trigger any sort of activity from ActiveMQ, which indicates deeply flawed client-side logic... :-(

/activemq/amq-dev-1/000000000041 {"id":"amq-dev-1","container":null,"address":null,"position":-1,"weight":1,"elected":null}
/activemq/amq-dev-1/000000000043 {"id":"amq-dev-1","container":null,"address":null,"position":0,"weight":1,"elected":null}
/activemq/amq-dev-1/000000000044 {"id":"amq-dev-1","container":null,"address":null,"position":2936,"weight":1,"elected":null}
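For anyone comparing their own captures, here is a minimal Python sketch that parses the three znode entries pasted above and summarizes them. It assumes only that each entry is a znode path followed by a JSON payload with no spaces in it. The helpers `parse_capture` and `summarize` are hypothetical names for illustration, not part of ActiveMQ, and the script does not model ActiveMQ's actual election logic. It simply makes visible what the raw dump shows: all three entries carry the same `id` (`amq-dev-1`), their `position` values differ (-1, 0, 2936), and none of them has `elected` set.

```python
import json
from collections import Counter

# The znode entries from the capture, one "path payload" pair per line.
CAPTURE = """\
/activemq/amq-dev-1/000000000041 {"id":"amq-dev-1","container":null,"address":null,"position":-1,"weight":1,"elected":null}
/activemq/amq-dev-1/000000000043 {"id":"amq-dev-1","container":null,"address":null,"position":0,"weight":1,"elected":null}
/activemq/amq-dev-1/000000000044 {"id":"amq-dev-1","container":null,"address":null,"position":2936,"weight":1,"elected":null}
"""


def parse_capture(text):
    """Hypothetical helper: split each line into (znode path, decoded JSON payload)."""
    nodes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The path and the payload are separated by the first space.
        path, payload = line.split(" ", 1)
        nodes.append((path, json.loads(payload)))
    return nodes


def summarize(nodes):
    """Hypothetical helper: report shared ids, elected nodes, and positions."""
    id_counts = Counter(data["id"] for _, data in nodes)
    duplicate_ids = sorted(i for i, count in id_counts.items() if count > 1)
    # JSON null decodes to None, so a non-None value means "elected" was set.
    elected = [path for path, data in nodes if data["elected"] is not None]
    positions = {path: data["position"] for path, data in nodes}
    return {"duplicate_ids": duplicate_ids, "elected": elected, "positions": positions}


if __name__ == "__main__":
    report = summarize(parse_capture(CAPTURE))
    print("duplicate ids:", report["duplicate_ids"])
    print("elected nodes:", report["elected"])
    for path, position in report["positions"].items():
        print(f"{path}: position={position}")
```

Pasting your own `/activemq/...` listing into `CAPTURE` should give a quick side-by-side view, which might help spot whether your cluster ends up in the same "nobody elected" state.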