That's good to hear. We tried to use AMQ (and still want it to be a solution for us!) a couple of years ago in our Amazon cluster, using a network of brokers, and found that consumers would fall off and not reconnect, while on the producer side things would simply come to a halt. We had to act quickly and unfortunately didn't have time to run proper experiments, but it was a cluster of 50 or so servers and a master unit.

All 50 servers had about 6-10 threads each pushing messages on and pulling them off. It would run for a few thousand cycles and then the whole thing would halt. The OP's description of a recent AMQ and their tests sounded very familiar to what we saw a couple of years ago. I think we may also have been using MySQL for persistence on and off.

Perhaps there's some configuration magic, choice of backend, or other tuning that "can work", but out of the box we had trouble, and others have too. I'm sure there's a reason the OP is having problems, and I'd love to understand what it is so we can come back to AMQ. But the OP's documentation of the problem seems very well put together, which leads us to think AMQ still has scalability issues.

Maybe you can share some details on how you were able to get it to work at scale?

With Regards,
Darren


On 02/27/2013 04:41 AM, Gaurav Sharma wrote:
Wish I could post the perf stats from Hiram's ActiveMQ benchmark tests from even one of our dev clusters (never mind stress or prod environments), but company policy doesn't permit me to share them. One of my prod clusters does Apple push notifications among other things - it has been up and invisible for more than a hundred days and has delivered X hundreds of millions of events during that time with flat system graphs and no Nagios alerts (not even warnings) - just a testament to how rock solid and reliable ActiveMQ is if you know what you are doing.

And I should share my viewpoint since you mention Mongo: we have tens of TBs of data in Mongo and no, we do not love operating it at scale. Distributed queueing is hard - if possible, push distribution down a tier or two to persistent/disk storage and minimize the opportunities to deal with consensus protocols. I have nothing but great things to say about ActiveMQ. Yes, it can be sped up even further if you strip it down and use append-only formats like Kafka, but then forget about all the cool features and implementing the JMS specs, etc. - there are many trade-offs to be made, and Apollo seems headed in that direction (Dejan: please correct me if I mis-stated this).

When you say you "failed to get it to scale", can you share some specifics so some of us here can help? What were your targets, and how did you fall short of them?

Mandar: a 1 GB heap is not very much unless you can tune the occupancy fraction appropriately for your workload type, among other things. Also, when you mention "linux with quadcore + 8GB RAM", is that a server-grade machine or your laptop? I have trouble getting anything smaller than 48 GB half-U boxes, because now I need to play games with the collectors running out of steam at 16-18 GB heaps.
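For what it's worth, the occupancy-fraction tuning mentioned above is usually done with CMS collector flags along these lines (the heap size and threshold are illustrative assumptions, not recommendations):

```shell
# CMS tuning sketch: start concurrent collection at a fixed old-gen
# occupancy instead of letting the JVM's ergonomics decide, so the
# collector doesn't fall behind and degrade into back-to-back full GCs.
export ACTIVEMQ_OPTS="-Xms4g -Xmx4g \
-XX:+UseConcMarkSweepGC \
-XX:CMSInitiatingOccupancyFraction=70 \
-XX:+UseCMSInitiatingOccupancyOnly"
```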


On Feb 26, 2013, at 19:09, Darren Govoni <dar...@ontrenet.com> wrote:

Unfortunately, getting AMQ to show production-grade scalability and reliability is a real challenge.
We tried and failed to get it to scale or perform acceptably, and were forced to write our own distributed queue on top of MongoDB.

The addition of AMQP support is nice, however.

On 02/15/2013 05:44 AM, mandar.wanpal wrote:
Hi All,

We are seeing some serious issues with AMQ in a few of our load tests.

We have configured our AMQ with the settings below.

Heap size increased to 1 GB
JMX port opened for AMQ
jms.prefetchPolicy.all=10
constantPendingMessageLimitStrategy=50
-XX:PermSize=128m -XX:MaxPermSize=128m
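For reference, the pending-message limit above is normally set per destination in activemq.xml rather than as a flat property, and it only applies to topic subscribers; a minimal sketch (the catch-all destination pattern is an assumption):

```xml
<!-- inside <broker> in activemq.xml; ">" matches all topics (assumed pattern) -->
<destinationPolicy>
  <policyMap>
    <policyEntries>
      <policyEntry topic=">">
        <!-- keep at most 50 pending messages per slow topic subscriber -->
        <pendingMessageLimitStrategy>
          <constantPendingMessageLimitStrategy limit="50"/>
        </pendingMessageLimitStrategy>
      </policyEntry>
    </policyEntries>
  </policyMap>
</destinationPolicy>
```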

We have AMQ with KahaDB in a simple failover setup, so if AMQ1 fails, AMQ2 takes over. The messages are also not huge in size.
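On the client side, this kind of failover pair together with the prefetch setting above is typically expressed in the broker URL; a sketch (hostnames and port are assumptions):

```
failover:(tcp://amq1:61616,tcp://amq2:61616)?jms.prefetchPolicy.all=10
```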

Observations:
1. If the heap size is set to 512 MB, AMQ1 fails after some 7,500 requests and fails over to AMQ2. AMQ2 is also unable to continue, because the producer cannot establish proper communication with it, possibly because of the backlog of messages that AMQ1 didn't accept.
2. If the heap size is set to 1 GB, the same thing happens after some 15,000 requests: AMQ1 fails, we fail over to AMQ2, and AMQ2 cannot recover for the same reason.
3. When AMQ fails, we start getting OutOfMemory errors and AMQ starts doing full GCs continuously. Since a full GC halts the JVM, AMQ stalls and can't do anything.
4. After seeing so many full GCs, we took a heap dump and analysed it with the Eclipse Memory Analyzer. Please find attached a report identifying a few suspect areas that may be causing a leak in AMQ.

AMQ_Leak_Report.pdf
<http://activemq.2283324.n4.nabble.com/file/n4663532/AMQ_Leak_Report.pdf>

5. The frequency of full GCs increases over time, and the amount of memory they can reclaim shrinks.
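One thing worth checking for the OOM pattern in points 1-3: the broker's systemUsage memory limit must fit comfortably inside the JVM heap, or the broker can exhaust the heap before its own flow control kicks in. A hedged activemq.xml sketch (all limits are illustrative, not recommendations):

```xml
<!-- inside <broker> in activemq.xml -->
<systemUsage>
  <systemUsage>
    <!-- cap in-memory messages well below the 1 GB heap -->
    <memoryUsage>
      <memoryUsage limit="256 mb"/>
    </memoryUsage>
    <!-- KahaDB on-disk store limit -->
    <storeUsage>
      <storeUsage limit="10 gb"/>
    </storeUsage>
    <!-- spool space for non-persistent messages -->
    <tempUsage>
      <tempUsage limit="1 gb"/>
    </tempUsage>
  </systemUsage>
</systemUsage>
```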

Queries:
What would be a reasonable config for AMQ to process at least 1 lakh (100,000) requests without requiring a restart and without setting the heap to some gigantic size?
I have Linux with a quad-core CPU and 8 GB RAM.



-----
Regards,
Mandar Wanpal
Email: mandar.wan...@gmail.com
--
View this message in context: 
http://activemq.2283324.n4.nabble.com/AMQ-halts-and-crashes-after-few-thousand-reqs-tp4663532.html
