Awesome, thanks for coming back and explaining what happened there. I was super curious about this one. I still think there's room for a unit test to document this and show that the reference in the ProducerBrokerExchange still holds onto the "region" destinations. That reference could be cleared at the end of a send, since it's never used again; clearing it would allow the broker to reclaim the destination completely.
I'll write the test for that and change that reference!! Tracked here: https://issues.apache.org/jira/browse/AMQ-4222

On Thu, Dec 13, 2012 at 10:17 AM, Jerry Cwiklik <cwik...@us.ibm.com> wrote:

> Christian, after much pain and suffering I finally figured out what is
> going on. Our system is quite complicated and involves many producers that
> send large messages (600K-1.5M) to relatively few multi-threaded consumers
> (services) which run "forever". The producers are transient and can be
> killed by our custom job scheduler at any time via kill -9 to make room for
> other producers. We run the broker with a 10G heap.
>
> The consumer is coded to group and cache Sessions with a Connection which
> has an inactivity timer associated with it. Every time a message is sent,
> the timer is restarted. If the timer pops (default 30 minutes), the
> Sessions, MessageProducers and the Connection are closed due to inactivity.
>
> This worked perfectly fine until about 4 weeks ago, when we started
> experiencing a broker OOM problem. While the broker was running we could
> see a steady (fast) rise in the heap usage in jConsole. After a couple of
> days the broker's JVM would OOM.
>
> The problem started happening when we introduced pingers for the Consumers.
> Every minute a pinger sends a message to a Consumer to make sure it's alive.
> The Consumer replies to the pinger request and restarts the inactivity
> timer. It took me a while to see the bug in our application, but eventually
> I determined that our timer behaves incorrectly, as it is associated with a
> Connection, not individual Sessions. The Sessions go stale due to the
> producer getting killed, and any messages in the broker referenced by the
> ProducerExchange object are retained indefinitely, causing a leak in the
> broker. As you explained it to me, the broker uses a lazy approach to
> cleanup, meaning it cleans up on a new message from the Producer. In our
> case, the Producer never sends anything and thus no cleanup is ever done.
>
> The fix for this is to create a timestamp for every Session recording when
> it was last used to send a message to the broker. At fixed intervals a
> Session Reaper thread wakes up and checks the timestamp of every Session to
> determine if it has been inactive for the max allowed time and, if so,
> closes it.
>
> So the problem was caused by an application bug and the fact that the
> broker takes a lazy approach to cleanup. As a side note, under the
> described scenario, I've noticed that the broker memory usage (shown in
> jConsole) indicated 0 even though there were a ton of messages in the heap
> with valid references (held by ProducerExchange).
>
> Thanks Christian for your help
>
> -Jerry C
>
> --
> View this message in context:
> http://activemq.2283324.n4.nabble.com/Broker-Leak-tp4660437p4660618.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.

--
*Christian Posta*
http://www.christianposta.com/blog
twitter: @christianposta
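
For the archives, here is a rough sketch of the per-Session reaper idea Jerry describes in the quoted message. It is not his actual code and not anything shipped with ActiveMQ: the class name `SessionReaper` and the methods `touch`, `reap`, `start` and `stop` are made up for illustration, and `AutoCloseable` stands in for `javax.jms.Session` so the snippet compiles without a JMS jar on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of ActiveMQ or JMS.
// AutoCloseable is a stand-in for javax.jms.Session (assumption for this sketch).
class SessionReaper {
    // Last-used timestamp (millis) per session.
    private final Map<AutoCloseable, Long> lastUsedMillis = new ConcurrentHashMap<>();
    private final long maxIdleMillis;
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "session-reaper");
                t.setDaemon(true); // do not keep the JVM alive just for the reaper
                return t;
            });

    SessionReaper(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    // Record that the session was just used to send a message.
    void touch(AutoCloseable session, long nowMillis) {
        lastUsedMillis.put(session, nowMillis);
    }

    // Close and forget every session idle for longer than maxIdleMillis.
    // Returns the sessions that were closed on this pass.
    List<AutoCloseable> reap(long nowMillis) {
        List<AutoCloseable> closed = new ArrayList<>();
        for (Map.Entry<AutoCloseable, Long> e : lastUsedMillis.entrySet()) {
            boolean idleTooLong = nowMillis - e.getValue() > maxIdleMillis;
            // The conditional remove skips a session that was touched again
            // after this pass read its timestamp.
            if (idleTooLong && lastUsedMillis.remove(e.getKey(), e.getValue())) {
                try {
                    e.getKey().close();
                } catch (Exception ex) {
                    // The session is being discarded anyway; ignore close failures.
                }
                closed.add(e.getKey());
            }
        }
        return closed;
    }

    // Wake up at a fixed interval and reap using the wall clock.
    void start(long periodMillis) {
        timer.scheduleAtFixedRate(() -> reap(System.currentTimeMillis()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        timer.shutdownNow();
    }
}
```

In an application like the one described, `touch(session, System.currentTimeMillis())` would be called after each send on that Session, so a Session whose producer was killed stops being touched and is closed on a later pass even though pings keep the shared Connection busy.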