Btw, another way to fix this is to set the purge interval low, say 15 seconds, and then cap the number of queues deleted on each sweep at a low value.
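For what it's worth, a minimal sketch of that tuning on an embedded broker, assuming the schedulePeriodForDestinationPurge / maxPurgedDestinationsPerSweep BrokerService attributes cover it (values are arbitrary; double-check the attribute names against your ActiveMQ version):

```java
import org.apache.activemq.broker.BrokerService;

public class PurgeTuningExample {
    public static void main(String[] args) throws Exception {
        BrokerService broker = new BrokerService();

        // Sweep for inactive destinations every 15 seconds...
        broker.setSchedulePeriodForDestinationPurge(15_000);
        // ...but only purge a few queues per sweep, so the purge lock is held only briefly each time.
        broker.setMaxPurgedDestinationsPerSweep(10);

        // Note: queues are only GC'd at all if the destination policy enables
        // inactive-destination GC (gcInactiveDestinations + an inactivity timeout), not shown here.

        broker.start();
    }
}
```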
This wouldn’t be as pretty as using one lock per queue, but it would be easy to implement without modifying much code.

Kevin

On Sun, Feb 22, 2015 at 1:01 PM, Kevin Burton <[email protected]> wrote:

> OK. I think I have a handle on what’s happening during queue purges that causes the GC lockups.
>
> Wanted to get your feedback.
>
> I can create a bug for this if you guys think my assessment is accurate, as I think the fix is somewhat reasonable / easy.
>
> I have a unit test which duplicates this now, but I need to do more cleanup so I can put it into a public github repo for you guys to look at.
>
> ## Problem overview.
>
> ActiveMQ supports a feature where it can GC a queue that is inactive, i.e. no messages and no consumers.
>
> However, there’s a bug where purgeInactiveDestinations in org.apache.activemq.broker.region.RegionBroker uses a read/write lock (inactiveDestinationsPurgeLock) which is held during the entire queue GC.
>
> Each individual queue GC takes about 100ms with a disk-backed queue and 10ms with a memory-backed (non-persistent) queue. If you have thousands of them to GC at once, the inactiveDestinationsPurgeLock lock is held the entire time, which can last from 60 seconds to 5 minutes (and is essentially unbounded).
>
> A read lock on the same lock is also taken in addConsumer / addProducer, so when a new consumer or producer tries to connect, they’re blocked until queue GC completes.
>
> Existing producers/consumers work JUST fine.
>
> A lock MUST be held while each queue is GC’d, because if it isn’t there’s a race where a queue is flagged to be GC’d, then a producer comes in and writes a new message, and then the background thread deletes the queue it marked as GC-able even though it now holds the newly produced message. This would result in data loss.
>
> ## Confirmed
>
> I have a unit test now that confirms this. I create 7500 queues, produce 1 message in each, then consume it. I keep all consumers open. Then I release all 7500 queues at once.
>
> I then have a consumer/producer pair I hold open and produce and consume messages on; this works fine.
>
> However, I have another that creates a new producer each time. This will block for 60,000ms multiple times while queue GC is happening in the background.
>
> ## Proposed solution.
>
> Rework the read/write locks to be one lock per queue.
>
> So instead of using one global lock per broker, we use one lock per queue name. This way the locks are FAR more granular and new producers/consumers won’t block during this time period.
>
> If a queue named ‘foo’ is being GC’d and a new producer is created on a ‘bar’ queue, the bar producer will work fine and won’t block on the foo queue.
>
> This can be accomplished by creating a concurrent hash map with the name of the queue as the key (or an ActiveMQDestination as the key), storing read/write locks as the values. We then use this map as the lock backing, and the purge thread and the add/remove producer/consumer paths all reference the more granular per-queue lock.
>
> ….
>
> Now initially, I was thinking I would just fix this myself; however, I might have a workaround for our queue design which uses fewer queues, and I think this will drop our queue requirement from a few thousand to a few dozen. So at that point this won’t be as much of a priority for me.
>
> However, this is a significant scalability issue with ActiveMQ… one that doesn’t need to exist. In our situation I think our performance would be fine even with 7500 queues once this bug is fixed.
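To make that per-queue lock map concrete, here is a rough, untested sketch. The class name DestinationLockRegistry and its methods are purely illustrative and not existing ActiveMQ code; only ActiveMQDestination and the JDK lock types are real:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

import org.apache.activemq.command.ActiveMQDestination;

/**
 * Illustrative only: one read/write lock per destination instead of the single
 * broker-wide inactiveDestinationsPurgeLock.
 */
public class DestinationLockRegistry {

    private final ConcurrentMap<ActiveMQDestination, ReadWriteLock> locks =
            new ConcurrentHashMap<>();

    private ReadWriteLock lockFor(ActiveMQDestination destination) {
        // Lazily create one lock per destination; ConcurrentHashMap makes this atomic.
        return locks.computeIfAbsent(destination, d -> new ReentrantReadWriteLock());
    }

    /** addProducer / addConsumer would take the read lock for their own destination only. */
    public void withReadLock(ActiveMQDestination destination, Runnable task) {
        ReadWriteLock lock = lockFor(destination);
        lock.readLock().lock();
        try {
            task.run();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** The purge thread would take the write lock per queue, so other queues stay unblocked. */
    public void withWriteLock(ActiveMQDestination destination, Runnable task) {
        ReadWriteLock lock = lockFor(destination);
        lock.writeLock().lock();
        try {
            task.run();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

One open design question with this approach is when to evict a queue’s entry from the map once the destination is really gone, so the map doesn’t grow without bound as queues come and go.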
> Perhaps it should just exist as an open JIRA that could be fixed at some time in the future?
>
> I can also get time to clean up a project with a test which demonstrates this problem.
>
> Kevin

--

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
