OK. This is ready to go and I have a patch branch: https://issues.apache.org/jira/browse/AMQ-5609
I’m stuck at the moment, though, because the tests don’t pass. They were
already failing before my changes, so I don’t think the failures have
anything to do with this patch.

On Sun, Feb 22, 2015 at 11:11 PM, Kevin Burton <[email protected]> wrote:

> Actually, is the lock even needed here? Why would it be? If we’re
> *removing* a subscription, why does it care if we possibly ALSO remove a
> separate / isolated queue before/after the subscription is removed?
>
> I think this is redundant and can be removed. Maybe I’m wrong though.
>
> I looked at all the callers and none were associated with queues.
>
> On Sun, Feb 22, 2015 at 11:07 PM, Kevin Burton <[email protected]> wrote:
>
>> So I have some working/theoretical code that should resolve this.
>>
>> It acquires a lock *per* ActiveMQDestination, so there is no lock
>> contention.
>>
>> But here’s where I’m stuck:
>>
>>     @Override
>>     public void removeSubscription(ConnectionContext context,
>>             RemoveSubscriptionInfo info) throws Exception {
>>         inactiveDestinationsPurgeLock.readLock().lock();
>>         try {
>>             topicRegion.removeSubscription(context, info);
>>         } finally {
>>             inactiveDestinationsPurgeLock.readLock().unlock();
>>         }
>>     }
>>
>> ... this is in RegionBroker.
>>
>> There is no ActiveMQDestination involved here, so I’m not sure of the
>> best way to resolve this.
>>
>> Any advice?
>>
>> On Sun, Feb 22, 2015 at 8:11 PM, Kevin Burton <[email protected]> wrote:
>>
>>> Yes. That was my thinking too: that just replacing the
>>> CopyOnWriteArraySet with something more performant would solve the
>>> issue.
>>>
>>> This would also improve queue creation time as well as queue deletion
>>> time.
>>>
>>> What I think I’m going to do in the meantime is:
>>>
>>> - implement a granular lock based on queue name. I am going to use an
>>>   interface so we can replace the implementation later.
>>>
>>> - implement timing for the purge thread so it tracks how long it takes
>>>   to remove a queue and also how long the entire loop takes.
>>>
>>> I’ll do this on a branch so it should be easy to merge.
>>>
>>> On Sun, Feb 22, 2015 at 7:40 PM, Tim Bain <[email protected]> wrote:
>>>
>>>> A decent amount of the time is being spent calling remove() on various
>>>> array-backed collections. Those data structures might be inappropriate
>>>> for the number of destinations you’re running, since array-backed
>>>> collections tend to have add/remove operations that are O(N); some
>>>> improvement might come from something as simple as moving to a
>>>> ConcurrentHashSet instead of a CopyOnWriteArraySet, for example. (Or
>>>> it might make performance worse because of other aspects of how those
>>>> collections are used; people other than me would be in a better
>>>> position to evaluate the full range of performance requirements for
>>>> those collections.)
>>>>
>>>> Scheduler.cancel() also takes an alarming amount of time for what
>>>> looks like a really simple method (
>>>> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.activemq/activemq-all/5.10.0/org/apache/activemq/thread/Scheduler.java#Scheduler.cancel%28java.lang.Runnable%29
>>>> ).
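A minimal sketch of the collection swap Tim suggests above. This is plain
Java for illustration, not the actual ActiveMQ data structure; on Java 8+ a
"ConcurrentHashSet" is usually spelled ConcurrentHashMap.newKeySet() (or
Collections.newSetFromMap(new ConcurrentHashMap<>()) on older JDKs):

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.CopyOnWriteArraySet;

    // Illustration of the trade-off described above, not ActiveMQ code.
    // CopyOnWriteArraySet copies its entire backing array on every
    // add/remove, so remove() is O(N) and N removals are O(N^2); a
    // hash-based concurrent set removes in roughly O(1), at the cost of
    // weakly consistent (rather than snapshot) iteration.
    public class DestinationSetSketch {

        // O(N) add/remove, snapshot iteration.
        private final Set<String> copyOnWrite = new CopyOnWriteArraySet<>();

        // O(1) add/remove, weakly consistent iteration.
        private final Set<String> hashBased = ConcurrentHashMap.newKeySet();

        public void removeDestination(String name) {
            copyOnWrite.remove(name); // copies the whole backing array
            hashBased.remove(name);   // removes a single hash entry
        }
    }

Whether the weaker iteration guarantees are acceptable depends on how the
broker iterates these collections, which is exactly the caveat Tim raises.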
>>>> On Sun, Feb 22, 2015 at 7:56 PM, Kevin Burton <[email protected]> wrote:
>>>>
>>>>> Here’s a jprofiler view with the advisory support enabled if you’re
>>>>> curious:
>>>>>
>>>>> http://i.imgur.com/I1jesZz.jpg
>>>>>
>>>>> I’m not familiar enough with the internals of ActiveMQ to have any
>>>>> obvious optimization ideas.
>>>>>
>>>>> One other idea I had (which would require a ton of refactoring, I
>>>>> think) would be to potentially bulk delete all the queues at once.
>>>>>
>>>>> On Sun, Feb 22, 2015 at 6:42 PM, Kevin Burton <[email protected]> wrote:
>>>>>
>>>>>> And spending some more time in jprofiler, it looks like 20% of this
>>>>>> is due to schedulerSupport and the other 80% is due to
>>>>>> advisorySupport.
>>>>>>
>>>>>> If I set both to false, the total runtime of my tests drops in half,
>>>>>> and the latencies fall from
>>>>>>
>>>>>>     max create producer latency: 10,176 ms
>>>>>>     max create message on existing producer and consumer: 2 ms
>>>>>>
>>>>>> ... to
>>>>>>
>>>>>>     max create producer latency: 1 ms
>>>>>>     max create message on existing producer and consumer: 1 ms
>>>>>>
>>>>>> ... and this is without fixing the purge background lock.
>>>>>>
>>>>>> So the question now is what the heck the advisory support is doing
>>>>>> that can result in such massive performance overhead.
>>>>>>
>>>>>> ... and I think advisorySupport is enabled by default, so that’s
>>>>>> problematic as well.
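For anyone reproducing that measurement, the two flags can be switched off
on an embedded broker roughly like this (a sketch; the equivalent for a
standalone broker is the advisorySupport/schedulerSupport attributes on the
<broker> element in activemq.xml):

    import org.apache.activemq.broker.BrokerService;

    // Sketch: an embedded, non-persistent broker with advisory and
    // scheduler support disabled, for comparing queue create/remove latency.
    public class NoAdvisoryBrokerSketch {
        public static void main(String[] args) throws Exception {
            BrokerService broker = new BrokerService();
            broker.setPersistent(false);       // matches the persistent=false test setup
            broker.setAdvisorySupport(false);  // advisory topics are on by default
            broker.setSchedulerSupport(false); // the other 20% measured above
            broker.addConnector("tcp://localhost:61616");
            broker.start();
            broker.waitUntilStarted();
        }
    }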
>>>>>> On Sun, Feb 22, 2015 at 4:45 PM, Kevin Burton <[email protected]> wrote:
>>>>>>
>>>>>>> OK. Loaded up JProfiler and confirmed that it’s not LevelDB.
>>>>>>>
>>>>>>> This is a non-persistent broker I’m testing on.
>>>>>>>
>>>>>>> It looks like it’s spending all its time in
>>>>>>> CopyOnWriteArrayList.remove and Timer.purge...
>>>>>>>
>>>>>>> Which is hopeful, because this is ALL due to ActiveMQ internals, and
>>>>>>> in theory LevelDB should perform well if we improve the performance
>>>>>>> of ActiveMQ internals and fix this lock bug.
>>>>>>>
>>>>>>> Which would rock!
>>>>>>>
>>>>>>> It should ALSO make queue creation faster.
>>>>>>>
>>>>>>> On Sun, Feb 22, 2015 at 4:10 PM, Kevin Burton <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Sun, Feb 22, 2015 at 3:30 PM, Tim Bain <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> So if LevelDB cleanup during removeDestination() is the presumed
>>>>>>>>> culprit, can we spin off the LevelDB cleanup work into a separate
>>>>>>>>> thread (better: a task object to be serviced by a ThreadPool so
>>>>>>>>> you can avoid a fork bomb if we remove many destinations at once)
>>>>>>>>> so the call to removeDestination() can return quickly and LevelDB
>>>>>>>>> can do its record-keeping in the background without blocking
>>>>>>>>> message-processing?
>>>>>>>>
>>>>>>>> Would that be possible? If the delete is pending on ActiveMQ, there
>>>>>>>> is a race where a producer could re-create it unless the lock is
>>>>>>>> held.
>>>>>>>>
>>>>>>>> Though I guess if you dispatched to the GC thread WITH the lock
>>>>>>>> still held you would be OK, but I think if we use the existing
>>>>>>>> purge thread then we’re fine.
>>>>>>>>
>>>>>>>> OK. I think I’m wrong about LevelDB being the issue. To be fair, I
>>>>>>>> wasn’t 100% certain before, but I should have specified.
>>>>>>>>
>>>>>>>> Our current production broker is running with persistent=false...
>>>>>>>> and I just re-ran the tests without disk persistence and it has the
>>>>>>>> same problem.
>>>>>>>>
>>>>>>>> So the main issue now is why the heck ActiveMQ is taking SO LONG to
>>>>>>>> GC a queue. It’s taking about 100 ms, which is an insane amount of
>>>>>>>> time considering this is all done in memory.
>>>>>>>>
>>>>>>>> Kevin
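For reference, a rough, hypothetical sketch of the per-destination
("granular lock based on queue name") approach discussed in this thread.
The class and method names are made up for illustration and this is not the
actual AMQ-5609 patch; calls that have no destination in hand, such as
RegionBroker.removeSubscription(), would still need a broker-wide fallback
lock, which is the open question above.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Hypothetical sketch: one ReadWriteLock per destination name, so the
    // purge thread removing an inactive queue only contends with operations
    // on that same destination instead of blocking the whole broker.
    public class DestinationLockRegistry {

        private final ConcurrentMap<String, ReadWriteLock> locks =
                new ConcurrentHashMap<>();

        private ReadWriteLock lockFor(String destinationName) {
            return locks.computeIfAbsent(destinationName,
                    name -> new ReentrantReadWriteLock());
        }

        /** Producer/consumer paths: shared access to a single destination. */
        public void withReadLock(String destinationName, Runnable task) {
            ReadWriteLock lock = lockFor(destinationName);
            lock.readLock().lock();
            try {
                task.run();
            } finally {
                lock.readLock().unlock();
            }
        }

        /** Purge thread: exclusive access while removing one destination.
            (Cleanup of stale map entries is elided for brevity.) */
        public void withWriteLock(String destinationName, Runnable task) {
            ReadWriteLock lock = lockFor(destinationName);
            lock.writeLock().lock();
            try {
                task.run();
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

With something like this, the purge loop would take the write lock only for
the queue it is currently removing, so a producer re-creating an unrelated
queue never waits behind the purge; Tim’s idea of handing the actual cleanup
to a background task could be layered on top by dispatching that task while
the destination’s write lock is still held, as discussed above.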
