OK. This is ready to go and I have a patch branch: https://issues.apache.org/jira/browse/AMQ-5609
I’m stuck at the moment, though, because the tests don’t pass. They were
already failing before my changes, so I don’t think the failures have
anything to do with this patch.

On Sun, Feb 22, 2015 at 11:11 PM, Kevin Burton <[email protected]> wrote:

> Actually, is the lock even needed here? Why would it be? If we’re
> *removing* a subscription, why does it care if we possibly ALSO remove a
> separate / isolated queue before/after the subscription is removed?
>
> I think this is redundant and can be removed. Maybe I’m wrong though.
>
> I looked at all the callers and none were associated with queues.
>
> On Sun, Feb 22, 2015 at 11:07 PM, Kevin Burton <[email protected]> wrote:
>
>> So I have some working/theoretical code that should resolve this.
>>
>> It acquires a lock *per* ActiveMQDestination, so there is no lock
>> contention.
>>
>> But here’s where I’m stuck:
>>
>>     @Override
>>     public void removeSubscription(ConnectionContext context,
>>             RemoveSubscriptionInfo info) throws Exception {
>>         inactiveDestinationsPurgeLock.readLock().lock();
>>         try {
>>             topicRegion.removeSubscription(context, info);
>>         } finally {
>>             inactiveDestinationsPurgeLock.readLock().unlock();
>>         }
>>     }
>>
>> ... this is in RegionBroker.
>>
>> There is no ActiveMQDestination involved here, so I’m not sure of the
>> best way to resolve this.
>>
>> Any advice?
>>
>> On Sun, Feb 22, 2015 at 8:11 PM, Kevin Burton <[email protected]> wrote:
>>
>>> Yes. That was my thinking too: that just replacing the
>>> CopyOnWriteArraySet with something more performant would solve the
>>> issue.
>>>
>>> This would also improve queue creation time as well as queue deletion
>>> time.
>>>
>>> What I think I’m going to do in the meantime is:
>>>
>>> - implement a granular lock based on queue name. I am going to use an
>>>   interface so we can replace the implementation later.
>>>
>>> - implement timing for the purge thread so it tracks how long it takes
>>>   to remove a queue and also how long the entire loop takes.
>>>
>>> I’ll do this on a branch so it should be easy to merge.
>>>
>>> On Sun, Feb 22, 2015 at 7:40 PM, Tim Bain <[email protected]> wrote:
>>>
>>>> A decent amount of the time is being spent calling remove() on various
>>>> array-backed collections. Those data structures might be inappropriate
>>>> for the number of destinations you’re running, since array-backed
>>>> collections tend to have add/remove operations that are O(N); some
>>>> improvement might come from something as simple as moving to a
>>>> ConcurrentHashSet instead of a CopyOnWriteArraySet, for example. (Or
>>>> it might make performance worse because of other aspects of how those
>>>> collections are used; people other than me would be in a better
>>>> position to evaluate the full range of performance requirements for
>>>> those collections.)
>>>>
>>>> Scheduler.cancel() also takes an alarming amount of time for what
>>>> looks like a really simple method (
>>>> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.activemq/activemq-all/5.10.0/org/apache/activemq/thread/Scheduler.java#Scheduler.cancel%28java.lang.Runnable%29
>>>> ).
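A minimal sketch of the collection swap Tim suggests above. This is plain
Java for illustration, not the actual ActiveMQ data structure; on Java 8+ a
"ConcurrentHashSet" is usually spelled ConcurrentHashMap.newKeySet() (or
Collections.newSetFromMap(new ConcurrentHashMap<>()) on older JDKs):

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.CopyOnWriteArraySet;

    // Illustration of the trade-off described above, not ActiveMQ code.
    // CopyOnWriteArraySet copies its entire backing array on every
    // add/remove, so remove() is O(N) and N removals are O(N^2); a
    // hash-based concurrent set removes in roughly O(1), at the cost of
    // weakly consistent (rather than snapshot) iteration.
    public class DestinationSetSketch {

        // O(N) add/remove, snapshot iteration.
        private final Set<String> copyOnWrite = new CopyOnWriteArraySet<>();

        // O(1) add/remove, weakly consistent iteration.
        private final Set<String> hashBased = ConcurrentHashMap.newKeySet();

        public void removeDestination(String name) {
            copyOnWrite.remove(name); // copies the whole backing array
            hashBased.remove(name);   // removes a single hash entry
        }
    }

Whether the weaker iteration guarantees are acceptable depends on how the
broker iterates these collections, which is exactly the caveat Tim raises.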
>>>> On Sun, Feb 22, 2015 at 7:56 PM, Kevin Burton <[email protected]> wrote:
>>>>
>>>>> Here’s a jprofiler view with the advisory support enabled if you’re
>>>>> curious:
>>>>>
>>>>> http://i.imgur.com/I1jesZz.jpg
>>>>>
>>>>> I’m not familiar enough with the internals of ActiveMQ to have any
>>>>> obvious optimization ideas.
>>>>>
>>>>> One other idea I had (which would require a ton of refactoring, I
>>>>> think) would be to potentially bulk delete all the queues at once.
>>>>>
>>>>> On Sun, Feb 22, 2015 at 6:42 PM, Kevin Burton <[email protected]> wrote:
>>>>>
>>>>>> And spending some more time in jprofiler, it looks like 20% of this
>>>>>> is due to schedulerSupport and the other 80% is due to
>>>>>> advisorySupport.
>>>>>>
>>>>>> If I set both to false, the total runtime of my tests drops in half,
>>>>>> and the latencies fall from
>>>>>>
>>>>>>     max create producer latency: 10,176 ms
>>>>>>     max create message on existing producer and consumer: 2 ms
>>>>>>
>>>>>> ... to
>>>>>>
>>>>>>     max create producer latency: 1 ms
>>>>>>     max create message on existing producer and consumer: 1 ms
>>>>>>
>>>>>> ... and this is without fixing the purge background lock.
>>>>>>
>>>>>> So the question now is what the heck the advisory support is doing
>>>>>> that can result in such massive performance overhead.
>>>>>>
>>>>>> ... and I think advisorySupport is enabled by default, so that’s
>>>>>> problematic as well.
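For anyone reproducing that measurement, the two flags can be switched off
on an embedded broker roughly like this (a sketch; the equivalent for a
standalone broker is the advisorySupport/schedulerSupport attributes on the
<broker> element in activemq.xml):

    import org.apache.activemq.broker.BrokerService;

    // Sketch: an embedded, non-persistent broker with advisory and
    // scheduler support disabled, for comparing queue create/remove latency.
    public class NoAdvisoryBrokerSketch {
        public static void main(String[] args) throws Exception {
            BrokerService broker = new BrokerService();
            broker.setPersistent(false);       // matches the persistent=false test setup
            broker.setAdvisorySupport(false);  // advisory topics are on by default
            broker.setSchedulerSupport(false); // the other 20% measured above
            broker.addConnector("tcp://localhost:61616");
            broker.start();
            broker.waitUntilStarted();
        }
    }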
>>>>>> On Sun, Feb 22, 2015 at 4:45 PM, Kevin Burton <[email protected]> wrote:
>>>>>>
>>>>>>> OK. Loaded up JProfiler and confirmed that it’s not LevelDB.
>>>>>>>
>>>>>>> This is a non-persistent broker I’m testing on.
>>>>>>>
>>>>>>> It looks like it’s spending all its time in
>>>>>>> CopyOnWriteArrayList.remove and Timer.purge...
>>>>>>>
>>>>>>> Which is hopeful, because this is ALL due to ActiveMQ internals, and
>>>>>>> in theory LevelDB should perform well if we improve the performance
>>>>>>> of ActiveMQ internals and fix this lock bug.
>>>>>>>
>>>>>>> Which would rock!
>>>>>>>
>>>>>>> It should ALSO make queue creation faster.
>>>>>>>
>>>>>>> On Sun, Feb 22, 2015 at 4:10 PM, Kevin Burton <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Sun, Feb 22, 2015 at 3:30 PM, Tim Bain <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> So if LevelDB cleanup during removeDestination() is the presumed
>>>>>>>>> culprit, can we spin off the LevelDB cleanup work into a separate
>>>>>>>>> thread (better: a task object to be serviced by a ThreadPool so
>>>>>>>>> you can avoid a fork bomb if we remove many destinations at once)
>>>>>>>>> so the call to removeDestination() can return quickly and LevelDB
>>>>>>>>> can do its record-keeping in the background without blocking
>>>>>>>>> message-processing?
>>>>>>>>
>>>>>>>> Would that be possible? If the delete is pending on ActiveMQ, there
>>>>>>>> is a race where a producer could re-create it unless the lock is
>>>>>>>> held.
>>>>>>>>
>>>>>>>> Though I guess if you dispatched to the GC thread WITH the lock
>>>>>>>> still held you would be OK, but I think if we use the existing
>>>>>>>> purge thread then we’re fine.
>>>>>>>>
>>>>>>>> OK. I think I’m wrong about LevelDB being the issue. To be fair, I
>>>>>>>> wasn’t 100% certain before, but I should have specified.
>>>>>>>>
>>>>>>>> Our current production broker is running with persistent=false...
>>>>>>>> and I just re-ran the tests without disk persistence and it has the
>>>>>>>> same problem.
>>>>>>>>
>>>>>>>> So the main issue now is why the heck ActiveMQ is taking SO LONG to
>>>>>>>> GC a queue. It’s taking about 100 ms, which is an insane amount of
>>>>>>>> time considering this is all done in memory.
>>>>>>>>
>>>>>>>> Kevin
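For reference, a rough, hypothetical sketch of the per-destination
("granular lock based on queue name") approach discussed in this thread.
The class and method names are made up for illustration and this is not the
actual AMQ-5609 patch; calls that have no destination in hand, such as
RegionBroker.removeSubscription(), would still need a broker-wide fallback
lock, which is the open question above.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Hypothetical sketch: one ReadWriteLock per destination name, so the
    // purge thread removing an inactive queue only contends with operations
    // on that same destination instead of blocking the whole broker.
    public class DestinationLockRegistry {

        private final ConcurrentMap<String, ReadWriteLock> locks =
                new ConcurrentHashMap<>();

        private ReadWriteLock lockFor(String destinationName) {
            return locks.computeIfAbsent(destinationName,
                    name -> new ReentrantReadWriteLock());
        }

        /** Producer/consumer paths: shared access to a single destination. */
        public void withReadLock(String destinationName, Runnable task) {
            ReadWriteLock lock = lockFor(destinationName);
            lock.readLock().lock();
            try {
                task.run();
            } finally {
                lock.readLock().unlock();
            }
        }

        /** Purge thread: exclusive access while removing one destination.
            (Cleanup of stale map entries is elided for brevity.) */
        public void withWriteLock(String destinationName, Runnable task) {
            ReadWriteLock lock = lockFor(destinationName);
            lock.writeLock().lock();
            try {
                task.run();
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

With something like this, the purge loop would take the write lock only for
the queue it is currently removing, so a producer re-creating an unrelated
queue never waits behind the purge; Tim’s idea of handing the actual cleanup
to a background task could be layered on top by dispatching that task while
the destination’s write lock is still held, as discussed above.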
