A decent amount of the time is being spent calling remove() on various array-backed collections. Those data structures might be inappropriate for the number of destinations you're running, since array-backed collections tend to have O(N) add/remove operations; some improvement might come from something as simple as moving to a ConcurrentHashMap-backed set (the JDK doesn't ship a ConcurrentHashSet as such, but Collections.newSetFromMap() over a ConcurrentHashMap gives you the equivalent) instead of a CopyOnWriteArraySet, for example. (Or it might make performance worse because of other aspects of how those collections are used; people other than me would be in a better position to evaluate the full range of performance requirements for those collections.)
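To make the O(N)-remove point concrete, here's a tiny standalone micro-benchmark (not broker code; the class name and element count are made up) comparing removals from a CopyOnWriteArraySet against a ConcurrentHashMap-backed set:

import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArraySet;

public class SetRemovalDemo {
    public static void main(String[] args) {
        int n = 20_000;

        // Array-backed set: every remove() is a linear scan plus a full array copy.
        Set<Integer> cowSet = new CopyOnWriteArraySet<>();
        // Hash-backed concurrent set: remove() is effectively constant time.
        Set<Integer> hashSet = Collections.newSetFromMap(new ConcurrentHashMap<Integer, Boolean>());

        for (int i = 0; i < n; i++) {
            cowSet.add(i);
            hashSet.add(i);
        }

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            cowSet.remove(i);
        }
        System.out.printf("CopyOnWriteArraySet removes:          %d ms%n", (System.nanoTime() - t0) / 1_000_000);

        t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            hashSet.remove(i);
        }
        System.out.printf("ConcurrentHashMap-backed set removes: %d ms%n", (System.nanoTime() - t0) / 1_000_000);
    }
}

The trade-off, as noted above, is that CopyOnWriteArraySet gives cheap, snapshot-consistent iteration, which is probably exactly why the broker uses it on its hot dispatch paths, so the swap isn't automatically a win.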
Scheduler.cancel() also takes an alarming amount of time for what looks like a really simple method (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.activemq/activemq-all/5.10.0/org/apache/activemq/thread/Scheduler.java#Scheduler.cancel%28java.lang.Runnable%29). There's a rough sketch of one way around that cost below the quoted thread.

On Sun, Feb 22, 2015 at 7:56 PM, Kevin Burton <bur...@spinn3r.com> wrote:

> Here's a jprofiler view with the advisory support enabled if you're curious.
>
> http://i.imgur.com/I1jesZz.jpg
>
> I'm not familiar enough with the internals of ActiveMQ to have any obvious optimization ideas.
>
> One other idea I had (which would require a ton of refactoring, I think) would be to bulk delete all the queues at once.
>
> On Sun, Feb 22, 2015 at 6:42 PM, Kevin Burton <bur...@spinn3r.com> wrote:
>
> > And spending some more time in jprofiler, it looks like 20% of this is due to schedulerSupport and the other 80% is due to advisorySupport.
> >
> > If I set both to false, the total runtime of my tests drops in half… and the latencies fall from
> >
> >> max create producer latency: 10,176 ms
> >> max create message on existing producer and consumer: 2 ms
> >
> > … to
> >
> >> max create producer latency: 1 ms
> >> max create message on existing producer and consumer: 1 ms
> >
> > and this is without even fixing the purge background lock.
> >
> > So the question now is what the heck the advisory support is doing that can result in such massive performance overhead.
> >
> > … and I think advisorySupport is enabled by default, so that's problematic as well.
> >
> > On Sun, Feb 22, 2015 at 4:45 PM, Kevin Burton <bur...@spinn3r.com> wrote:
> >
> >> OK. Loaded up JProfiler and confirmed that it's not LevelDB.
> >>
> >> This is a non-persistent broker I'm testing on.
> >>
> >> Looks like it's spending all its time in CopyOnWriteArrayList.remove and Timer.purge…
> >>
> >> Which is hopeful, because this is ALL due to ActiveMQ internals, and in theory LevelDB should perform well if we improve the performance of ActiveMQ internals and fix this lock bug.
> >>
> >> Which would rock!
> >>
> >> It should ALSO make queue creation faster.
> >>
> >> On Sun, Feb 22, 2015 at 4:10 PM, Kevin Burton <bur...@spinn3r.com> wrote:
> >>
> >>> On Sun, Feb 22, 2015 at 3:30 PM, Tim Bain <tb...@alumni.duke.edu> wrote:
> >>>
> >>>> So if LevelDB cleanup during removeDestination() is the presumed culprit, can we spin off the LevelDB cleanup work into a separate thread (better: a task object to be serviced by a ThreadPool so you can avoid a fork bomb if we remove many destinations at once) so the call to removeDestination() can return quickly and LevelDB can do its record-keeping in the background without blocking message-processing?
> >>>
> >>> Would that be possible? If the delete is pending on ActiveMQ there is a race where a producer could re-create it unless the lock is held.
> >>>
> >>> Though I guess if you dispatched to the GC thread WITH the lock still held you would be OK, but I think if we use the existing purge thread then we're fine.
> >>>
> >>> OK. I think I'm wrong about LevelDB being the issue. To be fair I wasn't 100% certain before, but I should have specified.
> >>>
> >>> Our current production broker is running with persistent=false… and I just re-ran the tests without disk persistence and it has the same problem.
> >>>
> >>> So the main issue now is why the heck ActiveMQ is taking SO LONG to GC a queue. It's taking about 100ms, which is an insane amount of time considering this is all done in memory.
> >>>
> >>> Kevin
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
> <http://spinn3r.com>
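Following up on the Scheduler.cancel() point above: if I'm reading that class right, cancel() cancels the TimerTask and then calls Timer.purge(), which rescans the whole task queue, and that would line up with Timer.purge showing up in Kevin's profile. One JDK-level alternative (just a standalone sketch, not a patch against org.apache.activemq.thread.Scheduler) is ScheduledThreadPoolExecutor with setRemoveOnCancelPolicy(true), which drops a cancelled task from its queue immediately without a purge-style scan:

import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CancelWithoutPurge {
    public static void main(String[] args) throws Exception {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        // Remove cancelled tasks from the work queue at cancellation time,
        // rather than leaving them queued until their scheduled time arrives
        // (or until something like Timer.purge() walks the whole queue).
        executor.setRemoveOnCancelPolicy(true);

        ScheduledFuture<?> ticket = executor.scheduleAtFixedRate(
                () -> System.out.println("periodic task ran"),
                500, 500, TimeUnit.MILLISECONDS);

        Thread.sleep(100);        // cancel before the first run ever fires
        ticket.cancel(false);
        System.out.println("tasks left in queue after cancel: " + executor.getQueue().size());

        executor.shutdown();
    }
}

Whether that's a reasonable swap inside the broker's Scheduler is a question for someone who knows how its tickets are tracked; the point is just that the purge() cost isn't inherent to cancelling a scheduled task.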
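And for anyone who wants to reproduce the schedulerSupport/advisorySupport comparison from Kevin's numbers above, this is roughly what turning both off looks like on an embedded, non-persistent broker (the setters are the standard BrokerService ones; the connector URI is arbitrary and the test harness itself is omitted):

import org.apache.activemq.broker.BrokerService;

public class NoAdvisoryBroker {
    public static void main(String[] args) throws Exception {
        BrokerService broker = new BrokerService();
        broker.setPersistent(false);       // matches the non-persistent test setup above
        broker.setAdvisorySupport(false);  // advisory topics are on by default
        broker.setSchedulerSupport(false); // skip the scheduler machinery entirely
        broker.addConnector("tcp://localhost:61616");
        broker.start();

        // ... create/destroy queues with producers and consumers here ...

        broker.stop();
        broker.waitUntilStopped();
    }
}

The same two flags are also available as advisorySupport="false" and schedulerSupport="false" attributes on the <broker> element in activemq.xml.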