How long do the tests usually take? Mine had been running for 45 minutes before I gave up… I think part of it was that the box I was testing on was virtualized and didn’t have enough resources.
I tried to parallelize the tests (-T 8 with Maven) but I got other errors, so I assume the test ports are fixed (singletons) and can’t be shared across parallel runs.

On Tue, Feb 24, 2015 at 8:03 AM, Gary Tully <gary.tu...@gmail.com> wrote:

> if there are any test failures - try to run them individually
> -Dtest=a,b etc. There may be an issue with a full test run, but all of
> the tests that are enabled should work. I know there are some issues
> with jdbc tests that hang or fail due to previous runs not cleaning up,
> but that should be most of it. I got a bunch of full test runs before
> the 5.11 release if that is any help.
>
> On 23 February 2015 at 20:38, Kevin Burton <bur...@spinn3r.com> wrote:
> > OK. This is ready to go and I have a patch branch:
> >
> > https://issues.apache.org/jira/browse/AMQ-5609
> >
> > I’m stuck at the moment though because tests don’t pass. But tests were
> > failing before my changes, so I don’t think it has anything to do with
> > them.
> >
> > On Sun, Feb 22, 2015 at 11:11 PM, Kevin Burton <bur...@spinn3r.com> wrote:
> >
> >> Actually, is the lock even needed here? Why would it be? If we’re
> >> *removing* a subscription, why does it care if we possibly ALSO remove
> >> a separate / isolated queue before/after the subscription is removed?
> >>
> >> I think this is redundant and can be removed. Maybe I’m wrong though.
> >>
> >> I looked at all the callers and none were associated with queues.
> >>
> >> On Sun, Feb 22, 2015 at 11:07 PM, Kevin Burton <bur...@spinn3r.com> wrote:
> >>
> >>> So I have some working/theoretical code that should resolve this.
> >>>
> >>> It acquires a lock *per* ActiveMQDestination, so that there is no
> >>> lock contention.
> >>>
> >>> But here’s where I’m stuck:
> >>>
> >>>> @Override
> >>>> public void removeSubscription(ConnectionContext context,
> >>>>         RemoveSubscriptionInfo info) throws Exception {
> >>>>     inactiveDestinationsPurgeLock.readLock().lock();
> >>>>     try {
> >>>>         topicRegion.removeSubscription(context, info);
> >>>>     } finally {
> >>>>         inactiveDestinationsPurgeLock.readLock().unlock();
> >>>>     }
> >>>> }
> >>>
> >>> .. this is in RegionBroker.
> >>>
> >>> There is no ActiveMQDestination involved here, so I’m not sure of the
> >>> best way to resolve this.
> >>>
> >>> Any advice?
> >>>
> >>> On Sun, Feb 22, 2015 at 8:11 PM, Kevin Burton <bur...@spinn3r.com> wrote:
> >>>
> >>>> Yes. That was my thinking too… that just replacing the
> >>>> CopyOnWriteArraySet with something more performant would solve the
> >>>> issue.
> >>>>
> >>>> This would also improve queue creation time as well as queue
> >>>> deletion time.
> >>>>
> >>>> What I think I’m going to do in the meantime is:
> >>>>
> >>>> - implement a granular lock based on queue name… I am going to use
> >>>>   an interface so we can replace the implementation later.
> >>>>
> >>>> - implement timing for the purge thread so it tracks how long it
> >>>>   takes to remove a queue but also how long the entire loop takes.
> >>>>
> >>>> I’ll do this on a branch so it should be easy to merge.
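For reference, here is a minimal sketch of the per-destination lock idea discussed above. It is not existing ActiveMQ API: the DestinationLocks class, the use of the destination's physical name as the key, and the broker-wide fallback lock are all assumptions; real code would also need to evict entries once a destination is gone.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Hypothetical helper: one ReadWriteLock per destination name, so that
    // purging one inactive queue does not block work on unrelated destinations.
    public class DestinationLocks {

        private final ConcurrentMap<String, ReadWriteLock> locks =
                new ConcurrentHashMap<String, ReadWriteLock>();

        // Look up (or lazily create) the lock for the given destination name.
        // putIfAbsent keeps this safe under concurrent callers on Java 7.
        public ReadWriteLock lockFor(String physicalName) {
            ReadWriteLock lock = locks.get(physicalName);
            if (lock == null) {
                ReadWriteLock candidate = new ReentrantReadWriteLock();
                ReadWriteLock existing = locks.putIfAbsent(physicalName, candidate);
                lock = (existing != null) ? existing : candidate;
            }
            return lock;
        }

        // Callers that touch no particular destination (e.g. removeSubscription
        // in RegionBroker above) could fall back to a single broker-wide lock.
        private final ReadWriteLock brokerWide = new ReentrantReadWriteLock();

        public ReadWriteLock brokerWideLock() {
            return brokerWide;
        }
    }

The idea would be that the purge thread takes the write lock for the queue it is removing while normal operations take the read lock for their own destination, so unrelated destinations never contend.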
> >>>> On Sun, Feb 22, 2015 at 7:40 PM, Tim Bain <tb...@alumni.duke.edu> wrote:
> >>>>
> >>>>> A decent amount of the time is being spent calling remove() on
> >>>>> various array-backed collections. Those data structures might be
> >>>>> inappropriate for the number of destinations you're running, since
> >>>>> array-backed collections tend to have add/remove operations that are
> >>>>> O(N); some improvement might come from something as simple as moving
> >>>>> to a ConcurrentHashSet instead of a CopyOnWriteArraySet, for example.
> >>>>> (Or it might make performance worse because of other aspects of how
> >>>>> those collections are used; people other than me would be in a better
> >>>>> position to evaluate the full range of performance requirements for
> >>>>> those collections.)
> >>>>>
> >>>>> Scheduler.cancel() also takes an alarming amount of time for what
> >>>>> looks like a really simple method (
> >>>>> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.activemq/activemq-all/5.10.0/org/apache/activemq/thread/Scheduler.java#Scheduler.cancel%28java.lang.Runnable%29
> >>>>> ).
> >>>>>
> >>>>> On Sun, Feb 22, 2015 at 7:56 PM, Kevin Burton <bur...@spinn3r.com> wrote:
> >>>>>
> >>>>> > Here’s a jprofiler view with the advisory support enabled if you’re
> >>>>> > curious:
> >>>>> >
> >>>>> > http://i.imgur.com/I1jesZz.jpg
> >>>>> >
> >>>>> > I’m not familiar enough with the internals of ActiveMQ to have any
> >>>>> > obvious optimization ideas.
> >>>>> >
> >>>>> > One other idea I had (which would require a ton of refactoring, I
> >>>>> > think) would be to potentially bulk delete all the queues at once.
> >>>>> >
> >>>>> > On Sun, Feb 22, 2015 at 6:42 PM, Kevin Burton <bur...@spinn3r.com> wrote:
> >>>>> >
> >>>>> > > Spending some more time in jprofiler, it looks like 20% of this
> >>>>> > > is due to schedulerSupport and the other 80% is due to
> >>>>> > > advisorySupport.
> >>>>> > >
> >>>>> > > If I set both to false the total runtime of my tests drops in
> >>>>> > > half… and the latencies fall from
> >>>>> > >
> >>>>> > >> max create producer latency: 10,176 ms
> >>>>> > >> max create message on existing producer and consumer: 2 ms
> >>>>> > >
> >>>>> > > … to
> >>>>> > >
> >>>>> > >> max create producer latency: 1 ms
> >>>>> > >> max create message on existing producer and consumer: 1 ms
> >>>>> > >
> >>>>> > > and this is without even fixing the purge background lock.
> >>>>> > >
> >>>>> > > So the question now is what the heck the advisory support is
> >>>>> > > doing that can result in such massive performance overhead.
> >>>>> > >
> >>>>> > > … and I think advisorySupport is enabled by default, so that’s
> >>>>> > > problematic as well.
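For anyone wanting to reproduce the comparison above: both flags can be turned off on the broker, either programmatically or via advisorySupport="false" / schedulerSupport="false" on the <broker> element in XML. A minimal sketch follows; note that some features (for example, dynamic network-of-brokers discovery and tooling that listens on advisory topics) depend on advisory messages, so disabling them is not free.

    import org.apache.activemq.broker.BrokerService;

    public class NoAdvisoryBroker {
        public static void main(String[] args) throws Exception {
            BrokerService broker = new BrokerService();
            broker.setPersistent(false);       // matches the non-persistent setup described in this thread
            broker.setAdvisorySupport(false);  // no advisory topics/messages on destination churn
            broker.setSchedulerSupport(false); // no scheduler store
            broker.addConnector("tcp://localhost:61616");
            broker.start();
            // ... run the workload / tests against this broker ...
            broker.stop();
        }
    }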
> >>>>> > > On Sun, Feb 22, 2015 at 4:45 PM, Kevin Burton <bur...@spinn3r.com> wrote:
> >>>>> > >
> >>>>> > >> OK. Loaded up JProfiler and confirmed that it’s not LevelDB.
> >>>>> > >>
> >>>>> > >> This is a non-persistent broker I’m testing on.
> >>>>> > >>
> >>>>> > >> It looks like it’s spending all its time in
> >>>>> > >> CopyOnWriteArrayList.remove and Timer.purge…
> >>>>> > >>
> >>>>> > >> Which is hopeful, because this is ALL due to ActiveMQ internals,
> >>>>> > >> and in theory LevelDB should perform well if we improve the
> >>>>> > >> performance of ActiveMQ internals and fix this lock bug.
> >>>>> > >>
> >>>>> > >> Which would rock!
> >>>>> > >>
> >>>>> > >> It should ALSO make queue creation faster.
> >>>>> > >>
> >>>>> > >> On Sun, Feb 22, 2015 at 4:10 PM, Kevin Burton <bur...@spinn3r.com> wrote:
> >>>>> > >>
> >>>>> > >>> On Sun, Feb 22, 2015 at 3:30 PM, Tim Bain <tb...@alumni.duke.edu> wrote:
> >>>>> > >>>
> >>>>> > >>>> So if LevelDB cleanup during removeDestination() is the
> >>>>> > >>>> presumed culprit, can we spin off the LevelDB cleanup work into
> >>>>> > >>>> a separate thread (better: a task object to be serviced by a
> >>>>> > >>>> ThreadPool so you can avoid a fork bomb if we remove many
> >>>>> > >>>> destinations at once) so the call to removeDestination() can
> >>>>> > >>>> return quickly and LevelDB can do its record-keeping in the
> >>>>> > >>>> background without blocking message-processing?
> >>>>> > >>>
> >>>>> > >>> Would that be possible? If the delete is pending on ActiveMQ
> >>>>> > >>> there is a race where a producer could re-create it unless the
> >>>>> > >>> lock is held.
> >>>>> > >>>
> >>>>> > >>> Though I guess if you dispatched to the GC thread WITH the lock
> >>>>> > >>> still held you would be OK, but I think if we use the existing
> >>>>> > >>> purge thread then we’re fine.
> >>>>> > >>>
> >>>>> > >>> OK. I think I’m wrong about LevelDB being the issue. To be fair,
> >>>>> > >>> I wasn’t 100% certain before, but I should have specified.
> >>>>> > >>>
> >>>>> > >>> Our current production broker is running with persistent=false…
> >>>>> > >>> and I just re-ran the tests without disk persistence and it has
> >>>>> > >>> the same problem.
> >>>>> > >>>
> >>>>> > >>> So the main question now is why the heck ActiveMQ is taking SO
> >>>>> > >>> LONG to GC a queue. It’s taking about 100 ms, which is an insane
> >>>>> > >>> amount of time considering this is all done in memory.
> >>>>> > >>>
> >>>>> > >>> Kevin
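As an illustration of the "task object serviced by a ThreadPool" idea above, here is a hedged sketch. The names (DestinationCleaner, purgeExecutor, unregister, doStorageCleanup) are made up for illustration and are not existing ActiveMQ classes or methods; the point is only the split between synchronous unregistration and background storage cleanup.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Illustrative sketch: hand the slow part of destination removal to a
    // small, bounded thread pool instead of doing it inline.
    public class DestinationCleaner {

        // A fixed pool bounds the work, so removing many destinations at once
        // does not turn into a fork bomb of cleanup threads.
        private final ExecutorService purgeExecutor = Executors.newFixedThreadPool(2);

        public void removeDestination(final String destinationName) {
            // Synchronously unregister the destination so a producer cannot
            // re-create it while the delete is pending (the race discussed
            // above), then push the slow storage cleanup into the background.
            unregister(destinationName);

            purgeExecutor.submit(new Runnable() {
                @Override
                public void run() {
                    doStorageCleanup(destinationName);
                }
            });
        }

        private void unregister(String destinationName) {
            // remove from the in-memory maps while holding whatever lock guards them
        }

        private void doStorageCleanup(String destinationName) {
            // delete persisted records (e.g. LevelDB/KahaDB) without blocking
            // message processing
        }
    }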
--

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>