Kevin, go for it.
Yes, our tests need a lot of love. Actually, most of the tests are not really unit tests but more like integration tests. Helping with the spring (as in the season) cleanup would be immensely appreciated.
Hadrian

On 02/24/2015 02:18 PM, Kevin Burton wrote:
Ah, OK… I assume continuous integration isn't a goal, then. I'm upset if my testing takes more than 20 minutes :-P

I bet this could be taken down to an hour if the tests were parallelized, but of course someone needs to take the time to get that done. I might allocate some of our internal dev resources to getting our build to parallelize across containers. If it's generalizable to other projects, I'll OSS it.

Would be great to get better testing feedback. Right now I think my patches are ready to be merged, but I'd feel more comfortable suggesting it if I knew the tests worked. Should I propose it now, and if there are any bugs we can fix them later?

Kevin

On Tue, Feb 24, 2015 at 10:54 AM, Timothy Bish <tabish...@gmail.com> wrote:

On 02/24/2015 01:22 PM, Kevin Burton wrote:
> Oh geez… seriously though, can you give me an estimate? Assuming 30 seconds each, I'd estimate about 11 hours.

That's probably not too far off; usually I just let them run overnight for the "all" profile.

http://activemq.apache.org/junit-reports.html
https://hudson.apache.org/hudson/job/ActiveMQ/

> Hudson is down for ActiveMQ. Also, is there a disk space requirement for the tests? I have 10GB free or so and I'm getting TONS of these:

I don't think anyone has looked into the minimum free space required for the tests to run.

    java.lang.RuntimeException: Failed to start provided job scheduler store: JobSchedulerStore:activemq-data
        at java.io.RandomAccessFile.seek(Native Method)
        at org.apache.activemq.util.RecoverableRandomAccessFile.seek(RecoverableRandomAccessFile.java:384)
        at org.apache.activemq.store.kahadb.disk.page.PageFile.readPage(PageFile.java:877)
        at org.apache.activemq.store.kahadb.disk.page.Transaction.load(Transaction.java:427)
        at org.apache.activemq.store.kahadb.disk.page.Transaction.load(Transaction.java:377)
        at org.apache.activemq.store.kahadb.disk.index.BTreeIndex.load(BTreeIndex.java:159)
        at org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl$MetaData.load(JobSchedulerStoreImpl.java:90)
        at org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl$3.execute(JobSchedulerStoreImpl.java:277)
        at org.apache.activemq.store.kahadb.disk.page.Transaction.execute(Transaction.java:779)
        at org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.doStart(JobSchedulerStoreImpl.java:261)
        at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
        at org.apache.activemq.broker.BrokerService.setJobSchedulerStore(BrokerService.java:1891)
        at org.apache.activemq.transport.stomp.StompTestSupport.createBroker(StompTestSupport.java:174)
        at org.apache.activemq.transport.stomp.StompTestSupport.startBroker(StompTestSupport.java:112)
        at org.apache.activemq.transport.stomp.StompTestSupport.setUp(StompTestSupport.java:94)
        at org.apache.activemq.transport.stomp.Stomp12Test.setUp(Stomp12Test.java:41)
        at org.apache.activemq.transport.stomp.Stomp12SslAuthTest.setUp(Stomp12SslAuthTest.java:38)

On Tue, Feb 24, 2015 at 10:09 AM, Timothy Bish <tabish...@gmail.com> wrote:

On 02/24/2015 01:01 PM, Kevin Burton wrote:
> How long do the tests usually take?

An eternity + 1hr.

> I'm looking at 45 minutes right now before I gave up… I think part of it was that the box I was testing on was virtualized and didn't have enough resources. I tried to parallelize the tests (-T 8 with Maven) but I got other errors, so I assume the ports are singletons.

They won't be happy if you run them in parallel.
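[A likely culprit for the parallel failures above is that each test binds the same fixed broker port, so concurrent forks collide. A minimal sketch of one common workaround, letting the OS pick a free port per test; the pickFreePort helper is hypothetical, not part of ActiveMQ's test support classes:]

    import java.io.IOException;
    import java.net.ServerSocket;

    import org.apache.activemq.broker.BrokerService;

    public class EphemeralPortBrokerExample {

        // Ask the OS for a currently free TCP port by binding port 0.
        // There is a small race between closing this probe socket and the
        // broker re-binding the port, but that is usually acceptable in tests.
        static int pickFreePort() throws IOException {
            try (ServerSocket socket = new ServerSocket(0)) {
                return socket.getLocalPort();
            }
        }

        public static void main(String[] args) throws Exception {
            int port = pickFreePort();

            BrokerService broker = new BrokerService();
            broker.setPersistent(false);
            broker.addConnector("tcp://localhost:" + port); // unique per forked test JVM
            broker.start();

            System.out.println("broker bound to port " + port);
            broker.stop();
        }
    }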
On Tue, Feb 24, 2015 at 8:03 AM, Gary Tully <gary.tu...@gmail.com> wrote:

If there are any test failures, try to run them individually: -Dtest=a,b etc. There may be an issue with a full test run, but all of the tests that are enabled should work. I know there are some issues with JDBC tests that hang or fail due to previous runs not cleaning up, but that should be most of it. I got a bunch of full test runs before the 5.11 release, if that is any help.

On 23 February 2015 at 20:38, Kevin Burton <bur...@spinn3r.com> wrote:

OK. This is ready to go and I have a patch branch: https://issues.apache.org/jira/browse/AMQ-5609

I'm stuck at the moment though because the tests don't pass. But the tests were failing before as well, so I don't think it has anything to do with my changes.

On Sun, Feb 22, 2015 at 11:11 PM, Kevin Burton <bur...@spinn3r.com> wrote:

Actually, is the lock even needed here? Why would it be? If we're *removing* a subscription, why does it care if we possibly ALSO remove a separate/isolated queue before or after the subscription is removed? I think this is redundant and can be removed. Maybe I'm wrong, though. I looked at all the callers and none were associated with queues.

On Sun, Feb 22, 2015 at 11:07 PM, Kevin Burton <bur...@spinn3r.com> wrote:

So I have some working/theoretical code that should resolve this. It acquires a lock *per* ActiveMQDestination; this way there is no lock contention. But here's where I'm stuck:

    @Override
    public void removeSubscription(ConnectionContext context, RemoveSubscriptionInfo info) throws Exception {
        inactiveDestinationsPurgeLock.readLock().lock();
        try {
            topicRegion.removeSubscription(context, info);
        } finally {
            inactiveDestinationsPurgeLock.readLock().unlock();
        }
    }

This is in RegionBroker. There is no ActiveMQDestination involved here, so I'm not sure of the best way to resolve this. Any advice?

On Sun, Feb 22, 2015 at 8:11 PM, Kevin Burton <bur...@spinn3r.com> wrote:

Yes, that was my thinking too… that just replacing the CopyOnWriteArraySet with something more performant would solve the issue. This would also improve queue creation time as well as queue deletion time.

What I think I'm going to do in the meantime is:

- implement a granular lock based on queue name… I am going to use an interface so we can replace the implementation later (see the sketch after this message).
- implement timing for the purge thread so it tracks how long it takes to remove a queue, but also how long the entire loop takes.

I'll do this on a branch so it should be easy to merge.
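[For what it's worth, a minimal sketch of the granular per-queue-name lock described above, kept behind an interface so the implementation can be swapped later. This is only an illustration of the idea, not the code on the AMQ-5609 branch; the names DestinationLocks and PerDestinationLocks are hypothetical. It also sidesteps the open removeSubscription question, since there is no destination to key the lock on in that path:]

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Hypothetical interface so the locking strategy can be replaced later.
    interface DestinationLocks {
        ReadWriteLock lockFor(String destinationName);
    }

    // One ReadWriteLock per destination name, created lazily. The purge
    // thread takes the write lock for the one destination it is removing;
    // producers/consumers take the read lock, so unrelated destinations
    // never contend with each other. Caveat: this map grows with the number
    // of names seen; a real implementation would evict entries on removal.
    class PerDestinationLocks implements DestinationLocks {
        private final ConcurrentMap<String, ReadWriteLock> locks = new ConcurrentHashMap<>();

        @Override
        public ReadWriteLock lockFor(String destinationName) {
            return locks.computeIfAbsent(destinationName, name -> new ReentrantReadWriteLock());
        }
    }

    class PurgeExample {
        private final DestinationLocks locks = new PerDestinationLocks();

        void purgeInactiveDestination(String destinationName) {
            ReadWriteLock lock = locks.lockFor(destinationName);
            lock.writeLock().lock();
            try {
                // remove the inactive destination here; holding the write
                // lock closes the race with a producer re-creating it
            } finally {
                lock.writeLock().unlock();
            }
        }
    }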
On Sun, Feb 22, 2015 at 7:40 PM, Tim Bain <tb...@alumni.duke.edu> wrote:

A decent amount of the time is being spent calling remove() on various array-backed collections. Those data structures might be inappropriate for the number of destinations you're running, since array-backed collections tend to have add/remove operations that are O(N); some improvement might come from something as simple as moving to a ConcurrentHashSet instead of a CopyOnWriteArraySet, for example. (Or it might make performance worse because of other aspects of how those collections are used; people other than me would be in a better position to evaluate the full range of performance requirements for those collections.)

Scheduler.cancel() also takes an alarming amount of time for what looks like a really simple method (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.activemq/activemq-all/5.10.0/org/apache/activemq/thread/Scheduler.java#Scheduler.cancel%28java.lang.Runnable%29).

On Sun, Feb 22, 2015 at 7:56 PM, Kevin Burton <bur...@spinn3r.com> wrote:

Here's a JProfiler view with the advisory support enabled, if you're curious: http://i.imgur.com/I1jesZz.jpg

I'm not familiar enough with the internals of ActiveMQ to have any obvious optimization ideas. One other idea I had (which would require a ton of refactoring, I think) would be to bulk delete all the queues at once.

On Sun, Feb 22, 2015 at 6:42 PM, Kevin Burton <bur...@spinn3r.com> wrote:

And spending some more time in JProfiler: looks like 20% of this is due to schedulerSupport and the other 80% is due to advisorySupport. If I set both to false, the total runtime of my tests drops in half… and the latencies fall from

    max create producer latency: 10,176 ms
    max create message on existing producer and consumer: 2 ms

… to

    max create producer latency: 1 ms
    max create message on existing producer and consumer: 1 ms

and this is without even fixing the purge background lock.

So the question now is what the heck the advisory support is doing that can result in such massive performance overhead. … and I think advisorySupport is enabled by default, so that's problematic as well.

On Sun, Feb 22, 2015 at 4:45 PM, Kevin Burton <bur...@spinn3r.com> wrote:

OK. Loaded up JProfiler and confirmed that it's not LevelDB. This is a non-persistent broker I'm testing on. Looks like it's spending all its time in CopyOnWriteArrayList.remove and Timer.purge… which is hopeful, because this is ALL due to ActiveMQ internals, and in theory LevelDB should perform well if we improve the performance of ActiveMQ internals and fix this lock bug. Which would rock! It should ALSO make queue creation faster.

On Sun, Feb 22, 2015 at 4:10 PM, Kevin Burton <bur...@spinn3r.com> wrote:

On Sun, Feb 22, 2015 at 3:30 PM, Tim Bain <tb...@alumni.duke.edu> wrote:
> So if LevelDB cleanup during removeDestination() is the presumed culprit, can we spin off the LevelDB cleanup work into a separate thread (better: a task object to be serviced by a ThreadPool so you can avoid a fork bomb if we remove many destinations at once) so the call to removeDestination() can return quickly and LevelDB can do its record-keeping in the background without blocking message-processing?

Would that be possible? If the delete is pending on ActiveMQ there is a race where a producer could re-create it unless the lock is held. Though I guess if you dispatched to the GC thread WITH the lock still held you would be OK, but I think if we use the existing purge thread then we're fine.

OK. I think I'm wrong about LevelDB being the issue. To be fair, I wasn't 100% certain before, but I should have specified. Our current production broker is running with persistent=false... and I just re-ran the tests without disk persistence and it has the same problem. So the main issue now is why the heck ActiveMQ is taking SO LONG to GC a queue. It's taking about 100ms, which is an insane amount of time considering this is all done in memory.

Kevin
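[For readers who want to reproduce the schedulerSupport/advisorySupport numbers above: both are plain flags on BrokerService, and advisorySupport is indeed on by default. A minimal sketch of a non-persistent broker with both disabled; the connector URI and the workload are placeholders:]

    import org.apache.activemq.broker.BrokerService;

    public class LeanBrokerExample {
        public static void main(String[] args) throws Exception {
            BrokerService broker = new BrokerService();
            broker.setPersistent(false);       // in-memory only, as in the tests above
            broker.setAdvisorySupport(false);  // skip advisory topics (on by default)
            broker.setSchedulerSupport(false); // skip the job scheduler store
            broker.addConnector("tcp://localhost:61616");
            broker.start();

            // ... run the workload ...

            broker.stop();
        }
    }

[One caveat: advisory topics are what features such as dynamic network-of-brokers discovery rely on, so disabling them is a trade-off rather than a free win.]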
--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile <https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>

--
Tim Bish
Sr Software Engineer | RedHat Inc.
tim.b...@redhat.com | www.redhat.com
skype: tabish121 | twitter: @tabish121
blog: http://timbish.blogspot.com/