Keep in mind that a pause as described by that JIRA could come about because your consumer has a full prefetch buffer's worth of messages that match the selector, plus lots more messages in the store. If you have a backlog for any consumer, anything that can't fit in the consumer's prefetch buffer will hang out in the cursor and eventually the message store (outside, and blocked by, the cursor). It's not necessary to have messages that fail to match any selector, though that will certainly produce the behavior too.
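To make that concrete, here's a toy Python model of the situation. It is only a sketch of the idea, not ActiveMQ's actual dispatch code: `dispatch`, `run`, and the dict fields are names I made up for illustration, and the only concept borrowed from the broker is that the cursor pages in at most maxPageSize messages at a time. Every message matches one of the two selectors, yet consumer B still gets nothing while the page is full of messages waiting on consumer A.

```python
from collections import deque


def dispatch(store, page, consumers, max_page_size):
    """Toy model of one queue: page messages in from the store, then hand
    each paged-in message to the first consumer that matches and has room."""
    progressed = True
    while progressed:
        progressed = False
        # Page in from the store only while the cursor page has space.
        while store and len(page) < max_page_size:
            page.append(store.popleft())
            progressed = True
        # Try to dispatch everything that is currently paged in.
        for msg in list(page):
            for c in consumers:
                has_room = len(c["prefetched"]) < c["prefetch"]
                if c["selector"](msg) and has_room:
                    c["prefetched"].append(msg)
                    page.remove(msg)
                    progressed = True
                    break


def run(max_page_size):
    # Hypothetical backlog: 8 messages with key 0, then 2 with key 1.
    store = deque({"id": i, "key": 0 if i < 8 else 1} for i in range(10))
    page = []
    # Neither consumer ever acks, so each prefetch buffer fills and stays full.
    a = {"selector": lambda m: m["key"] == 0, "prefetch": 2, "prefetched": []}
    b = {"selector": lambda m: m["key"] == 1, "prefetch": 2, "prefetched": []}
    dispatch(store, page, [a, b], max_page_size)
    return len(a["prefetched"]), len(b["prefetched"]), len(page), len(store)


# (messages held by A, messages held by B, left in page, left in store)
print(run(3))   # (2, 0, 3, 5): B starves, its messages never leave the store
print(run(10))  # (2, 2, 6, 0): a bigger page lets the cursor reach B's messages
```

With the small page, A's prefetch fills, the page fills with more of A's messages, and the two messages B could take stay stuck in the store behind them. Raising the page size only moves the threshold; a deep enough backlog for A would stall B again.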
Tim

On Fri, Apr 24, 2015 at 3:21 PM, Kevin Burton <bur...@spinn3r.com> wrote:

> Literally JUST found this issue!
>
> Is this documented anywhere? My issue is that there *is* no sparse message
> distribution. Every message has a value from between 0 and 9 with none
> lacking that header.
>
> I even consume where the message is lacking the value.
>
> So there shouldn’t be anything left over.
>
> I think ActiveMQ should probably log an error when this happens.
>
> On Fri, Apr 24, 2015 at 2:03 PM, Timothy Bish <tabish...@gmail.com> wrote:
>
> > On 04/24/2015 04:50 PM, Kevin Burton wrote:
> > > I’ve been working 15 hour days for the last 2-3 weeks trying to resolve
> > > this so if this is somewhat incoherent it’s probably due to lack of sleep
> > > :-P
> > >
> > > I think we’re experiencing a bug in ActiveMQ which is VERY hard to
> > > reproduce but happens regularly in our production setup.
> > >
> > > I can’t reproduce it in my test setup because it seems to require real
> > > world data. Every time I try to do so everything works fine.
> > >
> > > It seems you have to have the following:
> > >
> > > - a large number of queues which need servicing (> 1000)
> > > - a fairly large number of connections (>2000)
> > > - message selectors
> > > - a queue that has a large number of messages (5000).
> > >
> > > I have my test code now reproducing it…
> > >
> > > Everything works FINE if we have just a few messages. The problems arise
> > > once the queue size grows at which point selectors don’t work.
> > >
> > > It seems like *early* connections win. If I create a connection to
> > > ActiveMQ early, and keep it open, it will work. But new connections don’t
> > > work. Eventually, the existing connections will fail too.
> > >
> > > Basically, it works JUST FINE without message selectors.
> > >
> > > I KNOW it’s not my code because I’ve written a basic /simple consumer
> > > which is literally just raw JMS and is < 50 lines of code.
> > >
> > > I also know my message selectors should match. First, they do match some
> > > percentage of the time. Second, when I consume without the message
> > > selectors, it works. I have it print the message headers and I can
> > > confirm that they should match.
> > >
> > > This also seems to get worse over time. The larger the queue, the less
> > > chance messages will be serviced, eventually it will just lock up
> > > entirely.
> > >
> > > There are no obvious errors in the ActiveMQ log. Just regarding queue GC.
> > >
> > > The box still has about 40% memory free. So I don’t think it has any
> > > issue with memory. No OutOfMemoryErrors being logged.
> > >
> > > I think another way to debug this could be to restart activemq itself
> > > with message tracing. Then try to get the queue to this state again, and
> > > try to consume messages and see what’s being logged while it’s failing.
> > >
> > > What’s frustrating here is that this is the 3rd ActiveMQ workaround I’ve
> > > had to implement.
> > >
> > > The first was because LevelDB was very slow… (artificially slow it
> > > seems), so then I decided to just use the memory store. But the memory
> > > store doesn’t support priority, so instead, I implemented priority
> > > through JMS selectors. But now JMS selectors don’t work.
> > >
> > > :-/
> > >
> > This sounds a lot like the standard issue of having a deep queue and the
> > message selector not being able to match because the maxPageSize value
> > is limiting what the message cursor will page in. Have you tried upping
> > the maxPageSize option? See:
> > https://issues.apache.org/jira/browse/AMQ-2217
> >
> > --
> > Tim Bish
> > Sr Software Engineer | RedHat Inc.
> > tim.b...@redhat.com | www.redhat.com
> > twitter: @tabish121
> > blog: http://timbish.blogspot.com/
>
> --
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
> <http://spinn3r.com>