>
> I can see two potential problems that your description didn't draw a line
> between:
>
>    1. With a large prefetch buffer, it's possible to have one thread have a
>    large number of prefetched tasks and another have none, even if all
> tasks
>    take an average amount of time to complete.  No thread is slow per se,
> but
>    because the messages were prefetched lopsidedly, one thread sits idle
> while
>    the other churns through what's on its plate.
>


Yes.  Totally.  I think I mentioned that but maybe didn’t spell it out
perfectly.

This is one major edge case that needs to be addressed.


>    2. With *any* prefetch buffer size, it's possible to have one message
>    that takes forever to complete.  Any messages caught behind that one
> slow
>    message are stuck until it finishes.
>
>
No.. that won’t happen, because I have one consumer per thread, so other
messages get dispatched to the other consumers.  Even if prefetch is one.

At least, I have a test for this, believe this to be the case, and have
verified that my test works properly.

But you *may* be right here, if we’re just explaining it differently.  I use
one thread per consumer, so as long as there’s a message in the prefetch
buffer, I’m good.

The problem, I think, is that my high CPU load is stalling out ActiveMQ, so I
can’t stay prefetched.
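
As an aside, here’s a minimal sketch of how the client-side prefetch can be
dialed down (the broker URL and the value of 10 are placeholders, not what we
actually run):

```java
import org.apache.activemq.ActiveMQConnectionFactory;

public class PrefetchConfig {
    public static void main(String[] args) {
        // Placeholder broker URL for this sketch.
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");

        // Smaller queue prefetch => fewer messages parked on any single
        // consumer, so a stalled or starved consumer strands less work.
        factory.getPrefetchPolicy().setQueuePrefetch(10);

        // The same thing can be expressed on the connection URI:
        //   tcp://localhost:61616?jms.prefetchPolicy.queuePrefetch=10
    }
}
```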


>  Which scenario are you worried about here?
>
> If the latter, the AbortSlowAckConsumerStrategy (
>
> http://timbish.blogspot.com/2013/07/coming-in-activemq-59-new-way-to-abort.html
> ;
> sadly the wiki doesn't detail this strategy and Tim's personal blog post is
> the best documentation available) is intended to address exactly this:
>

Oh yes.. I think I remember reading this.

Yes.. the duplicate processing is far from ideal.
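
For anyone else following the thread, a rough sketch of wiring that strategy
into an embedded broker might look like the one below (the 30s threshold and
the wildcard queue are illustrative only; the equivalent is normally done in
the destinationPolicy section of activemq.xml):

```java
import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.broker.region.policy.AbortSlowAckConsumerStrategy;
import org.apache.activemq.broker.region.policy.PolicyEntry;
import org.apache.activemq.broker.region.policy.PolicyMap;

public class SlowAckBrokerConfig {
    public static void main(String[] args) throws Exception {
        // Abort consumers that haven't acked anything for 30s
        // (threshold is illustrative only).
        AbortSlowAckConsumerStrategy strategy = new AbortSlowAckConsumerStrategy();
        strategy.setMaxTimeSinceLastAck(30_000);
        strategy.setIgnoreIdleConsumers(true); // don't punish consumers with nothing to do

        PolicyEntry policy = new PolicyEntry();
        policy.setQueue(">");                  // apply to all queues
        policy.setSlowConsumerStrategy(strategy);

        PolicyMap policyMap = new PolicyMap();
        policyMap.setDefaultEntry(policy);

        BrokerService broker = new BrokerService();
        broker.setDestinationPolicy(policyMap);
        broker.start();
    }
}
```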


> If the former, you're basically looking to enable work-stealing between
> consumers, and I'm not aware of any existing capability to do that.  If you
> wanted to implement it, you'd probably want to implement it as a sibling
> class to AbortSlowAckConsumerStrategy where SlowAck is the trigger but
> StealWork is the action rather than Abort.



Yes.. but I think you detailed the reason why it’s not ideal: it requires
a lot of work!


> I'm a little skeptical that your worker threads could so thoroughly smother
> the CPU that the thread doing the prefetching gets starved out
>

You underestimate the power of the dark side… :)

We’re running at very high CPU load… with something like 250-500 threads per
daemon.

We’re CPU-oriented, so if there’s work to be done and we’re not at 100%
CPU, then we’re wasting compute resources.


> (particularly since I'd expect it to be primarily I/O-bound, so it's CPU
> usage should be minimal), though I guess if you had as many worker threads
> as cores you might be able to burn through all the prefetched messages
> before the ActiveMQ thread gets rescheduled.  But I assume that your
> workers are doing non-trivial amounts of work and are probably getting
> context switched repeatedly during their processing, which I'd think would
> give the ActiveMQ thread plenty of time to do what it needs to.  Unless 1)
> you've set thread priorities to prioritize your workers over ActiveMQ, in
> which case don't do that,



Yes.  My threads should be at minimum priority.  I would verify that, but
there’s a Linux bug that causes jstack to show the wrong value.

Does anyone know what priority the ActiveMQ transport thread runs under?
The above bug prevents me from (easily) figuring that out.
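
As a workaround for the jstack problem, something like this in-process dump
should answer it, assuming the transport threads are identifiable by name
(e.g. "ActiveMQ Transport"):

```java
public final class ThreadPriorityDump {
    private ThreadPriorityDump() {}

    // Call this from inside the daemon (a debug endpoint, a timer, etc.);
    // a standalone process would only see its own threads.
    public static void dump() {
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            System.out.printf("%-60s priority=%d daemon=%b%n",
                    t.getName(), t.getPriority(), t.isDaemon());
        }
    }
}
```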


> 2) your worker threads are somehow holding onto a
> lock that the ActiveMQ thread needs, which is possible but seems unlikely,
> or 3) you've set up so many consumers (far more than you have cores) that
> the 1/(N+1)th that the ActiveMQ thread gets is too little or too infrequent
> to maintain responsiveness, in which case you need to scale back your
> worker thread pool size (which I think means using fewer consumers per
> process, based on what you've described).
>

Yes.  This is the case.  What I’m thinking of doing is the reverse: using
more connections and putting them in a connection pool.

So right now, if I have 1 connection and 500 workers, I have a 1:500
ratio.  But if I bump that up to just 10 connections, I’ll have 1:50.

I think this is more realistic and would mean more CPU time to keep the
prefetch buffers warm.  If I sized it at 1:1 (which I think would be wasteful
of resources), then I think the problem would effectively be solved
(but it would waste memory to solve it).

But maybe around 1:10 or 1:20 it would be resolved.

I need to verify that ActiveMQ allocates more threads per connection, but I’m
pretty sure it does.
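
Roughly what I have in mind, as a sketch (the broker URL, queue name, and the
10 x 50 split are placeholders):

```java
import java.util.ArrayList;
import java.util.List;
import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class ConnectionFanOut {
    public static void main(String[] args) throws Exception {
        final int connections = 10;
        final int consumersPerConnection = 50;

        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");

        List<MessageConsumer> consumers = new ArrayList<>();
        for (int c = 0; c < connections; c++) {
            Connection connection = factory.createConnection();
            connection.start();
            for (int i = 0; i < consumersPerConnection; i++) {
                // One session + consumer per worker thread, as before, but
                // spread across 10 connections instead of piled onto one.
                Session session =
                        connection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
                consumers.add(session.createConsumer(
                        session.createQueue("work.queue")));
            }
        }
        // ... hand each consumer to its worker thread ...
    }
}
```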


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
