On Mon, Jun 8, 2015 at 9:58 AM, Kevin Burton <bur...@spinn3r.com> wrote:
> > I can see two potential problems that your description didn't draw a
> > line between:
> >
> > 1. With a large prefetch buffer, it's possible to have one thread have
> > a large number of prefetched tasks and another have none, even if all
> > tasks take an average amount of time to complete. No thread is slow per
> > se, but because the messages were prefetched lopsidedly, one thread sits
> > idle while the other churns through what's on its plate.
>
> Yes. Totally. I think I mentioned that but maybe didn’t spell it out
> perfectly.
>
> This is one major edge case that needs to be addressed.
>
> > 2. With *any* prefetch buffer size, it's possible to have one message
> > that takes forever to complete. Any messages caught behind that one
> > slow message are stuck until it finishes.
>
> No.. that won’t happen because I have one consumer per thread, so others
> are dispatched on the other consumers. Even if prefetch is one.
>
> At least I have a test for this, and believe this to be the case, and
> verified that my test works properly.
>
> But on this you *may* be right if we’re just explaining it differently.
> I use one thread per consumer, so as long as there’s a message in
> prefetch, then I’m good.

Prefetch buffers are per-consumer, not per-connection or per-thread or
per-anything-else. A consumer that has messages other than the current one
prefetched and takes a long time to process the current message will
prevent any other consumers from processing the messages it has prefetched.
I'm not sure what your test is testing/showing, but I'm skeptical that it's
showing that a slow consumer with > 1 message prefetched allows other
consumers to process the messages it's not getting to.

> The problem is, I think, that my high CPU is stalling out ActiveMQ and so
> I can’t stay prefetched.
>
> > Which scenario are you worried about here?
> >
> > If the latter, the AbortSlowAckConsumerStrategy
> > (http://timbish.blogspot.com/2013/07/coming-in-activemq-59-new-way-to-abort.html ;
> > sadly the wiki doesn't detail this strategy and Tim's personal blog post
> > is the best documentation available) is intended to address exactly this:
>
> Oh yes.. I think I remember reading this.
>
> Yes.. the duplicate processing is far from ideal.
>
> > If the former, you're basically looking to enable work-stealing between
> > consumers, and I'm not aware of any existing capability to do that. If
> > you wanted to implement it, you'd probably want to implement it as a
> > sibling class to AbortSlowAckConsumerStrategy where SlowAck is the
> > trigger but StealWork is the action rather than Abort.
>
> Yes.. but I think you detailed the reason why it’s not ideal - it
> requires a lot of work!

Absolutely, plus changes to the OpenWire protocol (and the need to maintain
backwards compatibility with older OpenWire versions). Which is why the
typical solution to this problem is to use a low prefetch buffer size (so
you're only stalling out a few messages) and just live with the delay on
those few messages. But a better solution would be awesome, if you wanted
to work towards it.
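To make the low-prefetch suggestion concrete, here's a minimal client-side
sketch. The broker URL, queue name, and the value of 1 are placeholders;
the jms.prefetchPolicy.* URI option and the consumer.prefetchSize
destination option are the usual ActiveMQ knobs for this, but double-check
the exact names against your client version.

    import javax.jms.Connection;
    import javax.jms.MessageConsumer;
    import javax.jms.Queue;
    import javax.jms.Session;

    import org.apache.activemq.ActiveMQConnectionFactory;

    public class LowPrefetchSketch {
        public static void main(String[] args) throws Exception {
            // Default queue prefetch for every consumer created from this
            // factory can be set on the broker URL (placeholder host/port)...
            ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
                    "tcp://localhost:61616?jms.prefetchPolicy.queuePrefetch=1");

            // ...or programmatically, which does the same thing.
            factory.getPrefetchPolicy().setQueuePrefetch(1);

            Connection connection = factory.createConnection();
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

            // It can also be overridden per destination. With a prefetch of 1,
            // a slow consumer only holds the message it is currently working
            // on, so nothing queues up behind it.
            Queue queue = session.createQueue("SOME.QUEUE?consumer.prefetchSize=1");
            MessageConsumer consumer = session.createConsumer(queue);

            // ... receive/process messages with 'consumer' as usual ...
        }
    }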
> > I'm a little skeptical that your worker threads could so thoroughly
> > smother the CPU that the thread doing the prefetching gets starved out
>
> You underestimate the power of the dark side… :)
>
> We’re very high CPU load… with something like 250-500 threads per daemon.
>
> We’re CPU oriented, so if there’s work to be done and we’re not at 100%
> CPU, then we’re wasting compute resources.

Eeek. Are you sure you're not wasting more resources switching contexts
than you're gaining by nominally keeping a thread on all cores at all
times? (Some of that CPU time is being spent moving threads around rather
than running them.)

> > (particularly since I'd expect it to be primarily I/O-bound, so its CPU
> > usage should be minimal), though I guess if you had as many worker
> > threads as cores you might be able to burn through all the prefetched
> > messages before the ActiveMQ thread gets rescheduled. But I assume that
> > your workers are doing non-trivial amounts of work and are probably
> > getting context switched repeatedly during their processing, which I'd
> > think would give the ActiveMQ thread plenty of time to do what it needs
> > to. Unless 1) you've set thread priorities to prioritize your workers
> > over ActiveMQ, in which case don't do that,
>
> Yes. My threads should be minimum priority. I would verify that, but
> there’s a Linux bug that causes jstack to show the wrong value.
>
> Does anyone know what priority the ActiveMQ transport thread runs under?
> The above bug prevents me from (easily) figuring that out.

I don't know what priority it runs as, but could you figure it out by
attaching a debugger, setting a breakpoint, and then examining the thread
in the Watch/Expressions windows? You might even try raising that thread's
priority (via the debugger, initially) and seeing if it makes an
improvement.

> > 2) your worker threads are somehow holding onto a lock that the ActiveMQ
> > thread needs, which is possible but seems unlikely, or 3) you've set up
> > so many consumers (far more than you have cores) that the 1/(N+1)th that
> > the ActiveMQ thread gets is too little or too infrequent to maintain
> > responsiveness, in which case you need to scale back your worker thread
> > pool size (which I think means using fewer consumers per process, based
> > on what you've described).
>
> Yes. This is the case. What I’m thinking of doing is the reverse: to use
> more connections and put them in a connection pool.
>
> So right now, if I have 1 connection and 500 workers, then I have a 1:500
> ratio. But if I bump that up to just 10… I’ll have 1:50.
>
> I think this is more realistic and would mean more CPU time to keep the
> prefetch buffer warm. If I sized it at 1:1 (which would be wasteful of
> resources, I think) then I think the problem would effectively be solved
> (but waste memory to solve it).
>
> But maybe around 1:10 or 1:20 it would be resolved.
>
> I need to verify that ActiveMQ allocates more threads per connection, but
> I’m pretty sure it does.
>
> --
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
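For what it's worth, here's a rough sketch of the connection fan-out you're
describing, purely for illustration. The 10 x 50 split, broker URL, and
queue name are placeholders, and it assumes (as you suspect) that the
client gives each connection its own transport thread.

    import javax.jms.Connection;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.MessageListener;
    import javax.jms.Session;

    import org.apache.activemq.ActiveMQConnectionFactory;

    public class FanOutSketch {
        // Placeholder ratio: 10 connections x 50 consumers = 500 workers total.
        private static final int CONNECTIONS = 10;
        private static final int CONSUMERS_PER_CONNECTION = 50;

        public static void main(String[] args) throws Exception {
            ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
                    "tcp://localhost:61616?jms.prefetchPolicy.queuePrefetch=1");

            for (int c = 0; c < CONNECTIONS; c++) {
                // Spreading consumers across several connections spreads the
                // dispatch work across more client-side threads instead of
                // funneling all 500 consumers through one connection.
                Connection connection = factory.createConnection();
                connection.start();

                for (int s = 0; s < CONSUMERS_PER_CONNECTION; s++) {
                    // One session and consumer per worker slot; the listener
                    // is where the CPU-heavy work would go.
                    Session session =
                            connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                    MessageConsumer consumer =
                            session.createConsumer(session.createQueue("SOME.QUEUE"));
                    consumer.setMessageListener(new MessageListener() {
                        @Override
                        public void onMessage(Message message) {
                            // ... process the message ...
                        }
                    });
                }
            }

            // Block the main thread; the listeners keep running on the
            // client's dispatch threads.
            Thread.currentThread().join();
        }
    }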