On 17 October 2016 at 21:50, Rob Godfrey <[email protected]> wrote:
>
> On 17 October 2016 at 21:24, Ramayan Tiwari <[email protected]> wrote:
>
>> Hi Rob,
>>
>> We are certainly interested in testing the "multi queue consumers"
>> behavior with your patch in the new broker. We would like to know:
>>
>> 1. What will be the scope of changes: client, broker, or both? We are
>> currently running the 0.16 client, so we would like to make sure that
>> we will be able to use these changes with the 0.16 client.
>>
>
> There's no change to the client. I can't remember what was in the 0.16
> client... the only issue would be if there are any bugs in the parsing of
> address arguments. I can try to test that out tomorrow.
>

OK - with a little bit of care to get round the address parsing issues in
the 0.16 client... I think we can get this to work. I've created the
following JIRA:

https://issues.apache.org/jira/browse/QPID-7462

Attached to it are a patch which applies against trunk, and a separate
patch which applies against the 6.0.x branch
(https://svn.apache.org/repos/asf/qpid/java/branches/6.0.x - this is 6.0.4
plus a few other fixes which we will soon be releasing as 6.0.5).

To create a consumer which uses this feature (and multi queue consumption)
with the 0.16 client, you need to use something like the following as the
address:

queue_01 ; {node : { type : queue }, link : { x-subscribes : { arguments : { x-multiqueue : [ queue_01, queue_02, queue_03 ], x-pull-only : true }}}}

Note that the initial queue_01 has to be the name of an actual queue on the
virtual host, but otherwise it is not actually used (if you were using a
0.32 or later client you could just use '' here). The actual queues that
are consumed from are in the list value associated with x-multiqueue. For
my testing I created a list with 3000 queues here and this worked fine.

Let me know if you have any questions / issues,

Hope this helps,
Rob

>
>> 2. My understanding is that the "pull vs push" change is only with
>> respect to the broker, and that it does not change our architecture,
>> where we use MessageListener to receive messages asynchronously.
>>
>
> Exactly - this is only a change within the internal broker threading
> model. The external behaviour of the broker remains essentially unchanged.
>
>>
>> 3. Once the I/O refactoring is complete, we would be able to go back to
>> using standard JMS consumers (Destination). What is the timeline and
>> broker release version for the completion of this work?
>>
>
> You might wish to continue to use the "multi queue" model, depending on
> your actual use case, but yeah once the I/O work is complete I would hope
> that you could use the thousands of consumers model should you wish. We
> don't have a schedule for the next phase of I/O rework right now - about
> all I can say is that it is unlikely to be complete this year. I'd need to
> talk with Keith (who is currently on vacation) as to when we think we may
> be able to schedule it.
>
>>
>> Let me know once you have integrated the patch and I will re-run our
>> performance tests to validate it.
>>
>
> I'll make a patch for 6.0.x presently (I've been working on a change
> against trunk - the patch will probably have to change a bit to apply to
> 6.0.x).
>
> Cheers,
> Rob
>
>> Thanks
>> Ramayan
>>
>> On Sun, Oct 16, 2016 at 3:30 PM, Rob Godfrey <[email protected]>
>> wrote:
>>
>> > OK - so having pondered / hacked around a bit this weekend, I think to
>> > get decent performance from the IO model in 6.0 for your use case we're
>> > going to have to change things around a bit.
>> >
>> > Basically 6.0 is an intermediate step on our IO / threading model
>> > journey. In earlier versions we used 2 threads per connection for IO
>> > (one read, one write) and then extra threads from a pool to "push"
>> > messages from queues to connections.
>> >
>> > In 6.0 we moved to using a pool for the IO threads, and also stopped
>> > queues from "pushing" to connections while the IO threads were acting
>> > on the connection. It's this latter fact which is screwing up
>> > performance for your use case here because what happens is that on each
>> > network read we tell each consumer to stop accepting pushes from the
>> > queue until the IO interaction has completed. This is causing lots of
>> > loops over your 3000 consumers on each session, which is eating up a
>> > lot of CPU on every network interaction.
>> >
>> > In the final version of our IO refactoring we want to remove the
>> > "pushing" from the queue, and instead have the consumers "pull" - so
>> > that the only threads that operate on the queues (outside of
>> > housekeeping tasks like expiry) will be the IO threads.
>> >
>> > So, what we could do (and I have a patch sitting on my laptop for this)
>> > is to look at using the "multi queue consumers" work I did for you guys
>> > before, but augmenting this so that the consumers work using a "pull"
>> > model rather than the push model. This will guarantee strict fairness
>> > between the queues associated with the consumer (which was the issue
>> > you had with this functionality before, I believe). Using this model
>> > you'd only need a small number (one?) of consumers per session. The
>> > patch I have is to add this "pull" mode for these consumers
>> > (essentially this is a preview of how all consumers will work in the
>> > future).
>> >
>> > Does this seem like something you would be interested in pursuing?
>> >
>> > Cheers,
>> > Rob
>> >
>> > On 15 October 2016 at 17:30, Ramayan Tiwari <[email protected]>
>> > wrote:
>> >
>> > > Thanks Rob. Apologies for sending this over the weekend :(
>> > >
>> > > Are there any docs on the new threading model? I found this on
>> > > confluence:
>> > >
>> > > https://cwiki.apache.org/confluence/display/qpid/IO+Transport+Refactoring
>> > >
>> > > We are also interested in understanding the threading model a little
>> > > better to help us figure out its impact for our usage patterns. It
>> > > would be very helpful if there are more docs/JIRA/email-threads with
>> > > some details.
>> > >
>> > > Thanks
>> > >
>> > > On Sat, Oct 15, 2016 at 9:21 AM, Rob Godfrey <[email protected]>
>> > > wrote:
>> > >
>> > > > So I *think* this is an issue because of the extremely large number
>> > > > of consumers. The threading model in v6 means that whenever a
>> > > > network read occurs for a connection, it iterates over the
>> > > > consumers on that connection - obviously where there are a large
>> > > > number of consumers this is burdensome. I fear addressing this may
>> > > > not be a trivial change... I shall spend the rest of my afternoon
>> > > > pondering this...
>> > > >
>> > > > - Rob
>> > > >
>> > > > On 15 October 2016 at 17:14, Ramayan Tiwari <[email protected]>
>> > > > wrote:
>> > > >
>> > > > > Hi Rob,
>> > > > >
>> > > > > Thanks so much for your response. We use transacted sessions with
>> > > > > non-persistent delivery. Prefetch size is 1 and every message is
>> > > > > the same size (200 bytes).
>> > > > >
>> > > > > Thanks
>> > > > > Ramayan
>> > > > >
>> > > > > On Sat, Oct 15, 2016 at 2:59 AM, Rob Godfrey <[email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi Ramayan,
>> > > > > >
>> > > > > > this is interesting... in our testing (which admittedly didn't
>> > > > > > cover the case of this many queues / listeners) we saw the
>> > > > > > 6.0.x broker using less CPU on average than the 0.32 broker.
>> > > > > > I'll have a look this weekend as to why creating the listeners
>> > > > > > is slower.
>> > > > > > On the dequeuing, can you give a little more information on the
>> > > > > > usage pattern - are you using transactions, auto-ack or client
>> > > > > > ack? What prefetch size are you using? How large are your
>> > > > > > messages?
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Rob
>> > > > > >
>> > > > > > On 14 October 2016 at 23:46, Ramayan Tiwari <[email protected]>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hi All,
>> > > > > > >
>> > > > > > > We have been validating the new Qpid broker (version 6.0.4) and
>> > > > > > > have compared it against broker version 0.32, and we are seeing
>> > > > > > > major regressions. Following is the summary of our test setup
>> > > > > > > and results:
>> > > > > > >
>> > > > > > > *1. Test Setup*
>> > > > > > > *a).* Qpid broker runs on a dedicated host (12 cores, 32 GB
>> > > > > > > RAM).
>> > > > > > > *b).* For 0.32, we allocated 16 GB heap. For the 6.0.4 broker,
>> > > > > > > we use 8 GB heap and 8 GB direct memory.
>> > > > > > > *c).* For 6.0.4, flow to disk has been configured at 60%.
>> > > > > > > *d).* Both the brokers use BDB host type.
>> > > > > > > *e).* Brokers have around 6000 queues and we create 16 listener
>> > > > > > > sessions/threads spread over 3 connections, where each session
>> > > > > > > is listening to 3000 queues. However, messages are only
>> > > > > > > enqueued and processed from 10 queues.
>> > > > > > > *f).* We enqueue 1 million messages across 10 different queues
>> > > > > > > (evenly divided), at the start of the test. Dequeue only starts
>> > > > > > > once all the messages have been enqueued. We run the test for
>> > > > > > > 2 hours and process as many messages as we can. Each message
>> > > > > > > runs for around 200 milliseconds.
>> > > > > > > *g).* We have used both 0.16 and 6.0.4 clients for these tests
>> > > > > > > (6.0.4 client only with 6.0.4 broker).
>> > > > > > >
>> > > > > > > *2. Test Results*
>> > > > > > > *a).* System Load Average (read the notes below on how we
>> > > > > > > compute it) for the 6.0.4 broker is 5x that of the 0.32 broker.
>> > > > > > > At the start of the test (when we are not doing any dequeue),
>> > > > > > > load average is normal (0.05 for the 0.32 broker and 0.1 for
>> > > > > > > the new broker); however, while we are dequeuing messages, the
>> > > > > > > load average is very high (around 0.5 consistently).
>> > > > > > >
>> > > > > > > *b).* Time to create listeners in the new broker has gone up by
>> > > > > > > 220% compared to the 0.32 broker (when using the 0.16 client).
>> > > > > > > For the old broker, creating 16 sessions each listening to 3000
>> > > > > > > queues takes 142 seconds, and in the new broker it took 456
>> > > > > > > seconds. If we use the 6.0.4 client, it took even longer, a
>> > > > > > > 524% increase (887 seconds).
>> > > > > > >    *I).* The time to create consumers increases as we create
>> > > > > > >    more listeners on the same connections. We have 20 sessions
>> > > > > > >    (but end up using around 5 of them) on each connection, and
>> > > > > > >    we create about 3000 consumers and attach a MessageListener
>> > > > > > >    to each. Each successive session takes longer (approximately
>> > > > > > >    linear increase) to set up the same number of consumers and
>> > > > > > >    listeners.
>> > > > > > >
>> > > > > > > *3). How we compute System Load Average*
>> > > > > > > We query the MBean SystemLoadAverage and divide it by the value
>> > > > > > > of the MBean AvailableProcessors. Both of these MBeans are
>> > > > > > > available under java.lang.OperatingSystem.
>> > > > > > >
>> > > > > > > I am not sure what is causing these regressions and would like
>> > > > > > > your help in understanding it. We are aware of the changes with
>> > > > > > > respect to the threading model in the new broker. Are there any
>> > > > > > > design docs that we can refer to in order to understand these
>> > > > > > > changes at a high level? Can we tune some parameters to address
>> > > > > > > these issues?
>> > > > > > >
>> > > > > > > Thanks
>> > > > > > > Ramayan
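
For anyone wanting to reproduce the load figure from the original report
(SystemLoadAverage divided by AvailableProcessors), here is a minimal Java
sketch. It is an illustration only: the class and method names are made up,
and it reads the platform MXBean of the JVM it runs in, whereas the
original numbers were presumably taken from the broker's JVM over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper, not part of Qpid.
final class NormalizedLoad {

    // Divides the raw system load average by the number of processors,
    // as described in section 3 of the original report.
    static double normalize(double systemLoadAverage, int availableProcessors) {
        if (availableProcessors <= 0) {
            throw new IllegalArgumentException("availableProcessors must be positive");
        }
        return systemLoadAverage / availableProcessors;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // negative when the platform cannot report it
        int cpus = os.getAvailableProcessors();
        if (load < 0) {
            System.out.println("System load average is not available on this platform");
        } else {
            System.out.println("Normalized load: " + normalize(load, cpus));
        }
    }
}
```

As an illustrative check of the arithmetic, a raw load average of 6.0 on
a 12-core host normalizes to 0.5.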

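With thousands of queues, the x-multiqueue address from the reply at the
top of this thread is easier to generate than to type. The sketch below is
one way to assemble it in Java. The class and method names are hypothetical,
and it assumes the queue names are plain identifiers that need no quoting
in the address syntax. It only builds the string; handing it to the client
depends on how your destinations are configured and is left out.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper, not part of the Qpid client.
final class MultiQueueAddress {

    // Builds an address of the shape quoted in the thread. The first queue
    // name doubles as the leading node name, which for the 0.16 client has
    // to be a queue that really exists on the virtual host.
    static String build(List<String> queues) {
        if (queues.isEmpty()) {
            throw new IllegalArgumentException("at least one queue is required");
        }
        String first = queues.get(0);
        String list = String.join(", ", queues);
        return first + " ; {node : { type : queue }, link : { x-subscribes : "
                + "{ arguments : { x-multiqueue : [ " + list + " ], x-pull-only : true }}}}";
    }

    public static void main(String[] args) {
        // queue_01 .. queue_03, matching the example in the thread
        List<String> queues = IntStream.rangeClosed(1, 3)
                .mapToObj(i -> String.format("queue_%02d", i))
                .collect(Collectors.toList());
        System.out.println(build(queues));
    }
}
```

Raising the upper bound of the range (and widening the number format) gives
the larger lists mentioned in the thread.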