Re: Qpid broker 6.0.4 performance issues

Ramayan Tiwari Thu, 27 Oct 2016 15:20:18 -0700

Hi Rob,

I have the trunk code which I am testing with, but I haven't finished the test
runs yet. I was hoping that once I validate the change, I can simply
release 6.0.5.


Thanks
Ramayan

On Thu, Oct 27, 2016 at 12:41 PM, Rob Godfrey <[email protected]>
wrote:

> Hi Ramayan,
>
> did you verify that the change works for you?  You said you were going to
> test with the trunk code...
>
> I'll discuss with the other developers tomorrow about whether we can put
> this change into 6.0.5.
>
> Cheers,
> Rob
>
> On 27 October 2016 at 20:30, Ramayan Tiwari <[email protected]>
> wrote:
>
> > Hi Rob,
> >
> > I looked at the release notes for 6.0.5 and it doesn't include the fix
> for
> > large consumers issues [1]. The fix is marked for 6.1, which will not
> have
> > JMX and for us to use this version requires major changes in our
> monitoring
> > framework. Could you please include the fix in 6.0.5 release?
> >
> > Thanks
> > Ramayan
> >
> > [1]. https://issues.apache.org/jira/browse/QPID-7462
> >
> > On Wed, Oct 19, 2016 at 4:49 PM, Helen Kwong <[email protected]>
> wrote:
> >
> > > Hi Rob,
> > >
> > > Again, thank you so much for answering our questions and providing a
> > patch
> > > so quickly :) One more question I have: would it be possible to include
> > > test cases involving many queues and listeners (in the order of
> thousands
> > > of queues) for future Qpid releases, as part of standard perf testing
> of
> > > the broker?
> > >
> > > Thanks,
> > > Helen
> > >
> > > On Tue, Oct 18, 2016 at 10:40 AM, Ramayan Tiwari <
> > [email protected]
> > > > wrote:
> > >
> > >> Thanks so much Rob, I will test the patch against trunk and will
> update
> > >> you with the outcome.
> > >>
> > >> - Ramayan
> > >>
> > >> On Tue, Oct 18, 2016 at 2:37 AM, Rob Godfrey <[email protected]
> >
> > >> wrote:
> > >>
> > >>> On 17 October 2016 at 21:50, Rob Godfrey <[email protected]>
> > >>> wrote:
> > >>>
> > >>> >
> > >>> >
> > >>> > On 17 October 2016 at 21:24, Ramayan Tiwari <
> > [email protected]>
> > >>> > wrote:
> > >>> >
> > >>> >> Hi Rob,
> > >>> >>
> > >>> >> We are certainly interested in testing the "multi queue consumers"
> > >>> >> behavior
> > >>> >> with your patch in the new broker. We would like to know:
> > >>> >>
> > >>> >> 1. What will be the scope of the changes: client, broker, or both? We
> are
> > >>> >> currently running 0.16 client, so would like to make sure that we
> > > will be
> > >>> >> able
> > >>> >> to use these changes with 0.16 client.
> > >>> >>
> > >>> >>
> > >>> > There's no change to the client.  I can't remember what was in the
> > 0.16
> > >>> > client... the only issue would be if there are any bugs in the
> > parsing
> > >>> of
> > >>> > address arguments.  I can try to test that out tmr.
> > >>> >
> > >>>
> > >>>
> > >>> OK - with a little bit of care to get round the address parsing
> issues
> > in
> > >>> the 0.16 client... I think we can get this to work.  I've created the
> > >>> following JIRA:
> > >>>
> > >>> https://issues.apache.org/jira/browse/QPID-7462
> > >>>
> > >>> and attached to it are a patch which applies against trunk, and a
> > >>> separate
> > >>> patch which applies against the 6.0.x branch (
> > >>> https://svn.apache.org/repos/asf/qpid/java/branches/6.0.x - this is
> > >>> 6.0.4
> > >>> plus a few other fixes which we will soon be releasing as 6.0.5)
> > >>>
> > >>> To create a consumer which uses this feature (and multi queue
> > >>> consumption)
> > >>> for the 0.16 client you need to use something like the following as
> the
> > >>> address:
> > >>>
> > >>> queue_01 ; {node : { type : queue }, link : { x-subscribes : {
> > >>> arguments : { x-multiqueue : [ queue_01, queue_02, queue_03 ],
> > >>> x-pull-only : true }}}}
> > >>>
> > >>>
> > >>> Note that the initial queue_01 has to be a name of an actual queue on
> > >>> the virtual host, but otherwise it is not actually used (if you were
> > >>> using a 0.32 or later client you could just use '' here).  The actual
> > >>> queues that are consumed from are in the list value associated with
> > >>> x-multiqueue.  For my testing I created a list with 3000 queues here
> > >>> and this worked fine.
> > >>>
> > >>> Let me know if you have any questions / issues,
> > >>>
> > >>> Hope this helps,
> > >>> Rob
> > >>>
> > >>>
> > >>> >
> > >>> >
> > >>> >> 2. My understanding is that the "pull vs push" change is only with
> > >>> respect
> > >>> >> to broker and it does not change our architecture where we use
> > >>> >> MessageListener to receive messages asynchronously.
> > >>> >>
> > >>> >
> > >>> > Exactly - this is only a change within the internal broker
> threading
> > >>> > model.  The external behaviour of the broker remains essentially
> > >>> unchanged.
> > >>> >
> > >>> >
> > >>> >>
> > >>> >> 3. Once I/O refactoring is complete, we would be able to go back
> > to
> > >>> use
> > >>> >> standard JMS consumer (Destination), what is the timeline and
> broker
> > >>> >> release version for the completion of this work?
> > >>> >>
> > >>> >
> > >>> > You might wish to continue to use the "multi queue" model,
> depending
> > on
> > >>> > your actual use case, but yeah once the I/O work is complete I
> would
> > >>> hope
> > >>> > that you could use the thousands of consumers model should you
> wish.
> > >>> We
> > >>> > don't have a schedule for the next phase of I/O rework right now -
> > >>> about
> > >>> > all I can say is that it is unlikely to be complete this year.  I'd
> > >>> need to
> > >>> > talk with Keith (who is currently on vacation) as to when we think
> we
> > >>> may
> > >>> > be able to schedule it.
> > >>> >
> > >>> >
> > >>> >>
> > >>> >> Let me know once you have integrated the patch and I will re-run
> our
> > >>> >> performance tests to validate it.
> > >>> >>
> > >>> >>
> > >>> > I'll make a patch for 6.0.x presently (I've been working on a
> change
> > >>> > against trunk - the patch will probably have to change a bit to
> apply
> > >>> to
> > >>> > 6.0.x).
> > >>> >
> > >>> > Cheers,
> > >>> > Rob
> > >>> >
> > >>> > Thanks
> > >>> >> Ramayan
> > >>> >>
> > >>> >> On Sun, Oct 16, 2016 at 3:30 PM, Rob Godfrey <
> > [email protected]
> > >>> >
> > >>> >> wrote:
> > >>> >>
> > >>> >> > OK - so having pondered / hacked around a bit this weekend, I
> > think
> > >>> to
> > >>> >> get
> > >>> >> > decent performance from the IO model in 6.0 for your use case
> > we're
> > >>> >> going
> > >>> >> > to have to change things around a bit.
> > >>> >> >
> > >>> >> > Basically 6.0 is an intermediate step on our IO / threading
> model
> > >>> >> journey.
> > >>> >> > In earlier versions we used 2 threads per connection for IO (one
> > >>> read,
> > >>> >> one
> > >>> >> > write) and then extra threads from a pool to "push" messages
> from
> > >>> >> queues to
> > >>> >> > connections.
> > >>> >> >
> > >>> >> > In 6.0 we moved to using a pool for the IO threads, and also
> > stopped
> > >>> >> queues
> > >>> >> > from "pushing" to connections while the IO threads were acting
> on
> > >>> the
> > >>> >> > connection.  It's this latter fact which is screwing up
> > performance
> > >>> for
> > >>> >> > your use case here because what happens is that on each network
> > >>> read we
> > >>> >> > tell each consumer to stop accepting pushes from the queue until
> > >>> the IO
> > >>> >> > interaction has completed.  This is causing lots of loops over
> > your
> > >>> 3000
> > >>> >> > consumers on each session, which is eating up a lot of CPU on
> > every
> > >>> >> network
> > >>> >> > interaction.
> > >>> >> >
> > >>> >> > In the final version of our IO refactoring we want to remove the
> > >>> >> "pushing"
> > >>> >> > from the queue, and instead have the consumers "pull" - so that
> > the
> > >>> only
> > >>> >> > threads that operate on the queues (outside of housekeeping
> tasks
> > >>> like
> > >>> >> > expiry) will be the IO threads.
> > >>> >> >
> > >>> >> > So, what we could do (and I have a patch sitting on my laptop
> for
> > >>> this)
> > >>> >> is
> > >>> >> > to look at using the "multi queue consumers" work I did for you
> > guys
> > >>> >> > before, but augmenting this so that the consumers work using a
> > >>> "pull"
> > >>> >> model
> > >>> >> > rather than the push model.  This will guarantee strict fairness
> > >>> between
> > >>> >> > the queues associated with the consumer (which was the issue you
> > had
> > >>> >> with
> > >>> >> > this functionality before, I believe).  Using this model you'd
> > only
> > >>> >> need a
> > >>> >> > small number (one?) of consumers per session.  The patch I have
> is
> > >>> to
> > >>> >> add
> > >>> >> > this "pull" mode for these consumers (essentially this is a
> > preview
> > >>> of
> > >>> >> how
> > >>> >> > all consumers will work in the future).
> > >>> >> >
> > >>> >> > Does this seem like something you would be interested in
> pursuing?
> > >>> >> >
> > >>> >> > Cheers,
> > >>> >> > Rob
> > >>> >> >
> > >>> >> > On 15 October 2016 at 17:30, Ramayan Tiwari <
> > >>> [email protected]>
> > >>> >> > wrote:
> > >>> >> >
> > >>> >> > > Thanks Rob. Apologies for sending this over weekend :(
> > >>> >> > >
> > >>> >> > > Are there any docs on the new threading model? I found this on
> > >>> >> > confluence:
> > >>> >> > >
> > >>> >> > > https://cwiki.apache.org/confluence/display/qpid/IO+Transport+Refactoring
> > >>> >> > >
> > >>> >> > > We are also interested in understanding the threading model a
> > >>> little
> > >>> >> > better
> > >>> >> > > to help us figure out its impact on our usage patterns. Would
> > be
> > >>> very
> > >>> >> > > helpful if there are more docs/JIRA/email-threads with some
> > >>> details.
> > >>> >> > >
> > >>> >> > > Thanks
> > >>> >> > >
> > >>> >> > > On Sat, Oct 15, 2016 at 9:21 AM, Rob Godfrey <
> > >>> [email protected]
> > >>> >> >
> > >>> >> > > wrote:
> > >>> >> > >
> > >>> >> > > > So I *think* this is an issue because of the extremely large
> > >>> number
> > >>> >> of
> > >>> >> > > > consumers.  The threading model in v6 means that whenever a
> > >>> network
> > >>> >> > read
> > >>> >> > > > occurs for a connection, it iterates over the consumers on
> > that
> > >>> >> > > connection
> > >>> >> > > > - obviously where there are a large number of consumers this
> > is
> > >>> >> > > > burdensome.  I fear addressing this may not be a trivial
> > >>> change...
> > >>> >> I
> > >>> >> > > shall
> > >>> >> > > > spend the rest of my afternoon pondering this...
> > >>> >> > > >
> > >>> >> > > > - Rob
> > >>> >> > > >
> > >>> >> > > > On 15 October 2016 at 17:14, Ramayan Tiwari <
> > >>> >> [email protected]>
> > >>> >> > > > wrote:
> > >>> >> > > >
> > >>> >> > > > > Hi Rob,
> > >>> >> > > > >
> > >>> >> > > > > Thanks so much for your response. We use transacted
> sessions
> > >>> with
> > >>> >> > > > > non-persistent delivery. Prefetch size is 1 and every
> > message
> > >>> is
> > >>> >> same
> > >>> >> > > > size
> > >>> >> > > > > (200 bytes).
> > >>> >> > > > >
> > >>> >> > > > > Thanks
> > >>> >> > > > > Ramayan
> > >>> >> > > > >
> > >>> >> > > > > On Sat, Oct 15, 2016 at 2:59 AM, Rob Godfrey <
> > >>> >> > [email protected]>
> > >>> >> > > > > wrote:
> > >>> >> > > > >
> > >>> >> > > > > > Hi Ramayan,
> > >>> >> > > > > >
> > >>> >> > > > > > this is interesting... in our testing (which admittedly
> > >>> didn't
> > >>> >> > cover
> > >>> >> > > > the
> > >>> >> > > > > > case of this many queues / listeners) we saw the 6.0.x
> > >>> broker
> > >>> >> using
> > >>> >> > > > less
> > >>> >> > > > > > CPU on average than the 0.32 broker.  I'll have a look
> > this
> > >>> >> weekend
> > >>> >> > > as
> > >>> >> > > > to
> > >>> >> > > > > > why creating the listeners is slower.  On the dequeuing,
> > can
> > >>> you
> > >>> >> > give
> > >>> >> > > a
> > >>> >> > > > > > little more information on the usage pattern - are you
> > using
> > >>> >> > > > > transactions,
> > >>> >> > > > > > auto-ack or client ack?  What prefetch size are you
> using?
> > >>> How
> > >>> >> > large
> > >>> >> > > > are
> > >>> >> > > > > > your messages?
> > >>> >> > > > > >
> > >>> >> > > > > > Thanks,
> > >>> >> > > > > > Rob
> > >>> >> > > > > >
> > >>> >> > > > > > On 14 October 2016 at 23:46, Ramayan Tiwari <
> > >>> >> > > [email protected]>
> > >>> >> > > > > > wrote:
> > >>> >> > > > > >
> > >>> >> > > > > > > Hi All,
> > >>> >> > > > > > >
> > >>> >> > > > > > > We have been validating the new Qpid broker (version
> > >>> 6.0.4)
> > >>> >> and
> > >>> >> > > have
> > >>> >> > > > > > > compared against broker version 0.32 and are seeing
> > major
> > >>> >> > > > regressions.
> > >>> >> > > > > > > Following is the summary of our test setup and
> results:
> > >>> >> > > > > > >
> > >>> >> > > > > > > *1. Test Setup *
> > >>> >> > > > > > >   *a). *Qpid broker runs on a dedicated host (12
> cores,
> > >>> 32 GB
> > >>> >> > RAM).
> > >>> >> > > > > > >   *b).* For 0.32, we allocated 16 GB heap. For 6.0.4
> > >>> broker,
> > >>> >> we
> > >>> >> > use
> > >>> >> > > > 8GB
> > >>> >> > > > > > > heap and 8GB direct memory.
> > >>> >> > > > > > >   *c).* For 6.0.4, flow to disk has been configured at
> > >>> 60%.
> > >>> >> > > > > > >   *d).* Both the brokers use BDB host type.
> > >>> >> > > > > > >   *e).* Brokers have around 6000 queues and we create
> 16
> > >>> >> listener
> > >>> >> > > > > > > sessions/threads spread over 3 connections, where each
> > >>> >> session is
> > >>> >> > > > > > listening
> > >>> >> > > > > > > to 3000 queues. However, messages are only enqueued
> and
> > >>> >> processed
> > >>> >> > > > from
> > >>> >> > > > > 10
> > >>> >> > > > > > > queues.
> > >>> >> > > > > > >   *f).* We enqueue 1 million messages across 10
> > different
> > >>> >> queues
> > >>> >> > > > > (evenly
> > >>> >> > > > > > > divided), at the start of the test. Dequeue only
> starts
> > >>> once
> > >>> >> all
> > >>> >> > > the
> > >>> >> > > > > > > messages have been enqueued. We run the test for 2
> hours
> > >>> and
> > >>> >> > > process
> > >>> >> > > > as
> > >>> >> > > > > > > many messages as we can. Each message runs for around
> > 200
> > >>> >> > > > milliseconds.
> > >>> >> > > > > > >   *g).* We have used both 0.16 and 6.0.4 clients for
> > these
> > >>> >> tests
> > >>> >> > > > (6.0.4
> > >>> >> > > > > > > client only with 6.0.4 broker)
> > >>> >> > > > > > >
> > >>> >> > > > > > > *2. Test Results *
> > >>> >> > > > > > >   *a).* System Load Average (read notes below on how
> we
> > >>> >> compute
> > >>> >> > > it),
> > >>> >> > > > > for
> > >>> >> > > > > > > 6.0.4 broker is 5x compared to 0.32 broker. During
> start
> > >>> of
> > >>> >> the
> > >>> >> > > test
> > >>> >> > > > > > (when
> > >>> >> > > > > > > we are not doing any dequeue), load average is normal
> > >>> (0.05
> > >>> >> for
> > >>> >> > > 0.32
> > >>> >> > > > > > broker
> > >>> >> > > > > > > and 0.1 for new broker), however, while we are
> dequeuing
> > >>> >> > messages,
> > >>> >> > > > the
> > >>> >> > > > > > load
> > >>> >> > > > > > > average is very high (around 0.5 consistently).
> > >>> >> > > > > > >
> > >>> >> > > > > > >   *b). *Time to create listeners in new broker has
> gone
> > >>> up by
> > >>> >> > 220%
> > >>> >> > > > > > compared
> > >>> >> > > > > > > to 0.32 broker (when using 0.16 client). For old
> broker,
> > >>> >> creating
> > >>> >> > > 16
> > >>> >> > > > > > > sessions each listening to 3000 queues takes 142
> seconds
> > >>> and
> > >>> >> in
> > >>> >> > new
> > >>> >> > > > > > broker
> > >>> >> > > > > > > it took 456 seconds. If we use 6.0.4 client, it took
> > even
> > >>> >> longer
> > >>> >> > at
> > >>> >> > > > > 524%
> > >>> >> > > > > > > increase (887 seconds).
> > >>> >> > > > > > >      *I).* The time to create consumers increases as
> we
> > >>> create
> > >>> >> > more
> > >>> >> > > > > > > listeners on the same connections. We have 20 sessions
> > >>> (but
> > >>> >> end
> > >>> >> > up
> > >>> >> > > > > using
> > >>> >> > > > > > > around 5 of them) on each connection and we create
> about
> > >>> 3000
> > >>> >> > > > consumers
> > >>> >> > > > > > and
> > >>> >> > > > > > > attach MessageListener to it. Each successive session
> > >>> takes
> > >>> >> > longer
> > >>> >> > > > > > > (approximately linear increase) to setup same number
> of
> > >>> >> consumers
> > >>> >> > > and
> > >>> >> > > > > > > listeners.
> > >>> >> > > > > > >
> > >>> >> > > > > > > *3). How we compute System Load Average *
> > >>> >> > > > > > > We query the MBean SystemLoadAverage and divide it by
> > the
> > >>> >> value
> > >>> >> > of
> > >>> >> > > > > MBean
> > >>> >> > > > > > > AvailableProcessors. Both of these MBeans are
> available
> > >>> under
> > >>> >> > > > > > > java.lang.OperatingSystem.
> > >>> >> > > > > > >
> > >>> >> > > > > > > I am not sure what is causing these regressions and
> > would
> > >>> like
> > >>> >> > your
> > >>> >> > > > > help
> > >>> >> > > > > > in
> > >>> >> > > > > > > understanding it. We are aware of the changes with
> > >>> respect
> > >>> >> to
> > >>> >> > > > > > threading
> > >>> >> > > > > > > model in the new broker, are there any design docs
> that
> > >>> we can
> > >>> >> > > refer
> > >>> >> > > > to
> > >>> >> > > > > > > understand these changes at a high level? Can we tune
> > some
> > >>> >> > > parameters
> > >>> >> > > > > to
> > >>> >> > > > > > > address these issues?
> > >>> >> > > > > > >
> > >>> >> > > > > > > Thanks
> > >>> >> > > > > > > Ramayan
> > >>> >> > > > > > >
> > >>> >> > > > > >
> > >>> >> > > > >
> > >>> >> > > >
> > >>> >> > >
> > >>> >> >
> > >>> >>
> > >>> >
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>
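
The multi-queue address Rob gives above for the 0.16 client follows a fixed pattern: a placeholder queue name, then an x-multiqueue list and x-pull-only : true under the x-subscribes arguments. With thousands of queues it is easier to generate the string than to type it. The following is only a minimal sketch of that string building; the class name MultiQueueAddress and the method build are hypothetical and not part of any Qpid API, and it assumes queue names are simple tokens that need no quoting in the address syntax.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Qpid class): assembles the address string
// quoted in the thread from a list of queue names.
final class MultiQueueAddress {

    // placeholderQueue must name a real queue on the virtual host when the
    // 0.16 client is used; the queues actually consumed from are in 'queues'.
    static String build(String placeholderQueue, List<String> queues) {
        if (queues == null || queues.isEmpty()) {
            throw new IllegalArgumentException("at least one queue is required");
        }
        // Assumption: names are plain tokens, so they are joined unquoted.
        String queueList = String.join(", ", queues);
        return placeholderQueue
                + " ; {node : { type : queue }, link : { x-subscribes : { "
                + "arguments : { x-multiqueue : [ " + queueList + " ], "
                + "x-pull-only : true }}}}";
    }

    public static void main(String[] args) {
        List<String> queues = Arrays.asList("queue_01", "queue_02", "queue_03");
        System.out.println(build("queue_01", queues));
    }
}
```

The resulting string would then be used as the consumer's destination address in the client. How that destination is created depends on the client version and is not shown here.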

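The load figure in the original report (section 3 of the first message) is the SystemLoadAverage attribute divided by AvailableProcessors, both exposed by the java.lang OperatingSystem MBean. As a rough illustration of that calculation, the sketch below reads the same two values through the standard OperatingSystemMXBean. Note the assumptions: it reads the local JVM rather than connecting to a remote broker over JMX as the testers presumably did, and the class name NormalizedLoad and method normalize are made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical example class: per-core load as described in the thread
// (SystemLoadAverage / AvailableProcessors).
final class NormalizedLoad {

    // Divides the system load average by the number of processors.
    static double normalize(double systemLoadAverage, int availableProcessors) {
        if (availableProcessors <= 0) {
            throw new IllegalArgumentException("availableProcessors must be positive");
        }
        return systemLoadAverage / availableProcessors;
    }

    public static void main(String[] args) {
        // Local JVM only; a remote broker would need a JMX connection instead.
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // negative when not available
        int cpus = os.getAvailableProcessors();
        if (load < 0) {
            System.out.println("System load average is not available on this platform");
        } else {
            System.out.println("Normalized load: " + normalize(load, cpus));
        }
    }
}
```

On the 12-core host described in the test setup, a normalized value of 0.5 would correspond to a raw load average of about 6.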