Hi Praveen,

So, as it turns out, after talking over the specifics of your use case
further, it doesn't seem that any of the things we considered will work for
you, so we don't really have anything better left to suggest than the
second synchronous consumer you proposed. Although we don't especially like
it, your use case does at least seem to be one that shouldn't fall foul of
the inherent limitations of doing that.
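To make that concrete, the batched synchronous consumption essentially boils down to a drain loop: on a trigger, pull up to N messages, stopping early when the queue is empty, then process and commit. Below is a plain-Java sketch of just that loop logic; the JMS wiring is only indicated in comments (MessageConsumer.receiveNoWait() and Session.commit() are the standard JMS calls you would use), and all class and method names here are illustrative, not part of any Qpid API:

```java
import java.util.ArrayList;
import java.util.ArrayDeque;
import java.util.List;
import java.util.function.Supplier;

public class BatchDrain {

    // Pull up to maxBatch messages. The receiver returns null when nothing
    // is immediately available -- the same contract as the JMS call
    // MessageConsumer.receiveNoWait().
    static <M> List<M> drainBatch(Supplier<M> receiver, int maxBatch) {
        List<M> batch = new ArrayList<>();
        while (batch.size() < maxBatch) {
            M msg = receiver.get();
            if (msg == null) {
                break;  // queue drained, return a short batch
            }
            batch.add(msg);
        }
        return batch;
    }

    public static void main(String[] args) {
        // Stand-in for a queue holding 7 messages, with a max batch size of 5.
        ArrayDeque<String> queue = new ArrayDeque<>(
                List.of("m1", "m2", "m3", "m4", "m5", "m6", "m7"));
        List<String> batch = drainBatch(queue::poll, 5);
        System.out.println(batch.size());  // 5
        // In the real consumer this loop would sit inside onMessage(), using
        // consumer::receiveNoWait as the supplier, and the application would
        // process the batch and then call session.commit().
    }
}
```

With prefetch effectively disabled (prefetch 1, as described later in the thread), each receiveNoWait() is a round trip to the broker, so the batch you get is bounded by what is actually on the queue at that moment, not by what the client has buffered.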

(In case you are interested, the most promising idea was one Rob had
suggested, involving doing some things with queue bindings and an LVQ to
implement a kind of control queue, which could be used to implement
triggering of batched synchronous consumption on the original payload
queues. Unfortunately, this won't really work with the multiple consumers
you have in place, since for fairness they won't necessarily want to
consume all of the messages on a given queue at once, and it would then
become necessary to somehow signal that further processing was required,
potentially by another consumer. Equally, removing the conflation on the
control queue to compensate for the multiple consumers would just lead to a
situation where you would invariably end up triggering activity against a
queue that one or more other consumers had already drained, so this
wouldn't be particularly efficient.)
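In case a picture of the rejected idea helps: the control-queue pattern amounts to "repeated triggers for the same payload queue conflate down to one entry; a consumer takes the next trigger and synchronously drains that whole queue". The following is a plain-Java simulation of that conflation behaviour, not Qpid or JMS code -- the LVQ is modelled with a LinkedHashMap and every name is hypothetical. The q.clear() line is exactly the fairness problem described above: a single consumer has to take the entire queue in one go.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ControlQueueSketch {
    // Hypothetical LVQ: repeated triggers keyed by payload-queue name
    // conflate to a single entry, modelled here with a LinkedHashMap.
    private final Map<String, String> controlLvq = new LinkedHashMap<>();
    private final Map<String, Deque<String>> payloadQueues = new HashMap<>();

    void publish(String queueName, String message) {
        payloadQueues.computeIfAbsent(queueName, q -> new ArrayDeque<>()).add(message);
        controlLvq.put(queueName, queueName);  // trigger, conflated per queue
    }

    // A consumer takes the next trigger and synchronously drains that queue.
    List<String> consumeNextBatch() {
        Iterator<String> it = controlLvq.values().iterator();
        if (!it.hasNext()) {
            return List.of();  // no pending triggers
        }
        String queueName = it.next();
        it.remove();
        Deque<String> q = payloadQueues.get(queueName);
        List<String> batch = new ArrayList<>(q);
        q.clear();  // one consumer drains the whole queue at once --
                    // the fairness problem with multiple consumers
        return batch;
    }

    public static void main(String[] args) {
        ControlQueueSketch sketch = new ControlQueueSketch();
        sketch.publish("payload-q1", "m1");
        sketch.publish("payload-q1", "m2");  // conflates with the q1 trigger
        sketch.publish("payload-q2", "m3");
        System.out.println(sketch.consumeNextBatch());  // [m1, m2]
        System.out.println(sketch.consumeNextBatch());  // [m3]
    }
}
```

Removing the conflation (i.e. keeping every trigger rather than one per queue) is the other failure mode described above: later triggers would fire against queues an earlier consumer had already drained.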

As an aside, we were quite impressed by the number of consumers you are
using; it's just a smidge (up to 2 orders of magnitude) more than most of
our users typically have :)

Robbie

On 17 July 2012 15:05, Praveen M <lefthandma...@gmail.com> wrote:

> Hi Robbie,
>
> Thanks for writing back soon. Please see inline.
>
> On Mon, Jul 16, 2012 at 3:32 PM, Robbie Gemmell <robbie.gemm...@gmail.com
> >wrote:
>
> > Ok, so to check I understand correctly, and seek clarification on some
> > points...
> >
> > You have potentially 30 application instances that have 5 connections, 20
> > sessions per connection, and are each creating 2 consumers on all 6000
> > priority queues (using 600 consumers per session), thus giving up to 150
> > (30x5) connections, 3000 (30x5x20) sessions, and 360000 (30x2x6000)
> > consumers?
> >
> Yes, that is correct.
>
>
> > The consumers would only require 600 (360000/600) sessions, so can I
> > assume the other 2400 sessions would be used for publishers, or have I
> > misinterpreted something? (I am unclear on the '20-30' vs '15')
> >
> Yes, you are correct again. However, I forgot to tell you that we have
> dedicated connections for consumers (2 connections) vs publishers (5
> connections). Thus it'd be 600 sessions for consumers and 3000 sessions
> for publishers.
>
>
> > How are the sessions for the consumers spread across the connections: all
> > on 1 connection, 4 on each of the 5 connections, something else?
> >
>
> I have 2 connections dedicated to consumers (publishers won't use these
> connections; I try to isolate publisher connections from consumer
> connections). The 5 connections I mentioned above are used only by
> publishers. (Sorry for not being clearer earlier.)
>
> Since we have 2 connections for consumers, it's 10 consumer
> sessions/connection/server
>
>
> > Although you are ultimately looking to increase performance by batching,
> > it is actually more the application processing steps you are looking to
> > speed up by supplying more data at once, rather than explicitly
> > decreasing the actual messaging overhead (which, if bounding performance
> > due to round trips to the broker, can mean larger batches increasing
> > message throughput).
> >
> Yes, that is correct.
>
>
> > Although you would like processing across the queues to be fair, you
> > don't actually have any explicit ordering requirements such as 'after
> > processing messages from Queue X we must process Queue Foo'.
> >
> Yes, there are no such ordering requirements.
>
>
> > If each queue currently has up to 60 (30x2) consumers competing for the
> > messages, does this mean you have no real ordering requirements
> > (discounting priorities) when processing the messages on each queue, i.e.
> > it doesn't matter which application instance gets a particular message,
> > and say particular consumers could get and process the first and third
> > messages whilst a slower consumer actually got and then later finished
> > processing the second message? I ask because if you try to batch the
> > messages on queues with multiple consumers and no prefetch (or even with
> > prefetch), it isn't likely you would find consumers getting a sequential
> > batch-sized group of messages (without introducing message grouping to
> > the mix, that is), but rather a message followed by other messages with
> > one or more intermediate 'gaps' where competing consumers received those
> > messages. Is that acceptable to whatever batched processing it is you
> > are likely to be doing?
> >
> Yes, we do not have any ordering requirements, and we're OK with exactly
> what you describe. Each message is independent of the others, and we do
> not process messages in a workflow order anyway. We do not use any message
> grouping (and do not plan to), and gaps are OK.
>
>
> > You mentioned possibly only 100 queues servicing batch messages. Did you
> > mean that you could know/decide in advance which those queues are, i.e.
> > they are readily identifiable in advance, or could it just be any 100
> > queues based on some condition at a given point in time?
> >
> Yes, we could decide in advance and identify the batch queues if required.
>
> Thanks Robbie.
>
>
> > Robbie
> >
> > On 16 July 2012 16:54, Praveen M <lefthandma...@gmail.com> wrote:
> >
> > > Hi Robbie. Thank you for writing back. Please see inline for answers to
> > > some of the questions you had.
> > >
> > > On Mon, Jul 16, 2012 at 4:40 AM, Robbie Gemmell <
> > robbie.gemm...@gmail.com
> > > >wrote:
> > >
> > > > Hi Praveen,
> > > >
> > > > I have talked this over with some of the others here, and tend to
> > > > agree with Gordon and Rajith that mixing asynchronous and synchronous
> > > > consumers in that fashion isn't a route I would really suggest; using
> > > > two sessions makes for complication around transactionality and
> > > > ordering, and I don't think it will work on a single session.
> > > >
> > > > We do have some ideas you could potentially use to implement batching
> > > > in the application to improve performance, but there are various
> > > > subtleties to consider that might heavily influence our suggestions.
> > > > As such we really need a good bit more detail around the use case to
> > > > actually give a reasoned answer. For example:
> > > >
> > > > - How many connections/sessions/consumers/queues are actually in use?
> > > >
> > >
> > > In our current system, we have 20-30 client servers talking to our Qpid
> > > messaging server.
> > > We have 5 connections, 20 sessions/connection, and 2 consumers/queue
> > > from a single client server's standpoint (so all the numbers should be
> > > multiplied by a max factor of 30, since we could have up to 30 client
> > > servers).
> > > We create 6000 queues overall in our Qpid messaging server.
> > >
> > >
> > > > - Are there multiple consumers on each/any of the queues at the same
> > > > time?
> > > >
> > > Yes. To explain this a little bit:
> > >
> > > We have about 15 client servers consuming messages.
> > > We have 20 sessions (threads) consuming messages per client server. We
> > > have broken the 6000 queues into 10 buckets, and have 2 sessions
> > > (threads) listening/consuming on every 600 queues. Hence, an individual
> > > session might listen and consume from up to 600 queues on the same
> > > thread.
> > >
> > >
> > > > - What, if any, ordering requirements are there on the message
> > > > processing
> > > > (either within each queue or across all the queues)?
> > > >
> > > Across all queues, we'd like to process in a round-robin fashion to
> > > ensure fairness across the queues. We achieve this now by turning off
> > > prefetching (we're using prefetch 1, which works well).
> > > Within a queue, all our queues are priority queues, so we process based
> > > upon priority order.
> > >
> > >
> > > > - What is the typical variation of message volumes across the queues
> > > > that you are looking to balance?
> > >
> > > Volumes vary quite a bit between queues (based upon the service the
> > > queue is tied to). Some queues have relatively low traffic, some are
> > > bursty, some have consistently high traffic, and some have slow
> > > consumers.
> > > Our numbers peak at around a million per day for a busy queue.
> > >
> > >
> > > > - What are the typical message sizes?
> > > >
> > > Message sizes are typically around 1KB-2KB.
> > >
> > >
> > > > - How many messages might you potentially be looking to batch?
> > > >
> > > The batch sizes are typically provided by our client applications, and
> > > are typically on the order of 10-50 messages.
> > >
> > >
> > > > - What is the typical processing time in onMessage() now? Would this
> > > > vary as a direct multiple of the number of messages batched, or by
> > > > some other scaling?
> > >
> > >
> > > The onMessage() callback invokes an application service, so I can't say
> > > exactly... but with batching, the processing time is typically quite a
> > > bit less than the direct multiple of the number of messages batched.
> > >
> > > The most typical use case where batching messages helps us is when a
> > > database query is invoked with the batched messages, thus performing a
> > > bulk operation. This can be very expensive for us if we do it one by
> > > one instead of batching the database query.
> > > Also, batch message traffic is typically bursty, and our processing
> > > times are quite high. From our current data, even though we have a
> > > multiple-consumer setup, batching helps us process efficiently for
> > > applications which process messages in bulk.
> > >
> > > Also, out of all our queues, I would say only about 100 of them would
> > > be servicing batch messages.
> > >
> > > Our current messaging infrastructure supports batch messages, and hence
> > > we have a lot of dependent code written which expects batching. Getting
> > > away from it now might be quite tough, hence I'd like to implement a
> > > pseudo-batch on top of Qpid. My original thought was around using 2
> > > sessions, onMessage(), and a synchronous consumer. I don't think we
> > > have much concern with transactionality, as we keep our own reference
> > > to each message in our database to guarantee it.
> > >
> > > Do let me know what you think, and I'd love to hear if you can think of
> > > alternate approaches to this problem.
> > >
> > > Hope to hear from you soon.
> > >
> > > Thanks,
> > > Praveen
> > >
> > > Regards,
> > > > Robbie
> > > >
> > > > On 12 July 2012 17:53, Praveen M <lefthandma...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm trying to explore whether there are ways to batch message
> > > > > processing. Batching message processing would help us improve
> > > > > performance for some of our use cases, where we could chunk
> > > > > messages and process them in a single callback.
> > > > >
> > > > > Has anyone here explored building a layer to batch messages?
> > > > >
> > > > > I am using the Java Broker and the Java client.
> > > > >
> > > > > I would like to stick to the JMS api as much as possible.
> > > > >
> > > > > This is what I currently have; I'm still wondering if it'd work:
> > > > >
> > > > > 1) When the onMessage() callback is triggered, create a consumer
> > > > > and pull more messages to process from the queue the message was
> > > > > delivered from.
> > > > > 2) Pull messages up to my max chunk size, or up to the number of
> > > > > messages available in the queue.
> > > > > 3) Process all the messages together and commit on the session.
> > > > >
> > > > > I'd like to hear ideas on how to go about this.
> > > > >
> > > > > Thanks,
> > > > > --
> > > > > -Praveen
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -Praveen
> > >
> >
>
>
>
> --
> -Praveen
>
