Hi Robbie,

Thanks for writing back so soon. Please see my answers inline.
On Mon, Jul 16, 2012 at 3:32 PM, Robbie Gemmell <robbie.gemm...@gmail.com> wrote:

> Ok, so to check I understand correctly, and seek clarification on some
> points...
>
> You have potentially 30 application instances that have 5 connections, 20
> sessions per connection, and are each creating 2 consumers on all 6000
> priority queues (using 600 consumers per session), thus giving up to 150
> (30x5) connections, 3000 (30x5x20) sessions, and 360000 (30x2x6000)
> consumers?

Yes, that is correct.

> The consumers would only require 600 (360000/600) sessions, so can I assume
> the other 2400 sessions would be used for publishers, or have I
> misinterpreted something? (I am unclear on the '20-30' vs '15')

Yes, you are correct again. However, I forgot to mention that we have
dedicated connections for consumers (2 connections) vs. publishers (5
connections). Thus it'd be 600 sessions for consumers and 3000 sessions for
publishers.

> How are the sessions for the consumers spread across the connections: all
> on 1 connection, 4 on each of the 5 connections, something else?

I have 2 connections dedicated to consumers; publishers won't use these
connections, as I try to isolate publisher connections from consumer
connections. The 5 connections I mentioned above are used only by publishers
(sorry for not being clearer earlier). Since we have 2 connections for
consumers, that's 10 consumer sessions per connection per server.

> Although you are ultimately looking to increase performance by batching, it
> is actually more the application processing steps you are looking to speed
> up by supplying more data at once rather than explicitly decreasing the
> actual messaging overhead (which if bounding performance due to round trips
> to the broker, can mean larger batches increasing message throughput).

Yes, that is correct.
> Although you would like processing across the queues to be fair, you don't
> actually have any explicit ordering requirements such as 'after processing
> messages from Queue X we must process Queue Foo'.

Yes, there are no such ordering requirements.

> If each queue currently has up to 60 (30x2) consumers competing for the
> messages, does this mean you have no real ordering requirements
> (discounting priorities) when processing the messages on each queue, i.e.
> it doesn't matter which application instance gets a particular message,
> and say particular consumers could get and process the first and third
> messages whilst a slower consumer actually got and then later finished
> processing the second message? I ask because if you try to batch the
> messages on queues with multiple consumers and no prefetch (or even with
> prefetch) it isn't likely you would find consumers getting a sequential
> batch-sized group of messages (without introducing message grouping to the
> mix, that is) but rather instead get a message followed by other messages
> with one or more intermediate 'gaps' where competing consumers received
> those messages. Is that acceptable to whatever batched processing it is
> you are likely to be doing?

Yes, we do not have any ordering requirement, and we are OK with exactly
what you describe. Each message is independent of the others, and we do not
process messages in a workflow order anyway. We do not use any message
grouping (and do not plan to), and gaps are OK.

> You mentioned possibly only 100 queues servicing batch messages. Did you
> mean that you could know/decide in advance which those queues are, i.e.
> they are readily identifiable in advance, or could it just be any 100
> queues based on some condition at a given point in time?

Yes, we could decide in advance and identify batch queues if required.

Thanks Robbie.

> Robbie
>
> On 16 July 2012 16:54, Praveen M <lefthandma...@gmail.com> wrote:
>
> > Hi Robbie.
> > Thank you for writing back. Please see inline for answers to some of the
> > questions you had.
> >
> > On Mon, Jul 16, 2012 at 4:40 AM, Robbie Gemmell
> > <robbie.gemm...@gmail.com> wrote:
> >
> > > Hi Praveen,
> > >
> > > I have talked this over with some of the others here, and tend to
> > > agree with Gordon and Rajith that mixing asynchronous and synchronous
> > > consumers in that fashion isn't a route I would really suggest; using
> > > two sessions makes for complication around transactionality and
> > > ordering, and I don't think it will work on a single session.
> > >
> > > We do have some ideas you could potentially use to implement batching
> > > in the application to improve performance, but there are various
> > > subtleties to consider that might heavily influence our suggestions.
> > > As such we really need a good bit more detail around the use case to
> > > actually give a reasoned answer. For example:
> > >
> > > - How many connections/sessions/consumers/queues are actually in use?
> >
> > In our current system, we have 20-30 client servers talking to our Qpid
> > messaging server. We have 5 connections, 20 sessions/connection, and 2
> > consumers/queue from a single client server's standpoint (so all the
> > numbers should be multiplied by a max factor of 30, since we could have
> > up to 30 client servers). We create 6000 queues overall in our Qpid
> > messaging server.
> >
> > > - Are there multiple consumers on each/any of the queues at the same
> > > time?
> >
> > Yes. To explain this a little bit: we have about 15 client servers
> > consuming messages, with 20 sessions (threads) consuming messages per
> > client server. We have broken the 6000 queues into 10 buckets, and have
> > 2 sessions (threads) listening/consuming on every 600 queues. Hence, an
> > individual session might try to listen and consume from up to 600
> > queues on the same thread.
> > > - What if any ordering requirements are there on the message
> > > processing (either within each queue or across all the queues)?
> >
> > Across all queues, we'd like to process in a round-robin fashion to
> > ensure fairness across the queues. We achieve this now by turning off
> > prefetching (we're using prefetch 1, which works well). Within each
> > queue, all our queues are priority queues, so we process in priority
> > order.
> >
> > > - What is the typical variation of message volumes across the queues
> > > that you are looking to balance?
> >
> > Volumes vary quite a bit between queues (based upon the service the
> > queue is tied to). Some queues have relatively low traffic, some have
> > bursty traffic, some have consistently high traffic, and some have slow
> > consumers. Our numbers peak at about a million messages per day for a
> > busy queue.
> >
> > > - What are the typical message sizes?
> >
> > Message sizes are typically around 1KB-2KB.
> >
> > > - How many messages might you potentially be looking to batch?
> >
> > The batch sizes are typically provided by our client applications, and
> > are typically on the order of 10-50 messages.
> >
> > > - What is the typical processing time in onMessage() now? Would this
> > > vary as a direct multiple of the number of messages batched, or by
> > > some other scaling?
> >
> > The onMessage() callback invokes an application service, so I can't say
> > exactly... but with batching the processing time is typically well
> > below a direct multiple of the number of messages batched.
> >
> > The most typical use case where batching messages helps us is when a
> > database query is invoked with the batched messages, thus performing a
> > bulk operation. This can be very expensive for us if we do it one by
> > one instead of batching the database query.
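[For anyone following along: with the legacy Qpid Java (AMQP 0-x) client,
the "prefetch 1" setting mentioned above is usually controlled through the
connection URL. A minimal JNDI sketch, where the host, credentials, client
id, and factory name are placeholders:]

```properties
# Hypothetical JNDI properties for the legacy Qpid Java client;
# host, credentials, and client id below are placeholders.
java.naming.factory.initial = org.apache.qpid.jndi.PropertiesFileInitialContextFactory

# maxprefetch='1' limits each consumer to one unacknowledged message at a
# time, which is what gives the round-robin fairness described above.
connectionfactory.qpidConnectionFactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672'&maxprefetch='1'
```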
> > Also, batch message traffic is typically bursty, and our processing
> > times are quite high. From our current data, even though we have a
> > multiple-consumer setup, batching helps us process efficiently for
> > applications which process messages in bulk.
> >
> > Also, out of all our queues, I would say only about 100 of them would
> > be servicing batch messages.
> >
> > Our current messaging infrastructure supports batch messages, and hence
> > we have a lot of dependent code written which expects batching. Getting
> > away from it might be quite tough at this point, hence I'd like to
> > implement a pseudo-batch on top of Qpid. My original thought was around
> > using 2 sessions, onMessage() and a synchronous consumer. I don't think
> > we have much concern with transactionality, as we have our own
> > reference to each message in our database to guarantee
> > transactionality.
> >
> > Do let me know what you think, and I'd love to hear if you can think of
> > alternate approaches to this problem.
> >
> > Hope to hear from you soon.
> >
> > Thanks,
> > Praveen
> >
> > > Regards,
> > > Robbie
> > >
> > > On 12 July 2012 17:53, Praveen M <lefthandma...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm trying to explore if there are ways to batch message
> > > > processing. Batching message processing would help us improve
> > > > performance for some of our use cases, where we could chunk
> > > > messages and process them in a single callback.
> > > >
> > > > Has anyone here explored building a layer to batch messages?
> > > >
> > > > I am using the Java Broker and the Java client, and I would like to
> > > > stick to the JMS API as much as possible.
> > > >
> > > > This is what I currently have, still wondering if it'd work:
> > > >
> > > > 1) When the onMessage() callback is triggered, create a consumer
> > > > and pull more messages to process from the queue where the message
> > > > was delivered from.
> > > > 2) Pull messages up to my max chunk size, or up to the number of
> > > > messages available in the queue.
> > > > 3) Process all the messages together and commit on the session.
> > > >
> > > > I'd like to hear ideas on how to go about this.
> > > >
> > > > Thanks,
> > > > --
> > > > -Praveen
> >
> > --
> > -Praveen

--
-Praveen
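[The three steps quoted above can be sketched roughly as below. This is
only an illustration: the JMS consumer's receiveNoWait() is stood in for by
polling a plain in-memory queue so the sketch runs without a broker, and
drainBatch/MAX_CHUNK are made-up names. With the real client you would poll
a MessageConsumer on the same queue, which likewise returns null when no
message is immediately available, and call session.commit() on a transacted
session after processing the batch.]

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class PseudoBatch {
    // Hypothetical max batch size (the thread mentions 10-50 in practice).
    static final int MAX_CHUNK = 5;

    // Steps 1 and 2: starting from the message that triggered onMessage(),
    // drain further messages until the chunk is full or the queue is empty.
    // queue.poll() stands in for MessageConsumer.receiveNoWait(), which
    // returns null when no message is immediately available.
    static List<String> drainBatch(String first, Queue<String> queue) {
        List<String> batch = new ArrayList<>();
        batch.add(first);
        String next;
        while (batch.size() < MAX_CHUNK && (next = queue.poll()) != null) {
            batch.add(next);
        }
        return batch;
    }

    public static void main(String[] args) {
        // "m1" is the message delivered to onMessage(); the rest are
        // messages still sitting on the queue, to be pulled synchronously.
        Queue<String> queue =
                new ArrayDeque<>(List.of("m2", "m3", "m4", "m5", "m6", "m7"));
        List<String> batch = drainBatch("m1", queue);
        // Step 3 would process the whole batch together (e.g. one bulk DB
        // operation) and then call session.commit().
        System.out.println(batch); // prints [m1, m2, m3, m4, m5]
    }
}
```

[Note the caveat from earlier in the thread still applies: with competing
consumers and prefetch 1, the drained messages need not be sequential on
the queue; there may be gaps where other consumers took messages, which
Praveen confirmed is acceptable.]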