Re: Consumer throughput imbalance

2013-08-26 Thread Ian Friedman
Got it, thanks Jay -- Ian Friedman On Monday, August 26, 2013 at 2:37 PM, Jay Kreps wrote: > Yes exactly. > > Lowering queuedchunks.max shouldn't help if the problem is what I > described. That option controls how many chunks the consumer has ready in > memory for processing. But we are hyp

Re: Consumer throughput imbalance

2013-08-26 Thread Jay Kreps
Yes exactly. Lowering queuedchunks.max shouldn't help if the problem is what I described. That option controls how many chunks the consumer has ready in memory for processing. But we are hypothesizing that your problem is actually that the individual chunks are just too large, leading to the con

Re: Consumer throughput imbalance

2013-08-26 Thread Ian Friedman
On Sunday, August 25, 2013 at 3:11 PM, Jay Kreps wrote: > I'm still a little confused by your description of the problem. It might be > easier to understand if you listed out the exact things you have measured, > what you saw, and what you expected to see. The problem is that some consumers are sl

Re: Consumer throughput imbalance

2013-08-26 Thread Ian Friedman
Just to make sure I have this right: on the producer side we'd set max.message.size, and then on the consumer side we'd set fetch.size? I admittedly didn't research how all the tuning options would affect us; thank you for the info. Would queuedchunks.max have any effect? -- Ian Friedman On M

Re: Consumer throughput imbalance

2013-08-26 Thread Jay Kreps
Yeah it is always equal to the fetch size. The fetch size needs to be at least equal to the max message size you have allowed on the server, though. -Jay On Sun, Aug 25, 2013 at 10:00 PM, Ian Friedman wrote: > Jay - is there any way to control the size of the interleaved chunks? The > performa
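
A minimal sketch of the constraint described in this reply, in Python (not part of the original thread). The function names and the example numbers are hypothetical; the parameters mirror the fetch.size, max.message.size, and queuedchunks.max settings discussed in the thread. The memory estimate assumes each queued chunk is at most one fetch in size, and should be read as a rough upper bound, not the consumer's actual accounting.

```python
def fetch_size_is_sufficient(fetch_size: int, max_message_size: int) -> bool:
    """Return True when the consumer fetch size can hold the largest
    message the server allows (fetch.size >= max.message.size)."""
    return fetch_size >= max_message_size


def max_buffered_bytes(fetch_size: int, queued_chunks_max: int) -> int:
    """Rough upper bound on bytes held in memory for processing, assuming
    every queued chunk is a full fetch (an assumption, not a measurement)."""
    return fetch_size * queued_chunks_max


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    fetch_size = 300 * 1024        # consumer fetch.size, in bytes
    max_message_size = 100 * 1024  # largest message allowed, in bytes
    queued_chunks_max = 10         # consumer queuedchunks.max

    print("fetch size sufficient:",
          fetch_size_is_sufficient(fetch_size, max_message_size))
    print("upper bound on buffered bytes:",
          max_buffered_bytes(fetch_size, queued_chunks_max))
```

Under these assumptions, shrinking the fetch size shrinks each chunk, but it cannot go below the largest message allowed on the server.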

Re: Consumer throughput imbalance

2013-08-25 Thread Ian Friedman
Jay - is there any way to control the size of the interleaved chunks? The performance hit would likely be negligible for us at the moment. -- Ian Friedman On Sunday, August 25, 2013 at 3:11 PM, Jay Kreps wrote: > I'm still a little confused by your description of the problem. It might be >

Re: Consumer throughput imbalance

2013-08-25 Thread Jay Kreps
I'm still a little confused by your description of the problem. It might be easier to understand if you listed out the exact things you have measured, what you saw, and what you expected to see. Since you mentioned the consumer I can give a little info on how that works. The consumer consumes from

Re: Consumer throughput imbalance

2013-08-25 Thread Ian Friedman
Sorry I reread what I've written so far and found that it doesn't state the actual problem very well. Let me clarify once again: The problem we're trying to solve is that we can't let messages go for unbounded amounts of time without getting processed, and it seems that something about what we

Re: Consumer throughput imbalance

2013-08-25 Thread Ian Friedman
When I said "some messages take longer than others" that may have been misleading. What I meant there is that the performance of the entire application is inconsistent, mostly due to pressure from other applications (mapreduce) on our HBase and MySQL backends. On top of that, some messages just

Re: Consumer throughput imbalance

2013-08-25 Thread Mark
I don't think it would matter as long as you separate the types of messages into different topics. Then just add more consumers to the ones that are slow. Am I missing something? On Aug 25, 2013, at 8:59 AM, Ian Friedman wrote: > What if you don't know ahead of time how long a message will take t

Re: Consumer throughput imbalance

2013-08-25 Thread Ian Friedman
What if you don't know ahead of time how long a message will take to consume? -- Ian Friedman On Sunday, August 25, 2013 at 10:45 AM, Neha Narkhede wrote: > Making producer side partitioning depend on consumer behavior might not be > such a good idea. If consumption is a bottleneck, changing

Re: Consumer throughput imbalance

2013-08-25 Thread Neha Narkhede
Making producer-side partitioning depend on consumer behavior might not be such a good idea. If consumption is a bottleneck, changing producer-side partitioning may not help. To relieve a consumption bottleneck, you may need to increase the number of partitions for those topics and increase the numbe