Re: Consumer throughput imbalance

Mark Sun, 25 Aug 2013 09:14:47 -0700

I don't think it would matter as long as you separate the types of messages 
into different topics. Then just add more consumers to the topics that are 
slow. Am I missing something?
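
A minimal sketch of that idea in Python, assuming each message carries a known type: a small routing table picks the topic, so the slow types land in their own topic and can be given more partitions and consumers. The topic names, the `kind` field, and the `topic_for` helper are all hypothetical, and no Kafka client is involved here.

```python
# Hypothetical routing table: message kind -> topic name.
TOPIC_BY_KIND = {
    "thumbnail": "events.fast",
    "report": "events.slow",
}
# Hypothetical fallback topic for kinds not listed above.
DEFAULT_TOPIC = "events.fast"


def topic_for(message: dict) -> str:
    """Return the topic a message should be produced to, based on its kind."""
    return TOPIC_BY_KIND.get(message.get("kind"), DEFAULT_TOPIC)
```

This only works when the message type is known at produce time and is a reasonable predictor of processing cost.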


On Aug 25, 2013, at 8:59 AM, Ian Friedman <i...@flurry.com> wrote:

> What if you don't know ahead of time how long a message will take to consume? 
> 
> -- 
> Ian Friedman
> 
> 
> On Sunday, August 25, 2013 at 10:45 AM, Neha Narkhede wrote:
> 
>> Making producer-side partitioning depend on consumer behavior might not be
>> such a good idea. If consumption is the bottleneck, changing producer-side
>> partitioning may not help. To relieve a consumption bottleneck, you may need
>> to increase the number of partitions for those topics and increase the
>> number of consumer instances.
>> 
>> You mentioned that the consumers take longer to process certain kinds of
>> messages. What you can do is place the messages that require slower
>> processing in separate topics, so that you can scale the number of
>> partitions and the number of consumer instances for those messages
>> independently.
>> 
>> Thanks,
>> Neha
>> 
>> 
>> On Sat, Aug 24, 2013 at 9:57 AM, Ian Friedman <i...@flurry.com> wrote:
>> 
>>> Hey guys! We recently deployed our Kafka data pipeline application over
>>> the weekend and it is working out quite well once we ironed out all the
>>> issues. There is one behavior that we've noticed that is mildly troubling,
>>> though not a deal breaker. We're using a single topic with many partitions
>>> (1200 total) to load balance our 300 consumers, but what seems to happen is
>>> that some partitions end up more backed up than others. This is probably
>>> due more to the specifics of the application since some messages take much
>>> longer than others to process.
>>> 
>>> I'm thinking that the random partitioning in the producer is unsuited to
>>> our specific needs. One option I was considering was to write an alternate
>>> partitioner that looks at the consumer offsets from ZooKeeper (as in the
>>> ConsumerOffsetChecker) and probabilistically weights the partitions by
>>> their lag. Does this sound like a good idea to anyone else? Is there a
>>> better or preferably already built solution? If anyone has any ideas or
>>> feedback I'd sincerely appreciate it.
>>> 
>>> Thanks so much in advance.
>>> 
>>> P.S. thanks especially to everyone who's answered my dumb questions on
>>> this mailing list over the past few months, we couldn't have done it
>>> without you!
>>> 
>>> --
>>> Ian Friedman
>>> 
>> 
>> 
>> 
> 
>
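
For reference, the lag-weighted partitioner proposed in the original message could look roughly like the Python sketch below. It assumes the per-partition lag has already been read from somewhere (for example, the consumer offsets in ZooKeeper) and shows only the selection step. The `partition_weights` and `pick_partition` names and the inverse-lag weighting are illustrative choices, not anything Kafka provides.

```python
import random


def partition_weights(lags):
    """Weight each partition inversely to its lag (lag 0 -> weight 1.0)."""
    return [1.0 / (1 + max(lag, 0)) for lag in lags]


def pick_partition(lags, rng=random):
    """Pick a partition index at random, favoring partitions with less lag.

    `lags` is a list where lags[i] is the number of unconsumed messages
    in partition i (assumed to be supplied by the caller).
    """
    weights = partition_weights(lags)
    return rng.choices(range(len(lags)), weights=weights, k=1)[0]
```

With lags of `[0, 1000, 1000]`, nearly all new messages would go to partition 0. As Neha points out in the quoted reply, this ties the producer to consumer state and may not help if consumption as a whole is the bottleneck.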
