I don't think it would matter, as long as you separate the different types of messages into different topics. Then just add more consumers to the topics that are slow. Am I missing something?
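Something along these lines, assuming the 0.8 high-level consumer. The topic names, group id, and stream counts below are just placeholders; the point is that the slow topic gets more streams (and is backed by at least as many partitions):

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class SlowTopicConsumers {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder ZK address
        props.put("group.id", "pipeline-consumers");      // placeholder group id

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // Placeholder topic names: give the slow topic more streams (threads)
        // than the fast one. Each topic needs at least that many partitions.
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("slow-events", 8);
        topicCountMap.put("fast-events", 2);

        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(topicCountMap);

        // One worker thread per stream; the slow topic ends up with more workers.
        for (List<KafkaStream<byte[], byte[]>> topicStreams : streams.values()) {
            for (final KafkaStream<byte[], byte[]> stream : topicStreams) {
                new Thread(new Runnable() {
                    public void run() {
                        ConsumerIterator<byte[], byte[]> it = stream.iterator();
                        while (it.hasNext()) {
                            byte[] payload = it.next().message();
                            // process payload ...
                        }
                    }
                }).start();
            }
        }
    }
}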
On Aug 25, 2013, at 8:59 AM, Ian Friedman <i...@flurry.com> wrote:

> What if you don't know ahead of time how long a message will take to consume?
>
> --
> Ian Friedman
>
>
> On Sunday, August 25, 2013 at 10:45 AM, Neha Narkhede wrote:
>
>> Making producer side partitioning depend on consumer behavior might not be
>> such a good idea. If consumption is a bottleneck, changing producer side
>> partitioning may not help. To relieve consumption bottleneck, you may need
>> to increase the number of partitions for those topics and increase the
>> number of consumer instances.
>>
>> You mentioned that the consumers take longer to process certain kinds of
>> messages. What you can do is place the messages that require slower
>> processing in separate topics, so that you can scale the number of
>> partitions and number of consumer instances for those messages
>> independently.
>>
>> Thanks,
>> Neha
>>
>>
>> On Sat, Aug 24, 2013 at 9:57 AM, Ian Friedman <i...@flurry.com> wrote:
>>
>>> Hey guys! We recently deployed our kafka data pipeline application over
>>> the weekend and it is working out quite well once we ironed out all the
>>> issues. There is one behavior that we've noticed that is mildly troubling,
>>> though not a deal breaker. We're using a single topic with many partitions
>>> (1200 total) to load balance our 300 consumers, but what seems to happen is
>>> that some partitions end up more backed up than others. This is probably
>>> due more to the specifics of the application since some messages take much
>>> longer than others to process.
>>>
>>> I'm thinking that the random partitioning in the producer is unsuited to
>>> our specific needs. One option I was considering was to write an alternate
>>> partitioner that looks at the consumer offsets from zookeeper (as in the
>>> ConsumerOffsetChecker) and probabilistically weights the partitions by
>>> their lag. Does this sound like a good idea to anyone else? Is there a
>>> better or preferably already built solution? If anyone has any ideas or
>>> feedback I'd sincerely appreciate it.
>>>
>>> Thanks so much in advance.
>>>
>>> P.S. thanks especially to everyone who's answered my dumb questions on
>>> this mailing list over the past few months, we couldn't have done it
>>> without you!
>>>
>>> --
>>> Ian Friedman
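For the lag-weighted partitioner Ian describes in his original message, the selection step itself is pretty small. Rough sketch below, assuming you refresh a per-partition lag map periodically from the consumer offsets stored in ZooKeeper (the same data ConsumerOffsetChecker reads); the class and method names are made up for illustration, and wiring it into the producer's partitioner is left out:

import java.util.Map;
import java.util.Random;

public class LagWeightedPartitionChooser {

    private final Random random = new Random();

    // Pick a partition with probability proportional to 1 / (lag + 1), so the
    // most backed-up partitions receive the fewest new messages.
    // lagByPartition: partition id -> current consumer lag, assumed to be
    // refreshed periodically from ZooKeeper offsets.
    public int choosePartition(Map<Integer, Long> lagByPartition) {
        if (lagByPartition.isEmpty()) {
            throw new IllegalArgumentException("no partitions to choose from");
        }

        double totalWeight = 0.0;
        for (long lag : lagByPartition.values()) {
            totalWeight += 1.0 / (lag + 1.0);
        }

        double r = random.nextDouble() * totalWeight;
        int lastPartition = -1;
        for (Map.Entry<Integer, Long> entry : lagByPartition.entrySet()) {
            lastPartition = entry.getKey();
            r -= 1.0 / (entry.getValue() + 1.0);
            if (r <= 0) {
                return lastPartition;
            }
        }
        // Floating-point rounding can leave r slightly above zero; fall back
        // to the last partition seen.
        return lastPartition;
    }
}

One thing to keep in mind with that approach: the offsets in ZooKeeper only advance when the consumers commit, so the weights will always trail actual consumption by a bit.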