Making producer-side partitioning depend on consumer behavior might not be
such a good idea. If consumption is the bottleneck, changing producer-side
partitioning may not help. To relieve a consumption bottleneck, you may
need to increase the number of partitions for those topics and increase
the number of consumer instances.
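
As a rough illustration of why both numbers matter, here is a minimal
sketch of the arithmetic, assuming partitions are spread evenly across the
consumers in a group and that each partition is read by at most one
consumer in that group. The function name is hypothetical and the real
assignment is decided by Kafka's rebalancing, not by this code.

```python
def partitions_per_consumer(num_partitions, num_consumers):
    """Return how many partitions each consumer would own under an even spread."""
    base, extra = divmod(num_partitions, num_consumers)
    # The first `extra` consumers take one additional partition.
    return [base + 1 if i < extra else base for i in range(num_consumers)]


# Numbers from the setup described in this thread: 1200 partitions, 300 consumers.
current = partitions_per_consumer(1200, 300)

# Adding consumers beyond the partition count leaves the surplus idle,
# which is why partitions usually have to grow along with consumers.
oversized = partitions_per_consumer(1200, 1500)
idle = oversized.count(0)
```

With the numbers above, each consumer owns 4 partitions, and in the
oversized case 300 consumers would have nothing to read.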

You mentioned that the consumers take longer to process certain kinds of
messages. What you can do is place the messages that require slower
processing in separate topics, so that you can scale the number of
partitions and the number of consumer instances for those messages
independently.
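
A minimal sketch of that routing on the producer side, assuming each
message carries a "kind" that tells you whether it is slow to process. The
topic names, the set of slow kinds, and the function names are all
hypothetical, and the actual send call to Kafka is left out.

```python
# Hypothetical message kinds known to be slow to process.
SLOW_KINDS = {"report", "reindex"}


def topic_for(kind, fast_topic="events-fast", slow_topic="events-slow"):
    """Pick the destination topic from the message kind."""
    return slow_topic if kind in SLOW_KINDS else fast_topic


def route(messages):
    """Group (kind, payload) pairs by the topic they should be sent to."""
    batches = {}
    for kind, payload in messages:
        batches.setdefault(topic_for(kind), []).append(payload)
    return batches


batches = route([("click", "a"), ("report", "b"), ("click", "c")])
```

Each topic can then get its own partition count and its own group of
consumer instances, so a backlog of slow messages no longer holds up the
fast ones.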

Thanks,
Neha


On Sat, Aug 24, 2013 at 9:57 AM, Ian Friedman <i...@flurry.com> wrote:

> Hey guys! We recently deployed our Kafka data pipeline application over
> the weekend and it is working out quite well once we ironed out all the
> issues. There is one behavior that we've noticed that is mildly troubling,
> though not a deal breaker. We're using a single topic with many partitions
> (1200 total) to load balance our 300 consumers, but what seems to happen is
> that some partitions end up more backed up than others. This is probably
> due more to the specifics of the application since some messages take much
> longer than others to process.
>
> I'm thinking that the random partitioning in the producer is unsuited to
> our specific needs. One option I was considering was to write an alternate
> partitioner that looks at the consumer offsets from ZooKeeper (as in the
> ConsumerOffsetChecker) and probabilistically weights the partitions by
> their lag. Does this sound like a good idea to anyone else? Is there a
> better or preferably already built solution? If anyone has any ideas or
> feedback I'd sincerely appreciate it.
>
> Thanks so much in advance.
>
> P.S. thanks especially to everyone who's answered my dumb questions on
> this mailing list over the past few months, we couldn't have done it
> without you!
>
> --
> Ian Friedman
>
>
