You can take a look at the "consumer rebalancing algorithm" part in http://kafka.apache.org/documentation.html. Basically, the partitions of a topic are divided as evenly as possible among all consumer threads in the same group. If a group has more consumer threads than the topic has partitions, the surplus threads will never get any data.
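To make the distribution concrete, here is a minimal Python sketch of a range-style split, based on my reading of that section of the documentation. It is an illustration only, not Kafka's actual code: the function name `assign_partitions` and the consumer ids are hypothetical, and it assumes that consumer ids are simply sorted as strings before the partitions are handed out.

```python
def assign_partitions(partitions, consumer_ids):
    """Split partitions among consumers in contiguous ranges (illustrative sketch)."""
    sorted_partitions = sorted(partitions)
    sorted_consumers = sorted(consumer_ids)  # assumption: plain string ordering
    # Every consumer gets `base` partitions; the first `extra` consumers get one more.
    base, extra = divmod(len(sorted_partitions), len(sorted_consumers))
    assignment = {}
    for position, consumer in enumerate(sorted_consumers):
        start = base * position + min(position, extra)
        count = base + (1 if position < extra else 0)
        assignment[consumer] = sorted_partitions[start:start + count]
    return assignment


# Hypothetical scenario: 4 partitions, 2 existing threads, then 4 new threads join.
existing = ["host-b-0", "host-b-1"]
newcomers = ["host-a-0", "host-a-1", "host-a-2", "host-a-3"]

before = assign_partitions(range(4), existing)
after = assign_partitions(range(4), existing + newcomers)
# before: each existing thread owns two partitions.
# after: the four "host-a-*" ids sort first and own one partition each,
#        while both "host-b-*" threads are left with an empty list.
```

Under these assumptions, which threads end up idle depends only on where their ids fall in the sorted order, which would be consistent with a newly launched process appearing to take over all partitions.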
Thanks,
Jun

On Mon, Oct 27, 2014 at 4:14 AM, Shlomi Hazan <shl...@viber.com> wrote:

> Hi All,
>
> Using Kafka's high-level consumer API I have bumped into a situation where
> launching a consumer process P with X consuming threads on a topic with X
> partitions kicks out all other existing consumer threads that consumed prior
> to launching the process P.
> That is, consumer process P is stealing all partitions from all other
> consumer processes.
>
> While understandable, it makes it hard to size & deploy a cluster with a
> number of partitions that will both allow balancing of consumption across
> consuming processes, dividing the partitions across consumers by setting
> each consumer with its share of the total number of partitions on the
> consumed topic, and on the other hand provide room for growth and addition
> of new consumers to help with increasing traffic into the cluster and the
> topic.
>
> This stealing effect forces me to have more partitions than really needed
> at the moment, planning for future growth, or stick to what I need and
> trust the option to add partitions, which comes with a price in terms of
> restarting consumers, bumping into out-of-order messages (hash
> partitioning) etc.
>
> Is this policy of stealing intended, or did I just jump to conclusions?
> What is the way to cope with the sizing question?
>
> Shlomi
>