By consumer, I actually mean consumer threads (the thread count you specify when creating consumer streams). So, if you have 4 consumers, each with 4 threads, that's 16 consumer threads for 12 partitions, and 4 of the threads will not get any data. It sounds like that's not what you get? What's the output of the ConsumerOffsetChecker (see http://kafka.apache.org/documentation.html)?
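To make the "thread count" concrete, here is a minimal sketch with the 0.8-era high-level consumer API (the topic name "T", group id, and ZooKeeper address below are placeholders, not values from this thread). The integer passed to createMessageStreams is the number of streams/threads that the rebalance hands partitions to:

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class ConsumerThreadsSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181"); // placeholder address
            props.put("group.id", "my-group");                // placeholder group id
            ConsumerConnector connector =
                    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // The 4 here is the "thread count": the connector creates 4 streams for
            // topic "T", and the rebalance assigns partitions to each stream
            // (one consuming thread per stream).
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                    connector.createMessageStreams(Collections.singletonMap("T", 4));

            // With 12 partitions and 4 such processes in the same group, there are
            // 16 streams in total, so 4 of them end up owning no partitions.
        }
    }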
For consumer.id, you don't need to set it in general. We generate some uuid
automatically.

Thanks,

Jun

On Tue, Oct 28, 2014 at 4:59 AM, Shlomi Hazan <shl...@viber.com> wrote:

> Jun,
>
> I hear you say "partitions are evenly distributed among all consumers in
> the same group", yet I did bump into a case where launching a process with
> X high-level consumer API threads took over all partitions, leaving the
> existing consumers unemployed.
>
> According to the claim above, and if I am not mistaken:
> on a topic T with 12 partitions and 3 consumers C1-C3 in the same group,
> with 4 threads each, adding a new consumer C4 with 12 threads should yield
> the following balance:
> C1-C3 each relinquish a single partition, holding only 3 partitions each.
> C4 holds the 3 partitions relinquished by C1-C3.
> Yet, in the case I described, what happened is that C4 gained all 12
> partitions and sent C1-C3 out of business with 0 partitions each.
> Now maybe I overlooked something, but I think I did see that happen.
>
> BTW:
> What key is used to distinguish one consumer from another? "consumer.id"?
> The docs for "consumer.id" say "Generated automatically if not set."
> What is the best practice for setting its value? Leave it empty? Is the
> server host name good enough? What are the considerations?
> When using the high-level consumer API, are all threads identified as the
> same consumer? I guess they are, right?...
>
> Thanks,
> Shlomi
>
>
> On Tue, Oct 28, 2014 at 4:21 AM, Jun Rao <jun...@gmail.com> wrote:
>
> > You can take a look at the "consumer rebalancing algorithm" part in
> > http://kafka.apache.org/documentation.html. Basically, partitions are
> > evenly distributed among all consumers in the same group. If there are
> > more consumers in a group than partitions, some consumers will never
> > get any data.
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Oct 27, 2014 at 4:14 AM, Shlomi Hazan <shl...@viber.com> wrote:
> >
> > > Hi All,
> > >
> > > Using Kafka's high-level consumer API, I have bumped into a situation
> > > where launching a consumer process P1 with X consuming threads on a
> > > topic with X partitions kicks out all other consumer threads that were
> > > consuming prior to launching P1.
> > > That is, consumer process P1 is stealing all partitions from all other
> > > consumer processes.
> > >
> > > While understandable, this makes it hard to size and deploy a cluster
> > > with a number of partitions that, on the one hand, allows consumption
> > > to be balanced across consuming processes (giving each consumer its
> > > share of the total number of partitions on the consumed topic) and, on
> > > the other hand, leaves room for growth and for adding new consumers to
> > > help with increasing traffic into the cluster and the topic.
> > >
> > > This stealing effect forces me either to have more partitions than
> > > really needed at the moment, planning for future growth, or to stick
> > > to what I need now and rely on the option to add partitions later,
> > > which comes with a price in terms of restarting consumers, running
> > > into out-of-order messages (hash partitioning), etc.
> > >
> > > Is this policy of stealing intended, or did I just jump to conclusions?
> > > What is the way to cope with the sizing question?
> > >
> > > Shlomi
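For reference, the "consumer rebalancing algorithm" mentioned above divides each topic's partitions into contiguous ranges across the sorted consumer threads. A minimal sketch of that arithmetic (my own illustration, not the actual Kafka code) shows why 16 threads on 12 partitions leaves 4 threads idle, and why 12 threads on 12 partitions means one partition per thread:

    import java.util.ArrayList;
    import java.util.List;

    public class RangeAssignmentSketch {
        // Assign 'partitions' partition ids to 'threads' consumer threads in
        // contiguous ranges of roughly equal size (illustration of the idea only).
        static List<List<Integer>> assign(int partitions, int threads) {
            List<List<Integer>> owned = new ArrayList<>();
            int base = partitions / threads;   // minimum partitions per thread
            int extra = partitions % threads;  // the first 'extra' threads get one more
            int next = 0;
            for (int t = 0; t < threads; t++) {
                int count = base + (t < extra ? 1 : 0);
                List<Integer> mine = new ArrayList<>();
                for (int i = 0; i < count; i++) {
                    mine.add(next++);
                }
                owned.add(mine);               // empty list when threads > partitions
            }
            return owned;
        }

        public static void main(String[] args) {
            // 12 partitions, 16 threads (4 consumers x 4 threads): last 4 lists are empty.
            System.out.println(assign(12, 16));
            // 12 partitions, 12 threads: one partition per thread.
            System.out.println(assign(12, 12));
        }
    }

In the C4 scenario above there are 24 threads in the group for 12 partitions, so only the 12 threads that sort first get a partition; if C4's consumer id happens to sort before the others, its 12 threads can end up with everything, which would look like the behaviour described.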