Hi Joel,

Correction to my previous question: what is the expected behavior of the
*roundrobin* policy in the above scenario?

Thanks,

Bhavesh

On Thu, Oct 30, 2014 at 1:39 PM, Bhavesh Mistry <mistry.p.bhav...@gmail.com>
wrote:

> Hi Joel,
>
> I have a similar issue. I have tried *partition.assignment.strategy=*
> *"roundrobin"*, but how do you expect this to work?
>
> We have a topic with 32 partitions and 4 JVMs with 10 threads each (the
> extra 8 threads are a backup in case one of the JVMs goes down). The
> roundrobin strategy does not use all 4 JVMs, only 3, so the threads are
> unevenly distributed across the JVMs (the 4th JVM does not get any active
> consumption threads). What is the best way to distribute the consumption
> threads evenly (or close to evenly) across the JVMs?
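>
> For reference, this is roughly how each JVM sets up its consumer (the
> ZooKeeper address, group id, and topic name below are placeholders, not
> our real values):
>
>     import java.util.Collections;
>     import java.util.List;
>     import java.util.Map;
>     import java.util.Properties;
>
>     import kafka.consumer.Consumer;
>     import kafka.consumer.ConsumerConfig;
>     import kafka.consumer.KafkaStream;
>     import kafka.javaapi.consumer.ConsumerConnector;
>
>     public class RoundRobinConsumerSetup {
>         public static void main(String[] args) {
>             Properties props = new Properties();
>             props.put("zookeeper.connect", "zk1:2181");        // placeholder
>             props.put("group.id", "my-consumer-group");        // placeholder
>             props.put("partition.assignment.strategy", "roundrobin");
>
>             ConsumerConnector connector =
>                 Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
>
>             // 10 streams (threads) per JVM; with 32 partitions and 4 JVMs
>             // (40 threads total), 8 threads in the group will sit idle
>             Map<String, List<KafkaStream<byte[], byte[]>>> streams =
>                 connector.createMessageStreams(Collections.singletonMap("my-topic", 10));
>
>             // each stream would normally be handed to its own worker thread
>         }
>     }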
>
>
> Thanks,
>
> Bhavesh
>
> On Thu, Oct 30, 2014 at 10:07 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
>
>>
>> > Example: launching 4 processes on 4 different machines, with 4 threads
>> > per process, on a 12-partition topic will leave each machine with 3
>> > assigned threads and one doing nothing. Moreover, no matter how many
>> > threads each process has, as long as it is more than 3, the end result
>> > will stay the same: 3 assigned threads per machine, and the rest of
>> > them doing nothing.
>> >
>> > Ideally, I would want something like a consumer set/ensemble/{whatever
>> > word, just not "group"} that would be used to denote a group of threads
>> > on a machine, so that when specific threads request to join a consumer
>> > group they are elected such that they end up balanced across the
>> > machines denoted by the consumer set/ensemble identifiers.
>> >
>> > Will partition.assignment.strategy="roundrobin" help with that?
>>
>> You can give it a try. It does have the constraint that the subscription
>> is identical across your machines; i.e., it should work in your above
>> scenario (4 processes on 4 machines with 4 threads per process). The
>> assignment will hand partitions to threads in a round-robin manner, and
>> the difference between max(owned) and min(owned) will be exactly one.
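>>
>> To illustrate, here is a toy simulation of that round-robin dealing (just
>> a sketch, not the actual assignment code; the thread names are made up):
>>
>>     import java.util.ArrayList;
>>     import java.util.HashMap;
>>     import java.util.List;
>>     import java.util.Map;
>>
>>     public class RoundRobinSketch {
>>         public static void main(String[] args) {
>>             int partitions = 12;  // as in the scenario above
>>             List<String> threads = new ArrayList<>();
>>             for (int machine = 1; machine <= 4; machine++)
>>                 for (int t = 0; t < 4; t++)
>>                     threads.add("machine" + machine + "-thread" + t);
>>
>>             // deal the partitions out to the threads like a deck of cards
>>             Map<String, Integer> owned = new HashMap<>();
>>             for (String thread : threads) owned.put(thread, 0);
>>             for (int p = 0; p < partitions; p++) {
>>                 String thread = threads.get(p % threads.size());
>>                 owned.put(thread, owned.get(thread) + 1);
>>             }
>>
>>             // 12 threads own one partition each and 4 own none, so
>>             // max(owned) - min(owned) == 1; which threads sit idle
>>             // depends on the ordering of the thread list
>>             System.out.println(owned);
>>         }
>>     }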
>>
>> We can discuss improving partition assignment strategies in the 0.9
>> release with the new consumer.
>>
>> On Thu, Oct 30, 2014 at 08:52:40AM +0200, Shlomi Hazan wrote:
>> > Jun, Joel,
>> >
>> > The issue here is exactly which threads are left out and which threads
>> > are assigned partitions.
>> > Maybe I am missing something, but what I want is to balance consuming
>> > threads across machines/processes, regardless of the number of threads
>> > each machine launches (side effect: this way, if you have more threads
>> > than partitions, you get a reserve force waiting to step in).
>> >
>> > Example: launching 4 processes on 4 different machines, with 4 threads
>> > per process, on a 12-partition topic will leave each machine with 3
>> > assigned threads and one doing nothing. Moreover, no matter how many
>> > threads each process has, as long as it is more than 3, the end result
>> > will stay the same: 3 assigned threads per machine, and the rest of
>> > them doing nothing.
>> >
>> > Ideally, I would want something like a consumer set/ensemble/{whatever
>> > word, just not "group"} that would be used to denote a group of threads
>> > on a machine, so that when specific threads request to join a consumer
>> > group they are elected such that they end up balanced across the
>> > machines denoted by the consumer set/ensemble identifiers.
>> >
>> > Will partition.assignment.strategy="roundrobin" help with that?
>> > 10x,
>> > Shlomi
>> >
>> > On Thu, Oct 30, 2014 at 4:00 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
>> >
>> > > Shlomi,
>> > >
>> > > If you are on trunk and your consumer subscriptions are identical,
>> > > then you can try a slightly different partition assignment strategy.
>> > > Try setting partition.assignment.strategy="roundrobin" in your
>> > > consumer config.
>> > >
>> > > Thanks,
>> > >
>> > > Joel
>> > >
>> > > On Wed, Oct 29, 2014 at 06:29:30PM -0700, Jun Rao wrote:
>> > > > By consumer, I actually mean consumer threads (the thread count you
>> > > > used when creating consumer streams). So, if you have 4 consumers,
>> > > > each with 4 threads, 4 of the threads will not get any data with 12
>> > > > partitions. It sounds like that's not what you are seeing? What is
>> > > > the output of the ConsumerOffsetChecker (see
>> > > > http://kafka.apache.org/documentation.html)?
>> > > >
>> > > > For consumer.id, you don't need to set it in general; we generate a
>> > > > UUID automatically.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jun
>> > > >
>> > > > On Tue, Oct 28, 2014 at 4:59 AM, Shlomi Hazan <shl...@viber.com> wrote:
>> > > >
>> > > > > Jun,
>> > > > >
>> > > > > I hear you say "partitions are evenly distributed among all
>> > > > > consumers in the same group", yet I did bump into a case where a
>> > > > > process launched with X high-level consumer API threads took over
>> > > > > all partitions, leaving the existing consumers unemployed.
>> > > > >
>> > > > > According to the claim above, and if I am not mistaken, on a topic
>> > > > > T with 12 partitions and 3 consumers C1-C3 in the same group with
>> > > > > 4 threads each, adding a new consumer C4 with 12 threads should
>> > > > > yield the following balance:
>> > > > > C1-C3 each relinquish a single partition, holding only 3
>> > > > > partitions each.
>> > > > > C4 holds the 3 partitions relinquished by C1-C3.
>> > > > > Yet, in the case I described, what happened is that C4 gained all
>> > > > > 12 partitions and sent C1-C3 out of business with 0 partitions
>> > > > > each. Now maybe I overlooked something, but I think I did see
>> > > > > that happen.
>> > > > >
>> > > > > BTW, what key is used to distinguish one consumer from another?
>> > > > > "consumer.id"? The docs for "consumer.id" say "Generated
>> > > > > automatically if not set."
>> > > > > What is the best practice for setting its value? Leave it empty?
>> > > > > Is the server host name good enough? What are the considerations?
>> > > > > When using the high-level consumer API, are all threads
>> > > > > identified as the same consumer? I guess they are, right?...
>> > > > >
>> > > > > Thanks,
>> > > > > Shlomi
>> > > > >
>> > > > >
>> > > > > On Tue, Oct 28, 2014 at 4:21 AM, Jun Rao <jun...@gmail.com> wrote:
>> > > > >
>> > > > > > You can take a look at the "consumer rebalancing algorithm" part
>> > > > > > in http://kafka.apache.org/documentation.html. Basically,
>> > > > > > partitions are evenly distributed among all consumers in the
>> > > > > > same group. If there are more consumers in a group than
>> > > > > > partitions, some consumers will never get any data.
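>> > > > > >
>> > > > > > Roughly what that looks like (a simplified sketch, not the
>> > > > > > actual implementation; it just shows how, with more consumer
>> > > > > > threads than partitions, some threads end up with nothing):
>> > > > > >
>> > > > > >     import java.util.ArrayList;
>> > > > > >     import java.util.List;
>> > > > > >
>> > > > > >     public class RangeAssignSketch {
>> > > > > >         // sort partitions and consumer threads, then give each
>> > > > > >         // thread a contiguous range of partitions
>> > > > > >         static List<List<Integer>> assign(int numPartitions, int numThreads) {
>> > > > > >             List<List<Integer>> result = new ArrayList<>();
>> > > > > >             int perThread = numPartitions / numThreads;
>> > > > > >             int extra = numPartitions % numThreads;
>> > > > > >             int next = 0;
>> > > > > >             for (int i = 0; i < numThreads; i++) {
>> > > > > >                 int count = perThread + (i < extra ? 1 : 0);
>> > > > > >                 List<Integer> owned = new ArrayList<>();
>> > > > > >                 for (int j = 0; j < count; j++) owned.add(next++);
>> > > > > >                 result.add(owned);
>> > > > > >             }
>> > > > > >             return result;
>> > > > > >         }
>> > > > > >
>> > > > > >         public static void main(String[] args) {
>> > > > > >             // 12 partitions, 16 consumer threads: the last 4
>> > > > > >             // threads get empty assignments
>> > > > > >             System.out.println(assign(12, 16));
>> > > > > >         }
>> > > > > >     }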
>> > > > > >
>> > > > > > Thanks,
>> > > > > >
>> > > > > > Jun
>> > > > > >
>> > > > > > On Mon, Oct 27, 2014 at 4:14 AM, Shlomi Hazan <shl...@viber.com> wrote:
>> > > > > >
>> > > > > > > Hi All,
>> > > > > > >
>> > > > > > > Using Kafka's high-level consumer API, I have bumped into a
>> > > > > > > situation where launching a consumer process P1 with X
>> > > > > > > consuming threads on a topic with X partitions kicks out all
>> > > > > > > other existing consumer threads that were consuming prior to
>> > > > > > > launching the process P1.
>> > > > > > > That is, consumer process P1 is stealing all partitions from
>> > > > > > > all other consumer processes.
>> > > > > > >
>> > > > > > > While understandable, it makes it hard to size and deploy a
>> > > > > > > cluster with a number of partitions that will, on the one
>> > > > > > > hand, allow balancing consumption across consuming processes
>> > > > > > > (dividing the partitions across consumers by giving each
>> > > > > > > consumer its share of the total number of partitions on the
>> > > > > > > consumed topic) and, on the other hand, provide room for
>> > > > > > > growth and the addition of new consumers to help with
>> > > > > > > increasing traffic into the cluster and the topic.
>> > > > > > >
>> > > > > > > This stealing effect forces me either to have more partitions
>> > > > > > > than really needed at the moment, planning for future growth,
>> > > > > > > or to stick to what I need and trust the option of adding
>> > > > > > > partitions later, which comes with a price in terms of
>> > > > > > > restarting consumers, bumping into out-of-order messages
>> > > > > > > (hash partitioning), etc.
>> > > > > > >
>> > > > > > > Is this stealing policy intended, or did I just jump to
>> > > > > > > conclusions? What is the way to cope with the sizing question?
>> > > > > > >
>> > > > > > > Shlomi
>> > > > > > >
>> > > > > >
>> > > > >
>> > >
>> > >
>>
>>
>
