Jiangjie,

Yeah, I welcome the round-robin strategy, as the 'range' strategy ('til now the only one available) is not always good at balancing partitions, as you observed above.
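To make the imbalance concrete, here is a minimal Python sketch of the two strategies applied to the two-topic example quoted below. It is an illustration only, not Kafka's actual code: the function names `range_assign` and `round_robin_assign` are hypothetical, and it assumes every consumer thread selects every topic.

```python
from itertools import cycle


def range_assign(topics, threads):
    """Hypothetical range strategy: each topic is split on its own."""
    ordered = sorted(threads)
    assignment = {}
    for topic, partitions in sorted(topics.items()):
        # Each thread gets a contiguous slice of this topic's partitions.
        per_thread, extra = divmod(len(partitions), len(ordered))
        for i, thread in enumerate(ordered):
            start = per_thread * i + min(i, extra)
            count = per_thread + (1 if i < extra else 0)
            for partition in partitions[start:start + count]:
                assignment[(topic, partition)] = thread
    return assignment


def round_robin_assign(topics, threads):
    """Hypothetical round-robin strategy: one pass over all partitions."""
    ring = cycle(sorted(threads))
    assignment = {}
    for topic, partitions in sorted(topics.items()):
        for partition in partitions:
            assignment[(topic, partition)] = next(ring)
    return assignment


if __name__ == "__main__":
    topics = {"T1": ["P1", "P2"], "T2": ["P1", "P2"]}
    threads = ["C1-Thr1", "C1-Thr2", "C2-Thr1", "C2-Thr2"]
    print(range_assign(topics, threads))
    print(round_robin_assign(topics, threads))
```

With this input the range sketch hands all four partitions to the two threads of C1, while the round-robin sketch gives one partition to each of the four threads.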
The main thing I'm bringing up in this thread, though, is the question of why the group being balanced must be restricted to a homogeneous set of consumers. This is not a requirement for the range algorithm, but it is for the roundrobin algorithm. So, I'm just wanting to understand why there's that limitation. (And sadly, in our case, we do have heterogeneous consumers using the same groupid, so we can't easily turn on roundrobin at the moment without some effort :) ).

I can see that it does simplify the implementation to have that limitation, but I'm just wondering if there's anything fundamental that would prevent an implementation that works over heterogeneous consumers. E.g.: "Lay out all partitions, and lay out all consumer threads, and proceed round robin, assigning each partition to the next consumer thread. *If the next consumer thread doesn't have a selection for the current partition, then move on to the next consumer thread....*"

The current implementation is also problematic if you are doing a rolling restart of a consumer cluster. Let's say you are updating the topic selection as part of an update to the cluster. Once the first node is updated, the cluster will no longer be homogeneous until the last node is updated, which means you will have a temporary outage consuming data until all nodes have been updated. So, it makes it difficult to do rolling restarts, or canary updates on a subset of nodes, etc.

Jason

On Fri, Mar 20, 2015 at 10:15 PM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:

> Hi Jason,
>
> The motivation behind round robin is to better balance the consumers'
> load. Imagine you have two topics each with two partitions. These topics
> are consumed by two consumers each with two consumer threads.
>
> The range assignment gives:
> T1-P1 -> C1-Thr1
> T1-P2 -> C1-Thr2
> T2-P1 -> C1-Thr1
> T2-P2 -> C1-Thr2
> Consumer 2 will not be consuming from any partitions.
>
> The round robin algorithm gives:
> T1-P1 -> C1-Thr1
> T1-P2 -> C1-Thr2
> T2-P1 -> C2-Thr1
> T2-P2 -> C2-Thr2
> It is much better than range assignment.
>
> That's the reason why we introduced round robin strategy even though it
> has restrictions.
>
> Jiangjie (Becket) Qin
>
>
> On 3/20/15, 12:20 PM, "Jason Rosenberg" <j...@squareup.com> wrote:
>
> >Jiangjie,
> >
> >The error messages I got (and the config doc) do clearly state that the
> >number of threads per consumer must match also....
> >
> >I'm not convinced that an easy to understand algorithm would work fine
> >with a heterogeneous set of selected topics between consumers.
> >
> >Jason
> >
> >On Thu, Mar 19, 2015 at 8:07 PM, Mayuresh Gharat
> ><gharatmayures...@gmail.com> wrote:
> >
> >> Hi Becket,
> >>
> >> Can you list down an example for this? It would be easier to understand :)
> >>
> >> Thanks,
> >>
> >> Mayuresh
> >>
> >> On Thu, Mar 19, 2015 at 4:46 PM, Jiangjie Qin <j...@linkedin.com.invalid>
> >> wrote:
> >>
> >> > Hi Jason,
> >> >
> >> > The round-robin strategy first takes the partitions of all the topics a
> >> > consumer is consuming from, then distributes them across all the consumers.
> >> > If different consumers are consuming from different topics, the assigning
> >> > algorithm will generate different answers on different consumers.
> >> > It is OK for consumers to have different thread counts, but the consumers
> >> > have to consume from the same set of topics.
> >> >
> >> >
> >> > For range strategy, the balance is for each individual topic instead of
> >> > cross topics. So the balance is only done for the consumers consuming from
> >> > the same topic.
> >> >
> >> > Thanks.
> >> >
> >> > Jiangjie (Becket) Qin
> >> >
> >> > On 3/19/15, 4:14 PM, "Jason Rosenberg" <j...@squareup.com> wrote:
> >> >
> >> > >So,
> >> > >
> >> > >I've run into an issue migrating a consumer to use the new 'roundrobin'
> >> > >partition.assignment.strategy.
> >> > >It turns out that several of our consumers use the same group id, but
> >> > >instantiate several different consumer instances (with different topic
> >> > >selectors and thread counts). Often, this is done in a single shared
> >> > >process. It turns out this arrangement is not allowed when using the
> >> > >'roundrobin' assignment strategy.
> >> > >
> >> > >I'm curious as to the reason for this restriction? Why is it not also a
> >> > >restriction for the 'range' strategy (which we've been happily using for
> >> > >some time now)?
> >> > >
> >> > >It would seem that as long as you always assign a partition to a consumer
> >> > >instance that is actually selecting it, you should still be able to
> >> > >proceed with the round-robin algorithm (potentially skipping consumers if
> >> > >they can't select the next partition in the list, etc.).
> >> > >
> >> > >Jason
> >> >
> >> >
> >>
> >>
> >> --
> >> -Regards,
> >> Mayuresh R. Gharat
> >> (862) 250-7125
> >>
> >
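
For reference, the skip-ahead idea raised in this thread (proceed round robin, but skip any consumer thread that does not select the current partition's topic) could be sketched in Python roughly as follows. This is only a sketch under stated assumptions, not how Kafka implements 'roundrobin': `skip_round_robin` and its arguments are hypothetical names, and it assumes every consumer sees the same full list of partitions and the same map of thread to selected topics, so that each one computes the same answer.

```python
def skip_round_robin(partitions, subscriptions):
    """Assign (topic, partition) pairs round robin over heterogeneous threads.

    partitions:    iterable of (topic, partition) pairs (hypothetical input)
    subscriptions: dict mapping thread id -> set of topics it selects
    """
    threads = sorted(subscriptions)  # same order on every consumer
    assignment = {}
    cursor = 0  # index of the thread whose turn comes next
    for topic, partition in sorted(partitions):
        # Try each thread at most once, starting from the cursor.
        for offset in range(len(threads)):
            idx = (cursor + offset) % len(threads)
            thread = threads[idx]
            if topic in subscriptions[thread]:
                assignment[(topic, partition)] = thread
                cursor = idx + 1  # resume after the thread just used
                break
        # If no thread selects this topic, the partition stays unassigned.
    return assignment


if __name__ == "__main__":
    subs = {"A-Thr1": {"T1", "T2"}, "B-Thr1": {"T1"}}
    parts = [("T1", 0), ("T1", 1), ("T2", 0), ("T2", 1)]
    print(skip_round_robin(parts, subs))
```

In this example both T2 partitions land on A-Thr1, since B-Thr1 does not select T2. Skipping keeps every assignment valid, but it cannot guarantee an even load when selections differ.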