Hi Jason,

Yes, I agree the restriction makes round-robin less flexible to use. I think the focus of the round-robin strategy is workload balance. If different consumers are consuming from different topics, the load is unbalanced by nature. In that case, is it possible for you to use a different consumer group for each set of topics?

The rolling update is a good point. If you do a rolling bounce in a small window, the rebalance retry should handle it. But if you want to canary a new topic setting on one consumer for some time, it won't work. Could you share the use case in more detail, so we can see if there is any workaround?
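To make the balance argument concrete, the two-topic example from my earlier message (quoted below) can be replayed in a few lines. This is a simplified model for illustration only, not Kafka's actual assignor code: the function names are hypothetical, partitions are ordered by topic name and number for readability (the real round-robin assignor's ordering may differ), and the `T1-P1` style labels are numbered from 1 to match the example rather than from 0 as Kafka does.

```python
from collections import defaultdict


def range_assign(subscriptions, partitions_per_topic):
    """Per-topic range assignment: each topic's partitions are split into
    contiguous chunks over the threads subscribed to that topic."""
    assignment = defaultdict(list)
    for topic, num_partitions in sorted(partitions_per_topic.items()):
        threads = sorted(t for t, topics in subscriptions.items() if topic in topics)
        if not threads:
            continue  # nobody in the group selects this topic
        per_thread, extra = divmod(num_partitions, len(threads))
        start = 0
        for i, thread in enumerate(threads):
            # The first `extra` threads each take one additional partition.
            count = per_thread + (1 if i < extra else 0)
            for p in range(start, start + count):
                assignment[thread].append(f"{topic}-P{p + 1}")
            start += count
    return dict(assignment)


def round_robin_assign(subscriptions, partitions_per_topic):
    """Lay out the partitions of all topics in one list and deal them out over
    all threads in turn. Assumes every thread selects every topic, i.e. the
    homogeneity restriction discussed in this thread."""
    threads = sorted(subscriptions)
    all_partitions = [f"{topic}-P{p + 1}"
                      for topic, n in sorted(partitions_per_topic.items())
                      for p in range(n)]
    assignment = defaultdict(list)
    for i, tp in enumerate(all_partitions):
        assignment[threads[i % len(threads)]].append(tp)
    return dict(assignment)


if __name__ == "__main__":
    # Two topics with two partitions each; two consumers with two threads each.
    topics = {"T1": 2, "T2": 2}
    subs = {f"C{c}-Thr{t}": {"T1", "T2"} for c in (1, 2) for t in (1, 2)}
    print(range_assign(subs, topics))        # C2's threads get nothing
    print(round_robin_assign(subs, topics))  # every thread gets one partition
```

Range balances each topic on its own, so with fewer partitions per topic than threads the same leading threads win every topic; dealing across all topics at once is what spreads the load.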
Jiangjie (Becket) Qin

On 3/22/15, 10:04 AM, "Jason Rosenberg" <j...@squareup.com> wrote:

>Jiangjie,
>
>Yeah, I welcome the round-robin strategy, as the 'range' strategy ('til
>now the only one available) is not always good at balancing partitions,
>as you observed above.
>
>The main thing I'm bringing up in this thread, though, is the question of
>why there needs to be a restriction to having a homogeneous set of
>consumers in the group being balanced. This is not a requirement for the
>range algorithm, but it is for the roundrobin algorithm. So, I just want
>to understand why there's that limitation. (And sadly, in our case, we do
>have heterogeneous consumers using the same groupid, so we can't easily
>turn on roundrobin at the moment without some effort :) ).
>
>I can see that it does simplify the implementation to have that
>limitation, but I'm just wondering if there's anything fundamental that
>would prevent an implementation that works over heterogeneous consumers.
>E.g. "Lay out all partitions, and lay out all consumer threads, and
>proceed round robin, assigning each partition to the next consumer
>thread. *If the next consumer thread doesn't have a selection for the
>current partition, then move on to the next consumer thread...*"
>
>The current implementation is also problematic if you are doing a rolling
>restart of a consumer cluster. Let's say you are updating the topic
>selection as part of an update to the cluster. Once the first node is
>updated, the cluster will no longer be homogeneous until the last node is
>updated, which means you will have a temporary outage consuming data
>until all nodes have been updated. So, it makes it difficult to do
>rolling restarts, or canary updates on a subset of nodes, etc.
>
>Jason
>
>On Fri, Mar 20, 2015 at 10:15 PM, Jiangjie Qin <j...@linkedin.com.invalid>
>wrote:
>
>> Hi Jason,
>>
>> The motivation behind round robin is to better balance the consumers'
>> load.
>> Imagine you have two topics, each with two partitions. These topics
>> are consumed by two consumers, each with two consumer threads.
>>
>> The range assignment gives:
>> T1-P1 -> C1-Thr1
>> T1-P2 -> C1-Thr2
>> T2-P1 -> C1-Thr1
>> T2-P2 -> C1-Thr2
>> Consumer 2 will not be consuming from any partitions.
>>
>> The round robin algorithm gives:
>> T1-P1 -> C1-Thr1
>> T1-P2 -> C1-Thr2
>> T2-P1 -> C2-Thr1
>> T2-P2 -> C2-Thr2
>> This is much better than the range assignment.
>>
>> That's the reason why we introduced the round robin strategy even though
>> it has restrictions.
>>
>> Jiangjie (Becket) Qin
>>
>>
>> On 3/20/15, 12:20 PM, "Jason Rosenberg" <j...@squareup.com> wrote:
>>
>> >Jiangjie,
>> >
>> >The error messages I got (and the config doc) do clearly state that the
>> >number of threads per consumer must match also....
>> >
>> >I'm not convinced that an easy to understand algorithm would work fine
>> >with a heterogeneous set of selected topics between consumers.
>> >
>> >Jason
>> >
>> >On Thu, Mar 19, 2015 at 8:07 PM, Mayuresh Gharat
>> ><gharatmayures...@gmail.com> wrote:
>> >
>> >> Hi Becket,
>> >>
>> >> Can you give an example of this? It would be easier to understand :)
>> >>
>> >> Thanks,
>> >>
>> >> Mayuresh
>> >>
>> >> On Thu, Mar 19, 2015 at 4:46 PM, Jiangjie Qin
>> >> <j...@linkedin.com.invalid> wrote:
>> >>
>> >> > Hi Jason,
>> >> >
>> >> > The round-robin strategy first takes the partitions of all the topics
>> >> > a consumer is consuming from, then distributes them across all the
>> >> > consumers. If different consumers are consuming from different
>> >> > topics, the assignment algorithm will generate different answers on
>> >> > different consumers. It is OK for consumers to have different thread
>> >> > counts, but the consumers have to consume from the same set of
>> >> > topics.
>> >> >
>> >> >
>> >> > For the range strategy, the balance is done for each individual topic
>> >> > instead of across topics.
>> >> > So the balance is only done among the consumers consuming from the
>> >> > same topic.
>> >> >
>> >> > Thanks.
>> >> >
>> >> > Jiangjie (Becket) Qin
>> >> >
>> >> > On 3/19/15, 4:14 PM, "Jason Rosenberg" <j...@squareup.com> wrote:
>> >> >
>> >> > >So,
>> >> > >
>> >> > >I've run into an issue migrating a consumer to use the new
>> >> > >'roundrobin' partition.assignment.strategy. It turns out that
>> >> > >several of our consumers use the same group id, but instantiate
>> >> > >several different consumer instances (with different topic
>> >> > >selectors and thread counts). Often, this is done in a single
>> >> > >shared process. It turns out this arrangement is not allowed when
>> >> > >using the 'roundrobin' assignment strategy.
>> >> > >
>> >> > >I'm curious as to the reason for this restriction. Why is it not
>> >> > >also a restriction for the 'range' strategy (which we've been
>> >> > >happily using for some time now)?
>> >> > >
>> >> > >It would seem that as long as you always assign a partition to a
>> >> > >consumer instance that is actually selecting it, you should still
>> >> > >be able to proceed with the round-robin algorithm (potentially
>> >> > >skipping consumers if they can't select the next partition in the
>> >> > >list, etc.).
>> >> > >
>> >> > >Jason
>> >>
>> >> --
>> >> -Regards,
>> >> Mayuresh R. Gharat
>> >> (862) 250-7125
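The skipping variant Jason proposes in this thread ("move on to the next consumer thread" when the current one doesn't select the partition's topic) can be sketched as follows. This is a hypothetical illustration of the proposal, not something Kafka implements; the function name, thread ids (`A-0`, `B-0`) and the example group are made up, and partitions are `(topic, partition)` pairs numbered from 0.

```python
def round_robin_with_skip(subscriptions, partitions_per_topic):
    """Hypothetical heterogeneous round-robin, following the proposal in this
    thread: deal partitions out over all threads in turn, but skip any thread
    that does not select the current partition's topic."""
    threads = sorted(subscriptions)
    all_partitions = [(topic, p)
                      for topic, n in sorted(partitions_per_topic.items())
                      for p in range(n)]
    assignment = {t: [] for t in threads}
    cursor = 0  # index of the thread that is "next" in round-robin order
    for topic, p in all_partitions:
        # Probe each thread at most once, starting from the cursor.
        for step in range(len(threads)):
            idx = (cursor + step) % len(threads)
            if topic in subscriptions[threads[idx]]:
                assignment[threads[idx]].append((topic, p))
                cursor = (idx + 1) % len(threads)
                break
        # If no thread selects this topic, the partition stays unassigned.
    return assignment


if __name__ == "__main__":
    # Hypothetical group: consumer A (two threads) selects T1 and T2,
    # consumer B (one thread) selects only T2.
    subs = {"A-0": {"T1", "T2"}, "A-1": {"T1", "T2"}, "B-0": {"T2"}}
    print(round_robin_with_skip(subs, {"T1": 2, "T2": 2}))
```

Two caveats follow from the discussion above. The result is only consistent if every member computes it from the same complete view of all members' selections (the same sorted thread list and partition list); if each consumer starts from its own topics, members reach different answers. And balance is limited by the overlap between selections: a topic selected by a single thread always lands entirely on that thread.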