Made another pass over the KIP wiki; overall LGTM. One quick question, though, on the described logic: "they will be added to the group and the delay will be extended by min(remainingRebalanceTimeout, group.initial.rebalance.delay.ms)".

From your previous email I thought you were "resetting the clock" when a new consumer's join group request is received, but this seems to be different. Suppose the rebalance timeout is very large, so it generally won't be hit (default is 5 min), and the delay is set to 3 secs. If the group has 10 members and we receive all their join group requests at roughly the same time, say within 1 sec, then "resetting the clock" will cause the whole delay to be no more than 1 + 3 = 4 secs, while extending it will cause it to be 1 + 3 * 10 = 31 secs?

Guozhang

On Wed, Mar 29, 2017 at 3:04 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> Thanks Damian!
> On Wed, Mar 29, 2017 at 1:27 AM, Damian Guy <damian....@gmail.com> wrote:
>> Thanks everyone for the discussion, very helpful. I've updated the KIP to make the delay such that it is extended as new members join the group and that it never exceeds the group's rebalance timeout.
>> If everyone is ok with this I'll kick off the voting thread.
>> Thanks again,
>> Damian
>> On Tue, 28 Mar 2017 at 23:18 Becket Qin <becket....@gmail.com> wrote:
>> > I think separating leave/join makes sense. The scenario I can think of for delaying a rebalance on LeaveGroupRequest is a rolling bounce of a service. But that scenario could be tricky because there may be a mixture of joining and leaving. What happens if a consumer left the group right after another consumer joins the group? Which delay should be applied?
>> > Jason, if I understand correctly, the actual delay of the FIRST rebalance for each group could be anywhere between group.initial.rebalance.delay.ms and the rebalance timeout, depending on how many times the delay is applied. For example, say the delay is set to 3 seconds and the rebalance timeout is set to 10 seconds. At time T a consumer joins the group; the target rebalance point would be T+3 if no other consumer joins. If another consumer joins the group at T+2 then the target rebalance point would become T+5, etc. However, no matter how many times the delay was extended, at T+10 the rebalance will kick off even if at T+9 a new consumer joined the group.
>> > I also agree that we should set the default delay to some meaningful value instead of setting it to 0.
>> > Thanks,
>> > Jiangjie (Becket) Qin
>> > On Tue, Mar 28, 2017 at 12:32 PM, Jason Gustafson <ja...@confluent.io> wrote:
>> > > Hey Damian,
>> > > Thanks for the KIP. I think the proposal makes sense as a workaround maybe for some advanced users. However, I'm not sure we can depend on average users knowing that the config exists, much less setting it to something that makes sense. It's kind of a trend in streams that I'm not too thrilled about to try and control these rebalances through careful tuning of various timeouts. For example, the patch to avoid sending LeaveGroup depends on the session timeout being set at least as long as the time for an average rolling restart. If the expectation is that these settings are only needed for advanced users, it may be sufficient, but if the problems are affecting average users, it seems less than ideal. That said, if we can get some real benefit from low-hanging fruit like this, then it's probably worthwhile.
>> > > This relates to the choice of default value, by the way. If we use 0 as the default, my guess is that most users won't change it and the benefit could be marginal. The choice of 3 seconds that you've documented seems fine to me. It matches the default consumer heartbeat interval, which controls typical rebalance latency, so there's some consistency there.
>> > > Also, one minor comment: I guess the actual delay for each group will be the minimum of the group's rebalance timeout and group.initial.rebalance.delay.ms. Is that right?
>> > > -Jason
>> > > On Tue, Mar 28, 2017 at 8:29 AM, Damian Guy <damian....@gmail.com> wrote:
>> > > > @Ismael - yeah sure we can reduce the default, though i'm not sure what the "right" default would be.
>> > > > On Tue, 28 Mar 2017 at 15:40 Ismael Juma <ism...@juma.me.uk> wrote:
>> > > > > Is 3 seconds the right default if the timer gets reset after each consumer joins? Maybe we can lower the default value given the new approach.
>> > > > > Ismael
>> > > > > On Tue, Mar 28, 2017 at 9:53 AM, Damian Guy <damian....@gmail.com> wrote:
>> > > > > > All,
>> > > > > > I'd like to get this back to the original discussion about Delaying initial consumer group rebalance.
>> > > > > > I think i'm leaning towards sticking with the broker config and changing the delay so that the timer starts again when a new consumer joins the group. What are people's thoughts on that?
>> > > > > > Doing something similar on leave is valid, but i'd prefer to consider it separately from this.
>> > > > > > Thanks,
>> > > > > > Damian
>> > > > > > On Tue, 28 Mar 2017 at 09:48 Damian Guy <damian....@gmail.com> wrote:
>> > > > > > > Matthias,
>> > > > > > > Yes i know.
>> > > > > > > Thanks,
>> > > > > > > Damian
>> > > > > > > On Mon, 27 Mar 2017 at 18:17 Matthias J. Sax <matth...@confluent.io> wrote:
>> > > > > > > Damian,
>> > > > > > > about "rebalance immediately" on timeout -- I guess, that's a different case as no LeaveGroupRequest will be sent. Thus, the broker should be able to distinguish both cases easily, and apply the delay only if it received the LeaveGroupRequest but not if a consumer times out.
>> > > > > > > Does this make sense?
>> > > > > > > -Matthias
>> > > > > > > On 3/27/17 1:56 AM, Damian Guy wrote:
>> > > > > > > > @Becket
>> > > > > > > > Thanks for the feedback. Yes, i like the idea of extending the delay as each new consumer joins the group. Though, i think this could be done with either a consumer or broker side config. But i get your point that some consumers in the group can be misconfigured.
>> > > > > > > > @Matthias & @Eno - yes we could probably do something similar if the member has sent the LeaveGroupRequest. I'm not sure it would be valid if the member crashed, hence session.timeout would come into play, we'd probably want to rebalance immediately. I'd be interested in hearing thoughts from other core kafka folks on this one.
>> > > > > > > > Thanks,
>> > > > > > > > Damian
>> > > > > > > > On Fri, 24 Mar 2017 at 23:01 Becket Qin <becket....@gmail.com> wrote:
>> > > > > > > >> Hi Matthias,
>> > > > > > > >> Yes, that was what I was thinking. We will keep delaying it until either reaching the rebalance timeout or no new consumer joins in that small delay which is configured on the broker side.
>> > > > > > > >> Thanks,
>> > > > > > > >> Jiangjie (Becket) Qin
>> > > > > > > >> On Fri, Mar 24, 2017 at 1:39 PM, Matthias J. Sax <matth...@confluent.io> wrote:
>> > > > > > > >>> @Becket:
>> > > > > > > >>> I am not sure, if I understand this correctly. Instead of applying a fixed delay, that starts when the first consumer of an (empty) group joins, you suggest to re-trigger/re-set the delay each time a new consumer joins?
>> > > > > > > >>> This sounds like a good strategy to me, if the config is on the broker side.
>> > > > > > > >>> @Eno:
>> > > > > > > >>> I think that's a valid point and I like this idea!
>> > > > > > > >>> -Matthias
>> > > > > > > >>> On 3/24/17 1:23 PM, Eno Thereska wrote:
>> > > > > > > >>>> Thanks Damian,
>> > > > > > > >>>> This KIP deals with the initial phase only. What about the cases when several consumers leave a group? Won't there be several expensive rebalances then as well? I'm wondering if it makes sense for the delay to hold anytime the "set" of consumers in a group changes, be it addition to the group or removal from group.
>> > > > > > > >>>> Thanks
>> > > > > > > >>>> Eno
>> > > > > > > >>>>> On 24 Mar 2017, at 20:04, Becket Qin <becket....@gmail.com> wrote:
>> > > > > > > >>>>> Thanks for the KIP, Damian.
>> > > > > > > >>>>> My two cents on this. It seems there are two things worth thinking here:
>> > > > > > > >>>>> 1. Better rebalance timing. We will try to rebalance only when all the consumers in a group have joined. The challenge would be someone has to define what does ALL consumers mean, it could either be a time or number of consumers, etc.
>> > > > > > > >>>>> 2. Avoid frequent rebalance. For example, if there are 100 consumers in a group, today, in the worst case, we may end up with 100 rebalances even if all the consumers joined the group in a reasonably small amount of time. Frequent rebalance is also a bad thing for brokers.
>> > > > > > > >>>>> Having a client side configuration may solve problem 1 better because each consumer group can potentially configure their own timing. However, it does not really prevent frequent rebalance in general because some of the consumers can be misconfigured. (This may have something to do with KIP-124 as well. But if quota is applied on the JoinGroup/SyncGroup request it may cause some unwanted cascading effects.)
>> > > > > > > >>>>> Having a broker side configuration may result in less flexibility for each consumer group, but it can prevent frequent rebalance better. I think with some reasonable design, the rebalance timing issue can be resolved on the broker side as well. Matthias had a good point on extending the delay when a new consumer joins a group (we actually did something similar to batch ISR change propagation). For example, let's say on the broker side, we will always delay 2 seconds each time we see a new consumer joining a consumer group. This would probably work for most of the consumer groups and will also limit the rebalance frequency to protect the brokers.
>> > > > > > > >>>>> I am not sure about the streams use case here, but if something like 2 seconds of delay is acceptable for streams, I would prefer adding the configuration to the broker so that we can address both problems.
>> > > > > > > >>>>> Thanks,
>> > > > > > > >>>>> Jiangjie (Becket) Qin
>> > > > > > > >>>>> On Fri, Mar 24, 2017 at 5:30 AM, Damian Guy <damian....@gmail.com> wrote:
>> > > > > > > >>>>>> Thanks for the feedback.
>> > > > > > > >>>>>> Ewen: I'm happy to make it a client side config. Other than the protocol bump i think the effort is almost the same. Personally i see no other issues, but based on discussions with others this is what we came up with.
>> > > > > > > >>>>>> True, it can probably be tested easily via an integration test.
>> > > > > > > >>>>>> Matthias: Yes i agree, the delay could be extended as each new member joins the group.
>> > > > > > > >>>>>> Thanks,
>> > > > > > > >>>>>> Damian
>> > > > > > > >>>>>> On Fri, 24 Mar 2017 at 05:14 Ewen Cheslack-Postava <e...@confluent.io> wrote:
>> > > > > > > >>>>>>> I have the same initial response as Ismael re: broker vs consumer settings. The global setting seems questionable.
>> > > > > > > >>>>>>> Could we maybe summarize what the impact of making this a client config would be? Protocol bump is obvious, but is there any other significant issue? For the protocol bump in particular, I think this change is currently really critical for streams; it will be valuable elsewhere, but the immediate demand is streams, so a protocol bump while being backwards compatible wouldn't affect any other clients. Is this still actually compatible with different clients given that they would now expect different timeouts? (I think it's strictly compatible if you wait for responses, but if you enforce any client side timeouts, I'm not so sure.)
>> > > > > > > >>>>>>> re: test plan, I'm sure this will come as a surprise, but is the system test even necessary? Validating # of rebalances seems messy as other things can cause rebalances (though admittedly not in a "clean" case). But really it seems like an integration test could validate this by making sure only 1 rebalance occurred when 2 members joined with a sufficient time gap.
>> > > > > > > >>>>>>> -Ewen
>> > > > > > > >>>>>>> On Thu, Mar 23, 2017 at 3:53 PM, Matthias J. Sax <matth...@confluent.io> wrote:
>> > > > > > > >>>>>>>> Thanks for the KIP Damian!
>> > > > > > > >>>>>>>> My two cents:
>> > > > > > > >>>>>>>> - we should have an explicit parameter for this -- implicit settings are always tricky (the "importance" of this parameter would be LOW)
>> > > > > > > >>>>>>>> - the config should be different for each consumer group:
>> > > > > > > >>>>>>>>   * assume you have a stateless app, you want to rebalance immediately
>> > > > > > > >>>>>>>>   * if you start up in a virtualized environment using some tools like Mesos you might need a different value than on bare metal (no VM to be started)
>> > > > > > > >>>>>>>>   * it also depends on how many consumer instances you expect -- it's harder to start up 100 instances in 3 seconds than 5
>> > > > > > > >>>>>>>> - the default value should be zero
>> > > > > > > >>>>>>>> One more thought: what about scaling scenarios? If a consumer group has 10 instances and should be scaled up to 20, it would make sense to do this with a single rebalance, too. Thus, I am wondering, if it would make sense to apply this delay each time a new consumer joins the group, even if the group is not empty?
>> > > > > > > >>>>>>>> -Matthias
>> > > > > > > >>>>>>>> On 3/23/17 10:19 AM, Damian Guy wrote:
>> > > > > > > >>>>>>>>> Thanks Guozhang - i think another problem with this is that it is overloading session.timeout.ms to mean multiple things. I'm not sure that is a good thing.
>> > > > > > > >>>>>>>>> On Thu, 23 Mar 2017 at 17:14 Guozhang Wang <wangg...@gmail.com> wrote:
>> > > > > > > >>>>>>>>>> The downside of it, though, is that although it "hides" this from most of the users needing to be aware of it, by default session timeout i.e. the rebalance timeout is 10 seconds which could arguably be too long.
>> > > > > > > >>>>>>>>>> Guozhang
>> > > > > > > >>>>>>>>>> On Thu, Mar 23, 2017 at 10:12 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>> > > > > > > >>>>>>>>>>> Just throwing another alternative idea here: we can consider using the rebalance timeout value which is already included in the join request protocol (and on the current Java client it is always written as the session timeout value), that the first member joining will always force the coordinator to wait that long. By doing this we do not need to bump up the protocol either.
>> > > > > > > >>>>>>>>>>> Guozhang
>> > > > > > > >>>>>>>>>>> On Thu, Mar 23, 2017 at 5:49 AM, Damian Guy <damian....@gmail.com> wrote:
>> > > > > > > >>>>>>>>>>>> Hi Ismael,
>> > > > > > > >>>>>>>>>>>> Mostly to avoid the protocol bump.
>> > > > > > > >>>>>>>>>>>> I agree that it may be difficult to choose the right delay for all consumer groups, but we wanted to make this something that most users don't really need to think about, i.e., a small enough default delay that works in the majority of cases. However it would be much more flexible as a consumer config, which i'm happy to pursue if this change is worthy of a protocol bump.
>> > > > > > > >>>>>>>>>>>> Thanks,
>> > > > > > > >>>>>>>>>>>> Damian
>> > > > > > > >>>>>>>>>>>> On Thu, 23 Mar 2017 at 12:35 Ismael Juma <ism...@juma.me.uk> wrote:
>> > > > > > > >>>>>>>>>>>>> Thanks for the KIP, Damian. It makes sense to avoid multiple rebalances during start-up. One issue with having this as a broker config is that it may be difficult to choose the right delay for all consumer groups. Can you elaborate a little more on why the first alternative (add a consumer config) was rejected? We bump protocol versions regularly (when it makes sense), so it would be good to get a bit more detail.
>> > > > > > > >>>>>>>>>>>>> Thanks,
>> > > > > > > >>>>>>>>>>>>> Ismael
>> > > > > > > >>>>>>>>>>>>> On Thu, Mar 23, 2017 at 12:24 PM, Damian Guy <damian....@gmail.com> wrote:
>> > > > > > > >>>>>>>>>>>>>> Hi All,
>> > > > > > > >>>>>>>>>>>>>> I've prepared a KIP to add a configurable delay to the initial consumer group rebalance.
>> > > > > > > >>>>>>>>>>>>>> Please have a look here:
>> > > > > > > >>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance
>> > > > > > > >>>>>>>>>>>>>> Thanks,
>> > > > > > > >>>>>>>>>>>>>> Damian
>> > > > > > > >>>>>>>>>>>>>> BTW, i apologize if this appears twice. Seems the first one may have not made it.
>> > > > > > > >>>>>>>>>>> --
>> > > > > > > >>>>>>>>>>> -- Guozhang
>> > > > > > > >>>>>>>>>> --
>> > > > > > > >>>>>>>>>> -- Guozhang
> --
> -- Guozhang

--
-- Guozhang
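
P.S. To make the arithmetic in the question at the top of this message concrete, here is a small simulation of the two readings of the delay. It is only a sketch under stated assumptions, not the coordinator code: the function names `reset_clock_deadline` and `extend_deadline` are hypothetical, times are in milliseconds, and the hard cap is assumed to be the first join time plus the rebalance timeout.

```python
def reset_clock_deadline(join_times_ms, delay_ms, rebalance_timeout_ms):
    """'Resetting the clock': each new join restarts the delay from that join."""
    first = join_times_ms[0]
    hard_cap = first + rebalance_timeout_ms      # assumed cap: never wait past this
    deadline = first + delay_ms
    for t in join_times_ms[1:]:
        if t >= deadline:                        # rebalance has already started
            break
        deadline = min(t + delay_ms, hard_cap)
    return deadline


def extend_deadline(join_times_ms, delay_ms, rebalance_timeout_ms):
    """'Extending': each new join adds min(remaining timeout, delay) to the deadline."""
    first = join_times_ms[0]
    hard_cap = first + rebalance_timeout_ms
    deadline = first + delay_ms
    for t in join_times_ms[1:]:
        if t >= deadline:
            break
        remaining = hard_cap - deadline
        deadline += min(remaining, delay_ms)
    return deadline


# 10 members joining within 1 second, delay 3 s, rebalance timeout 5 min.
joins = [i * 1000 // 9 for i in range(10)]          # 0, 111, ..., 1000 ms
print(reset_clock_deadline(joins, 3000, 300_000))   # 4000
print(extend_deadline(joins, 3000, 300_000))        # 30000
```

With 10 joins spread over one second this gives about 4 secs under the "reset" reading and about 30 secs under the "extend" reading, which is close to the 4 secs versus 31 secs estimate in the question. With a 10 sec rebalance timeout, both readings stop at T+10, as in Becket's example.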