Hi Guozhang,

Yes, the clock will be reset, not extended. Sorry, that was incorrect wording in the KIP; I'll update it.
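The Python sketch below makes the reset-versus-extend distinction concrete. It shows when the delayed initial rebalance would fire under each reading, checked against Guozhang's numbers and Becket's T+3/T+5/T+10 example quoted below. It is a simplified, hypothetical model, not the group coordinator's code. The function `initial_rebalance_time`, the `"reset"`/`"extend"` labels, and the rule that joins arriving after the deadline are left for a later rebalance are all assumptions made for illustration. Times are in seconds from the first JoinGroup request.

```python
"""Hypothetical model of the delayed initial rebalance discussed for KIP-134.

Not the broker's implementation: it only compares the two readings of the
delay from this thread ("reset" vs "extend"), both capped by the group's
rebalance timeout.
"""


def initial_rebalance_time(join_times, delay, rebalance_timeout, strategy="reset"):
    """Return the time (seconds) at which the first rebalance would fire.

    join_times: arrival times of JoinGroup requests for an empty group.
    strategy:   "reset"  -> each join restarts the delay timer from that join;
                "extend" -> each join adds another full delay to the deadline.
    """
    if strategy not in ("reset", "extend"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    if not join_times:
        raise ValueError("need at least one join")
    joins = sorted(join_times)
    start = joins[0]
    hard_limit = start + rebalance_timeout  # never wait past the rebalance timeout
    deadline = start + delay
    for t in joins[1:]:
        if t > min(deadline, hard_limit):
            break  # rebalance already fired; later members wait for the next round
        if strategy == "reset":
            deadline = t + delay
        else:
            deadline += delay
    return min(deadline, hard_limit)


if __name__ == "__main__":
    burst = [i / 9 for i in range(10)]  # 10 members arriving within 1 second
    print(initial_rebalance_time(burst, 3, 300, "reset"))     # 4.0
    print(initial_rebalance_time(burst, 3, 300, "extend"))    # 30.0
    print(initial_rebalance_time([0, 2], 3, 10))              # 5
    print(initial_rebalance_time([0, 2, 4, 6, 8, 9], 3, 10))  # 10
```

Under this model, Guozhang's burst of 10 members fires after 4 s with a reset timer but after 30 s with an extended one (his ~31 s figure is a rough upper bound). The `min(..., hard_limit)` cap mirrors Damian's statement that the delay never exceeds the group's rebalance timeout.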

Thanks,
Damian

On Wed, 29 Mar 2017 at 23:18 Guozhang Wang <wangg...@gmail.com> wrote:

Made another pass over the KIP wiki, overall LGTM. One quick question on the described logic: "they will be added to the group and the delay will be extended by min(remainingRebalanceTimeout, group.initial.rebalance.delay.ms)" though:

From your previous email I thought you were "resetting the clock" when a new consumer JoinGroup request is received, but this seems to be different. Suppose the rebalance timeout is very large, so it won't generally be hit (the default is 5 min), and the delay is set to 3 secs. If the group has 10 members and we receive all their JoinGroup requests at roughly the same time, say within 1 sec, then "resetting the clock" will cause the whole delay to be no more than 1 + 3 = 4 secs, while extending it will cause it to be 1 + 3 * 10 = 31 secs?

Guozhang

On Wed, Mar 29, 2017 at 3:04 PM, Guozhang Wang <wangg...@gmail.com> wrote:

Thanks Damian!

On Wed, Mar 29, 2017 at 1:27 AM, Damian Guy <damian....@gmail.com> wrote:

Thanks everyone for the discussion, very helpful. I've updated the KIP to make the delay such that it is extended as new members join the group and that it never exceeds the group's rebalance timeout.

If everyone is ok with this I'll kick off the voting thread.

Thanks again,
Damian

On Tue, 28 Mar 2017 at 23:18 Becket Qin <becket....@gmail.com> wrote:

I think separating leave/join makes sense. The scenario I can think of for delaying a rebalance on LeaveGroupRequest is a rolling bounce of a service. But that scenario could be tricky because there may be a mixture of joining and leaving. What happens if a consumer leaves the group right after another consumer joins the group? Which delay should be applied?

Jason, if I understand correctly, the actual delay of the FIRST rebalance for each group could be anywhere between group.initial.rebalance.delay.ms and the rebalance timeout, depending on how many times the delay is applied. For example, suppose the delay is set to 3 seconds and the rebalance timeout is set to 10 seconds. At time T a consumer joins the group, so the target rebalance point would be T+3 if no other consumer joins. If another consumer joins the group at T+2, the target point becomes T+5, etc. However, no matter how many times the delay is extended, the rebalance will kick off at T+10, even if a new consumer joined the group at T+9.

I also agree that we should set the default delay to some meaningful value instead of setting it to 0.

Thanks,

Jiangjie (Becket) Qin

On Tue, Mar 28, 2017 at 12:32 PM, Jason Gustafson <ja...@confluent.io> wrote:

Hey Damian,

Thanks for the KIP. I think the proposal makes sense as a workaround, maybe for some advanced users. However, I'm not sure we can depend on average users knowing that the config exists, much less setting it to something that makes sense. It's kind of a trend in streams, which I'm not too thrilled about, to try and control these rebalances through careful tuning of various timeouts. For example, the patch to avoid sending LeaveGroup depends on the session timeout being set at least as long as the time for an average rolling restart. If the expectation is that these settings are only needed for advanced users, it may be sufficient, but if the problems are affecting average users, it seems less than ideal.
That said, if we can get some real benefit from low-hanging fruit like this, then it's probably worthwhile.

This relates to the choice of default value, by the way. If we use 0 as the default, my guess is that most users won't change it and the benefit could be marginal. The choice of 3 seconds that you've documented seems fine to me. It matches the default consumer heartbeat interval, which controls typical rebalance latency, so there's some consistency there.

Also, one minor comment: I guess the actual delay for each group will be the minimum of the group's rebalance timeout and group.initial.rebalance.delay.ms. Is that right?

-Jason

On Tue, Mar 28, 2017 at 8:29 AM, Damian Guy <damian....@gmail.com> wrote:

@Ismael - yeah sure, we can reduce the default, though I'm not sure what the "right" default would be.

On Tue, 28 Mar 2017 at 15:40 Ismael Juma <ism...@juma.me.uk> wrote:

Is 3 seconds the right default if the timer gets reset after each consumer joins? Maybe we can lower the default value given the new approach.

Ismael

On Tue, Mar 28, 2017 at 9:53 AM, Damian Guy <damian....@gmail.com> wrote:

All,
I'd like to get this back to the original discussion about delaying the initial consumer group rebalance.
I think I'm leaning towards sticking with the broker config and changing the delay so that the timer starts again when a new consumer joins the group. What are people's thoughts on that?

Doing something similar on leave is valid, but I'd prefer to consider it separately from this.

Thanks,
Damian

On Tue, 28 Mar 2017 at 09:48 Damian Guy <damian....@gmail.com> wrote:

Matthias,

Yes, I know.

Thanks,
Damian

On Mon, 27 Mar 2017 at 18:17 Matthias J. Sax <matth...@confluent.io> wrote:

Damian,

about "rebalance immediately" on timeout -- I guess that's a different case, as no LeaveGroupRequest will be sent. Thus, the broker should be able to distinguish both cases easily, and apply the delay only if it received the LeaveGroupRequest but not if a consumer times out.

Does this make sense?

-Matthias

On 3/27/17 1:56 AM, Damian Guy wrote:

@Becket

Thanks for the feedback. Yes, I like the idea of extending the delay as each new consumer joins the group. Though, I think this could be done with either a consumer or broker side config. But I get your point that some consumers in the group can be misconfigured.

@Matthias & @Eno - yes, we could probably do something similar if the member has sent the LeaveGroupRequest.
I'm not sure it would be > >> valid > >> > if > >> > > > the > >> > > > > > > > member crashed, hence session.timeout would come into > play, > >> > we'd > >> > > > > > probably > >> > > > > > > > want to rebalance immediately. I'd be interested in > hearing > >> > > > thoughts > >> > > > > > from > >> > > > > > > > other core kafka folks on this one. > >> > > > > > > > > >> > > > > > > > Thanks, > >> > > > > > > > Damian > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > On Fri, 24 Mar 2017 at 23:01 Becket Qin < > >> becket....@gmail.com> > >> > > > > wrote: > >> > > > > > > > > >> > > > > > > >> Hi Matthias, > >> > > > > > > >> > >> > > > > > > >> Yes, that was what I was thinking. We will keep delay it > >> until > >> > > > > either > >> > > > > > > >> reaching the rebalance timeout or no new consumer joins > in > >> > that > >> > > > > small > >> > > > > > > delay > >> > > > > > > >> which is configured on the broker side. > >> > > > > > > >> > >> > > > > > > >> Thanks, > >> > > > > > > >> > >> > > > > > > >> Jiangjie (Becket) Qin > >> > > > > > > >> > >> > > > > > > >> On Fri, Mar 24, 2017 at 1:39 PM, Matthias J. Sax < > >> > > > > > matth...@confluent.io > >> > > > > > > > > >> > > > > > > >> wrote: > >> > > > > > > >> > >> > > > > > > >>> @Becket: > >> > > > > > > >>> > >> > > > > > > >>> I am not sure, if I understand this correctly. Instead > of > >> > > > applying > >> > > > > a > >> > > > > > > >>> fixed delay, that starts when the first consumer of an > >> > (empty) > >> > > > > group > >> > > > > > > >>> joins, you suggest to re-trigger/re-set the delay each > >> time a > >> > > new > >> > > > > > > >>> consumer joins? > >> > > > > > > >>> > >> > > > > > > >>> This sound like a good strategy to me, if the config is > on > >> > the > >> > > > > broker > >> > > > > > > >> side. > >> > > > > > > >>> > >> > > > > > > >>> @Eno: > >> > > > > > > >>> > >> > > > > > > >>> I think that's a valid point and I like this idea! 

-Matthias

On 3/24/17 1:23 PM, Eno Thereska wrote:

Thanks Damian,

This KIP deals with the initial phase only. What about the cases when several consumers leave a group? Won't there be several expensive rebalances then as well? I'm wondering if it makes sense for the delay to hold any time the "set" of consumers in a group changes, be it an addition to the group or a removal from the group.

Thanks
Eno

On 24 Mar 2017, at 20:04, Becket Qin <becket....@gmail.com> wrote:

Thanks for the KIP, Damian.

My two cents on this. It seems there are two things worth thinking about here:

1. Better rebalance timing. We will try to rebalance only when all the consumers in a group have joined. The challenge is that someone has to define what ALL consumers means; it could be either a time or a number of consumers, etc.

2. Avoid frequent rebalances.
For example, if there are 100 consumers in a group, today, in the worst case, we may end up with 100 rebalances even if all the consumers joined the group in a reasonably small amount of time. Frequent rebalances are also a bad thing for brokers.

Having a client side configuration may solve problem 1 better because each consumer group can potentially configure its own timing. However, it does not really prevent frequent rebalances in general, because some of the consumers can be misconfigured. (This may have something to do with KIP-124 as well. But if a quota is applied on the JoinGroup/SyncGroup request it may cause some unwanted cascading effects.)

Having a broker side configuration may result in less flexibility for each consumer group, but it can prevent frequent rebalances better. I think with some reasonable design, the rebalance timing issue can be resolved on the broker side as well.
Matthias had a good point on extending the delay when a new consumer joins a group (we actually did something similar to batch ISR change propagation). For example, let's say that on the broker side we always delay 2 seconds each time we see a new consumer joining a consumer group. This would probably work for most consumer groups and would also limit the rebalance frequency to protect the brokers.

I am not sure about the streams use case here, but if something like 2 seconds of delay is acceptable for streams, I would prefer adding the configuration to the broker so that we can address both problems.

Thanks,

Jiangjie (Becket) Qin

On Fri, Mar 24, 2017 at 5:30 AM, Damian Guy <damian....@gmail.com> wrote:

Thanks for the feedback.

Ewen: I'm happy to make it a client side config. Other than the protocol bump, I think the effort is almost the same. Personally I see no other issues, but based on discussions with others this is what we came up with.

True, it can probably be tested easily via an integration test.

Matthias: Yes, I agree, the delay could be extended as each new member joins the group.

Thanks,
Damian

On Fri, 24 Mar 2017 at 05:14 Ewen Cheslack-Postava <e...@confluent.io> wrote:

I have the same initial response as Ismael re: broker vs consumer settings. The global setting seems questionable.

Could we maybe summarize what the impact of making this a client config would be? The protocol bump is obvious, but is there any other significant issue? For the protocol bump in particular, I think this change is currently really critical for streams; it will be valuable elsewhere, but the immediate demand is streams, so a backwards-compatible protocol bump wouldn't affect any other clients. Is this still actually compatible with different clients, given that they would now expect different timeouts?
(I think it's strictly compatible if you wait for responses, but if you enforce any client side timeouts, I'm not so sure.)

re: test plan, I'm sure this will come as a surprise, but is the system test even necessary? Validating the number of rebalances seems messy, as other things can cause rebalances (though admittedly not in a "clean" case). But really it seems like an integration test could validate this by making sure only 1 rebalance occurred when 2 members joined with a sufficient time gap.

-Ewen

On Thu, Mar 23, 2017 at 3:53 PM, Matthias J. Sax <matth...@confluent.io> wrote:

Thanks for the KIP Damian!

My two cents:

- we should have an explicit parameter for this -- implicit settings are always tricky (the "importance" of this parameter would be LOW)

- the config should be different for each consumer group:
  * assume you have a stateless app: you want to rebalance immediately
  * if you start up in a virtualized environment using tools like Mesos, you might need a different value than on bare metal (no VM to be started)
  * it also depends on how many consumer instances you expect -- it's harder to start up 100 instances in 3 seconds than 5

- the default value should be zero

One more thought: what about scaling scenarios? If a consumer group has 10 instances and should be scaled up to 20, it would make sense to do this with a single rebalance, too. Thus, I am wondering if it would make sense to apply this delay each time a new consumer joins the group, even if the group is not empty?
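Becket's worst case of 100 rebalances, one reading of Ewen's proposed integration test (two members joining within the delay should cause a single rebalance), and the 10-to-20 scaling scenario above all come down to counting how many rebalances a sequence of joins produces. A minimal Python sketch of that count follows. It assumes the reset reading of the delay and, following Matthias's suggestion rather than the KIP's initial-rebalance-only scope, applies the delay to every join. `count_rebalances` is a hypothetical helper, not a Kafka API, and it ignores leaves, heartbeats, and SyncGroup timing.

```python
def count_rebalances(join_times, delay, rebalance_timeout):
    """Count rebalances for a series of JoinGroup arrival times (hypothetical model).

    A pending rebalance fires once no member has joined for `delay` seconds,
    but never later than `rebalance_timeout` after the join that scheduled it.
    delay=0 approximates the worst case without a delay, where every join at a
    distinct time triggers its own rebalance.
    """
    rebalances = 0
    window_start = fire_at = None
    for t in sorted(join_times):
        if fire_at is not None and t <= fire_at:
            # Join arrives while a rebalance is pending: restart the delay timer,
            # keeping the cap measured from the join that opened the window.
            fire_at = min(t + delay, window_start + rebalance_timeout)
        else:
            # Nothing pending (or the last rebalance already fired): schedule a new one.
            rebalances += 1
            window_start = t
            fire_at = t + min(delay, rebalance_timeout)
    return rebalances


if __name__ == "__main__":
    staggered = [i * 0.5 for i in range(100)]  # 100 consumers, one every 0.5 s
    print(count_rebalances(staggered, 0, 300))  # 100: no delay, one rebalance per join
    print(count_rebalances(staggered, 3, 300))  # 1: every join lands within 3 s of the last
    print(count_rebalances(staggered, 3, 10))   # 5: the 10 s timeout cuts each window short
    print(count_rebalances([0, 1], 3, 300))     # 1: second member joins inside the delay
    print(count_rebalances([0, 5], 3, 300))     # 2: second member joins after it fired
```

With a 3 s delay, the 100 staggered joins collapse into one rebalance. With a 10 s rebalance timeout they split into five, because the timeout bounds each wait even while members keep arriving.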

-Matthias

On 3/23/17 10:19 AM, Damian Guy wrote:

Thanks Guozhang - I think another problem with this is that it overloads session.timeout.ms to mean multiple things. I'm not sure that is a good thing.

On Thu, 23 Mar 2017 at 17:14 Guozhang Wang <wangg...@gmail.com> wrote:

The downside of it, though, is that although it "hides" this from most users, who would otherwise need to be aware of it, by default the session timeout, i.e. the rebalance timeout, is 10 seconds, which is arguably too long.

Guozhang

On Thu, Mar 23, 2017 at 10:12 AM, Guozhang Wang <wangg...@gmail.com> wrote:

Just throwing another alternative idea here: we can consider using the rebalance timeout value, which is already included in the join request protocol (and on the current Java client it is always written as the session timeout value), so that the first member joining will always force the coordinator to wait that long. By doing this we do not need to bump up the protocol either.

Guozhang

On Thu, Mar 23, 2017 at 5:49 AM, Damian Guy <damian....@gmail.com> wrote:

Hi Ismael,

Mostly to avoid the protocol bump.

I agree that it may be difficult to choose the right delay for all consumer groups, but we wanted to make this something that most users don't really need to think about, i.e., a small enough default delay that works in the majority of cases. However, it would be much more flexible as a consumer config, which I'm happy to pursue if this change is worthy of a protocol bump.

Thanks,
Damian

On Thu, 23 Mar 2017 at 12:35 Ismael Juma <ism...@juma.me.uk> wrote:

Thanks for the KIP, Damian. It makes sense to avoid multiple rebalances during start-up. One issue with having this as a broker config is that it may be difficult to choose the right delay for all consumer groups.
Can you elaborate a little more on why the first alternative (add a consumer config) was rejected? We bump protocol versions regularly (when it makes sense), so it would be good to get a bit more detail.

Thanks,
Ismael

On Thu, Mar 23, 2017 at 12:24 PM, Damian Guy <damian....@gmail.com> wrote:

Hi All,

I've prepared a KIP to add a configurable delay to the initial consumer group rebalance.

Please have a look here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance

Thanks,
Damian

BTW, I apologize if this appears twice. It seems the first one may not have made it.