Just clarifying on "session.timeout.ms": today we already have a rebalance timeout value in the JoinGroupRequest protocol, which determines how long the coordinator will wait for each consumer to re-join the group during the prepare-rebalance phase. I was thinking we could use that value for the initial delay. That way, even if we later decide to make the rebalance timeout configurable on the client side, we can do so without bumping up the protocol. (Today it is hard-coded in the Java client as the max poll timeout, default 5 min, not the session timeout, default 10 sec: I was wrong before.)

Now thinking about it a bit more, I feel the main problem is the default value, since it is 5 min, which is definitely not acceptable. On the other hand, I feel that keeping it as a broker-side config may be sufficient, since this config is only used the first time a group forms (i.e. not for re-joining, which is controlled by the rebalance timeout value). It therefore only depends on how quickly a consumer app starts all of its instances at the same time, which I think is most likely universal rather than one-spec-per-app.

I like the idea of resetting the timer for the initial delay whenever a new consumer joins. So I'd suggest we just keep it as a broker-side config, with the timer reset on each new consumer joining, and do NOT extend it to leave group, for motivations similar to the above: the expected time values would be different (3-5 seconds for the initial delay, versus tens of seconds for the leave-group delay, where we expect that other instances will also leave or that the leaving instance will re-join soon).

Guozhang

On Tue, Mar 28, 2017 at 8:29 AM, Damian Guy <damian....@gmail.com> wrote:

> @Ismael - yeah sure we can reduce the default, though i'm not sure what the
> "right" default would be.
>
> On Tue, 28 Mar 2017 at 15:40 Ismael Juma <ism...@juma.me.uk> wrote:
>
> > Is 3 seconds the right default if the timer gets reset after each consumer
> > joins? Maybe we can lower the default value given the new approach.
> >
> > Ismael
> >
> > On Tue, Mar 28, 2017 at 9:53 AM, Damian Guy <damian....@gmail.com> wrote:
> >
> > > All,
> > > I'd like to get this back to the original discussion about Delaying initial
> > > consumer group rebalance.
> > > I think i'm leaning towards sticking with the broker config and changing
> > > the delay so that the timer starts again when a new consumer joins the
> > > group. What are peoples thoughts on that?
> > >
> > > Doing something similar on leave is valid, but i'd prefer to consider it
> > > separately from this.
> > >
> > > Thanks,
> > > Damian
> > >
> > > On Tue, 28 Mar 2017 at 09:48 Damian Guy <damian....@gmail.com> wrote:
> > >
> > > > Matthias,
> > > >
> > > > Yes i know.
> > > >
> > > > Thanks,
> > > > Damian
> > > >
> > > > On Mon, 27 Mar 2017 at 18:17 Matthias J. Sax <matth...@confluent.io>
> > > > wrote:
> > > >
> > > > Damian,
> > > >
> > > > about "rebalance immediately" on timeout -- I guess, that's a different
> > > > case as no LeaveGroupRequest will be sent. Thus, the broker should be
> > > > able to distinguish both cases easily, and apply the delay only if it
> > > > received the LeaveGroupRequest but not if a consumer times out.
> > > >
> > > > Does this make sense?
> > > >
> > > > -Matthias
> > > >
> > > > On 3/27/17 1:56 AM, Damian Guy wrote:
> > > > > @Becket
> > > > >
> > > > > Thanks for the feedback. Yes, i like the idea of extending the delay as
> > > > > each new consumer joins the group. Though, i think this could be done with
> > > > > either a consumer or broker side config. But i get your point that some
> > > > > consumers in the group can be misconfigured.
> > > > >
> > > > > @Matthias & @Eno - yes we could probably do something similar if the member
> > > > > has sent the LeaveGroupRequest. I'm not sure it would be valid if the
> > > > > member crashed, hence session.timeout would come into play, we'd probably
> > > > > want to rebalance immediately. I'd be interested in hearing thoughts from
> > > > > other core kafka folks on this one.
> > > > >
> > > > > Thanks,
> > > > > Damian
> > > > >
> > > > >
> > > > > On Fri, 24 Mar 2017 at 23:01 Becket Qin <becket....@gmail.com> wrote:
> > > > >
> > > > >> Hi Matthias,
> > > > >>
> > > > >> Yes, that was what I was thinking.
> > > > >> We will keep delay it until either
> > > > >> reaching the rebalance timeout or no new consumer joins in that small delay
> > > > >> which is configured on the broker side.
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> Jiangjie (Becket) Qin
> > > > >>
> > > > >> On Fri, Mar 24, 2017 at 1:39 PM, Matthias J. Sax <matth...@confluent.io>
> > > > >> wrote:
> > > > >>
> > > > >>> @Becket:
> > > > >>>
> > > > >>> I am not sure, if I understand this correctly. Instead of applying a
> > > > >>> fixed delay, that starts when the first consumer of an (empty) group
> > > > >>> joins, you suggest to re-trigger/re-set the delay each time a new
> > > > >>> consumer joins?
> > > > >>>
> > > > >>> This sound like a good strategy to me, if the config is on the broker side.
> > > > >>>
> > > > >>> @Eno:
> > > > >>>
> > > > >>> I think that's a valid point and I like this idea!
> > > > >>>
> > > > >>>
> > > > >>> -Matthias
> > > > >>>
> > > > >>>
> > > > >>> On 3/24/17 1:23 PM, Eno Thereska wrote:
> > > > >>>> Thanks Damian,
> > > > >>>>
> > > > >>>> This KIP deals with the initial phase only. What about the cases when
> > > > >>>> several consumers leave a group? Won't there be several expensive
> > > > >>>> rebalances then as well? I'm wondering if it makes sense for the delay to
> > > > >>>> hold anytime the "set" of consumers in a group changes, be it addition to
> > > > >>>> the group or removal from group.
> > > > >>>>
> > > > >>>> Thanks
> > > > >>>> Eno
> > > > >>>>
> > > > >>>>
> > > > >>>>> On 24 Mar 2017, at 20:04, Becket Qin <becket....@gmail.com> wrote:
> > > > >>>>>
> > > > >>>>> Thanks for the KIP, Damian.
> > > > >>>>>
> > > > >>>>> My two cents on this. It seems there are two things worth thinking here:
> > > > >>>>>
> > > > >>>>> 1. Better rebalance timing.
> > > > >>>>> We will try to rebalance only when all the
> > > > >>>>> consumers in a group have joined. The challenge would be someone has to
> > > > >>>>> define what does ALL consumers mean, it could either be a time or number of
> > > > >>>>> consumers, etc.
> > > > >>>>>
> > > > >>>>> 2. Avoid frequent rebalance. For example, if there are 100 consumers in a
> > > > >>>>> group, today, in the worst case, we may end up with 100 rebalances even if
> > > > >>>>> all the consumers joined the group in a reasonably small amount of time.
> > > > >>>>> Frequent rebalance is also a bad thing for brokers.
> > > > >>>>>
> > > > >>>>> Having a client side configuration may solve problem 1 better because each
> > > > >>>>> consumer group can potentially configure their own timing. However, it does
> > > > >>>>> not really prevent frequent rebalance in general because some of the
> > > > >>>>> consumers can be misconfigured. (This may have something to do with KIP-124
> > > > >>>>> as well. But if quota is applied on the JoinGroup/SyncGroup request it may
> > > > >>>>> cause some unwanted cascading effects.)
> > > > >>>>>
> > > > >>>>> Having a broker side configuration may result in less flexibility for each
> > > > >>>>> consumer group, but it can prevent frequent rebalance better. I think with
> > > > >>>>> some reasonable design, the rebalance timing issue can be resolved on the
> > > > >>>>> broker side as well. Matthias had a good point on extending the delay when
> > > > >>>>> a new consumer joins a group (we actually did something similar to batch
> > > > >>>>> ISR change propagation).
> > > > >>>>> For example, let's say on the broker side, we will
> > > > >>>>> always delay 2 seconds each time we see a new consumer joining a consumer
> > > > >>>>> group. This would probably work for most of the consumer groups and will
> > > > >>>>> also limit the rebalance frequency to protect the brokers.
> > > > >>>>>
> > > > >>>>> I am not sure about the streams use case here, but if something like 2
> > > > >>>>> seconds of delay is acceptable for streams, I would prefer adding the
> > > > >>>>> configuration to the broker so that we can address both problems.
> > > > >>>>>
> > > > >>>>> Thanks,
> > > > >>>>>
> > > > >>>>> Jiangjie (Becket) Qin
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On Fri, Mar 24, 2017 at 5:30 AM, Damian Guy <damian....@gmail.com> wrote:
> > > > >>>>>
> > > > >>>>>> Thanks for the feedback.
> > > > >>>>>>
> > > > >>>>>> Ewen: I'm happy to make it a client side config. Other than the protocol
> > > > >>>>>> bump i think the effort is almost the same. Personally i see no other
> > > > >>>>>> issues, but based on discussions with others this is what we came up with.
> > > > >>>>>>
> > > > >>>>>> True, it can probably be tested easily via an integration test.
> > > > >>>>>>
> > > > >>>>>> Matthias: Yes i agree, the delay could be extended as each new member joins
> > > > >>>>>> the group.
> > > > >>>>>>
> > > > >>>>>> Thanks,
> > > > >>>>>> Damian
> > > > >>>>>>
> > > > >>>>>> On Fri, 24 Mar 2017 at 05:14 Ewen Cheslack-Postava <e...@confluent.io>
> > > > >>>>>> wrote:
> > > > >>>>>>
> > > > >>>>>>> I have the same initial response as Ismael re: broker vs consumer settings.
> > > > >>>>>>> The global setting seems questionable.
> > > > >>>>>>>
> > > > >>>>>>> Could we maybe summarize what the impact of making this a client config
> > > > >>>>>>> would be? Protocol bump is obvious, but is there any other significant
> > > > >>>>>>> issue? For the protocol bump in particular, I think this change is
> > > > >>>>>>> currently really critical for streams; it will be valuable elsewhere, but
> > > > >>>>>>> the immediate demand is streams, so a protocol bump while being backwards
> > > > >>>>>>> compatible wouldn't affect any other clients. Is this still actually
> > > > >>>>>>> compatible with different clients given that they would now expect
> > > > >>>>>>> different timeouts? (I think it's strictly compatible if you wait for
> > > > >>>>>>> responses, but if you enforce any client side timeouts, I'm not so sure.)
> > > > >>>>>>>
> > > > >>>>>>> re: test plan, I'm sure this will come as a surprise, but is the system
> > > > >>>>>>> test even necessary? Validating # of rebalances seems messy as other things
> > > > >>>>>>> can cause rebalances (though admittedly not in a "clean" case). But really
> > > > >>>>>>> it seems like an integration test could validate this by making sure only 1
> > > > >>>>>>> rebalance occurred when 2 members joined with a sufficient time gap.
> > > > >>>>>>>
> > > > >>>>>>> -Ewen
> > > > >>>>>>>
> > > > >>>>>>> On Thu, Mar 23, 2017 at 3:53 PM, Matthias J. Sax <matth...@confluent.io>
> > > > >>>>>>> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Thanks for the KIP Damian!
> > > > >>>>>>>>
> > > > >>>>>>>> My two cents:
> > > > >>>>>>>>
> > > > >>>>>>>> - we should have an explicit parameter for this -- implicit setting are
> > > > >>>>>>>> always tricky (the "importance" of this parameter would be LOW)
> > > > >>>>>>>>
> > > > >>>>>>>> - the config should be different for each consumer group:
> > > > >>>>>>>>   * assume you have a stateless app, you want to rebalance immediately
> > > > >>>>>>>>   * if you start-up in an visualized environment using some tools like
> > > > >>>>>>>> Mesos you might need a different value that on bare metal (no VM to be
> > > > >>>>>>>> started)
> > > > >>>>>>>>   * it also depends, how many consumer instanced you expect -- it's
> > > > >>>>>>>> harder to start up 100 instances in 3 seconds than 5
> > > > >>>>>>>>
> > > > >>>>>>>> - the default value should be zero
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> One more thought: what about scaling scenarios? If a consumer group has
> > > > >>>>>>>> 10 instanced and should be scaled up to 20, it would make sense to do
> > > > >>>>>>>> this with a single rebalance, too. Thus, I am wondering, if it would
> > > > >>>>>>>> make sense to apply this delay each time a new consumer joins group,
> > > > >>>>>>>> even if the group is not empty?
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> -Matthias
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> On 3/23/17 10:19 AM, Damian Guy wrote:
> > > > >>>>>>>>> Thanks Gouzhang - i think another problem with this is that is overloading
> > > > >>>>>>>>> session.timeout.ms to mean multiple things. I'm not sure that is a good
> > > > >>>>>>>>> thing.
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Thu, 23 Mar 2017 at 17:14 Guozhang Wang <wangg...@gmail.com> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> The downside of it, though, is that although it "hides" this from most of
> > > > >>>>>>>>>> the users needing to be aware of it, by default session timeout i.e. the
> > > > >>>>>>>>>> rebalance timeout is 10 seconds which could arguably too long.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Guozhang
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> On Thu, Mar 23, 2017 at 10:12 AM, Guozhang Wang <wangg...@gmail.com>
> > > > >>>>>>>>>> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Just throwing another alternative idea here: we can consider using the
> > > > >>>>>>>>>>> rebalance timeout value which is already included in the join request
> > > > >>>>>>>>>>> protocol (and on the current Java client it is always written as the
> > > > >>>>>>>>>>> session timeout value), that the first member joining will always force the
> > > > >>>>>>>>>>> coordinator to wait that long. By doing this we do not need to bump up the
> > > > >>>>>>>>>>> protocol either.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Guozhang
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Thu, Mar 23, 2017 at 5:49 AM, Damian Guy <damian....@gmail.com>
> > > > >>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> Hi Ismael,
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Mostly to avoid the protocol bump.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> I agree that it may be difficult to choose the right delay for all consumer
> > > > >>>>>>>>>>>> groups, but we wanted to make this something that most users don't really
> > > > >>>>>>>>>>>> need to think about, i.e., a small enough default delay that works in the
> > > > >>>>>>>>>>>> majority of cases. However it would be much more flexible as a consumer
> > > > >>>>>>>>>>>> config, which i'm happy to pursue if this change is worthy of a protocol
> > > > >>>>>>>>>>>> bump.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>> Damian
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> On Thu, 23 Mar 2017 at 12:35 Ismael Juma <ism...@juma.me.uk> wrote:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Thanks for the KIP, Damian. It makes sense to avoid multiple rebalances
> > > > >>>>>>>>>>>>> during start-up. One issue with having this as a broker config is that it
> > > > >>>>>>>>>>>>> may be difficult to choose the right delay for all consumer groups. Can you
> > > > >>>>>>>>>>>>> elaborate a little more on why the first alternative (add a consumer
> > > > >>>>>>>>>>>>> config) was rejected? We bump protocol versions regularly (when it makes
> > > > >>>>>>>>>>>>> sense), so it would be good to get a bit more detail.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>>> Ismael
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Thu, Mar 23, 2017 at 12:24 PM, Damian Guy <damian....@gmail.com>
> > > > >>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Hi All,
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> I've prepared a KIP to add a configurable delay to the initial consumer
> > > > >>>>>>>>>>>>>> group rebalance.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Please have look here:
> > > > >>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>>>> Damian
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> BTW, i apologize if this appears twice. Seems the first one may have not
> > > > >>>>>>>>>>>>>> made it.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> --
> > > > >>>>>>>>>>> -- Guozhang
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> --
> > > > >>>>>>>>>> -- Guozhang

--
-- Guozhang
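
P.S. To make the proposal concrete, here is a rough sketch of the timer logic discussed in this thread: the first join of an empty group opens a short delay window, every further join restarts that window, and the total wait never exceeds the rebalance timeout. This is only an illustration under my own assumptions, not coordinator code; the class name, field names, and millisecond values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InitialRebalanceDelay:
    """Toy model of the delayed first rebalance of an empty group.

    All names here are hypothetical; they do not mirror broker code.
    """
    delay_ms: int               # assumed broker-side delay per join
    rebalance_timeout_ms: int   # upper bound on the total wait
    first_join_ms: Optional[int] = None
    deadline_ms: Optional[int] = None
    members: List[str] = field(default_factory=list)

    def on_join(self, member_id: str, now_ms: int) -> None:
        # Remember when the very first member arrived.
        if self.first_join_ms is None:
            self.first_join_ms = now_ms
        self.members.append(member_id)
        # Restart the delay on every join, but never wait past the cap.
        cap = self.first_join_ms + self.rebalance_timeout_ms
        self.deadline_ms = min(now_ms + self.delay_ms, cap)

    def should_rebalance(self, now_ms: int) -> bool:
        # True once no new member has joined within the delay window,
        # or once the rebalance timeout has been reached.
        return self.deadline_ms is not None and now_ms >= self.deadline_ms


# Three instances start 2 seconds apart; one rebalance fires at t = 7 s.
group = InitialRebalanceDelay(delay_ms=3000, rebalance_timeout_ms=10000)
for t, member in [(0, "c1"), (2000, "c2"), (4000, "c3")]:
    group.on_join(member, t)
```

With a 3 second delay, the three joins above complete a single rebalance instead of three. The sketch deliberately ignores leave-group and session-timeout handling, which I'd prefer to keep separate as argued above.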