Re: [DISCUSS] KIP-134: Delay initial consumer group rebalance

Damian Guy Thu, 30 Mar 2017 00:27:49 -0700

Hi Guozhang,

Yes, the clock will be reset, not extended. Sorry, the wording in the
KIP is incorrect. I'll update it.
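
As a rough illustration of the difference (a sketch only, with hypothetical
helper names and times in milliseconds; this is not the group coordinator's
actual code), the two interpretations can be compared on the example from
the quoted mail: 10 members joining within 1 sec, a 3 sec delay, and a
5 min rebalance timeout.

```python
def reset_deadline(arrivals_ms, delay_ms, rebalance_timeout_ms):
    """Restart the delay timer on each join ("resetting the clock").

    The deadline never moves past first_join + rebalance_timeout_ms.
    """
    start = arrivals_ms[0]
    cap = start + rebalance_timeout_ms
    deadline = min(start + delay_ms, cap)
    for t in arrivals_ms[1:]:
        if t >= deadline:
            break  # the initial rebalance has already fired
        deadline = min(t + delay_ms, cap)
    return deadline


def extend_deadline(arrivals_ms, delay_ms, rebalance_timeout_ms):
    """Push the existing deadline out by a full delay on each join."""
    start = arrivals_ms[0]
    cap = start + rebalance_timeout_ms
    deadline = min(start + delay_ms, cap)
    for t in arrivals_ms[1:]:
        if t >= deadline:
            break  # the initial rebalance has already fired
        deadline = min(deadline + delay_ms, cap)
    return deadline


if __name__ == "__main__":
    # 10 join requests spread evenly over 1 second (assumed spacing).
    arrivals = [i * 1000 // 9 for i in range(10)]
    print(reset_deadline(arrivals, 3000, 300000))   # 4000  (1 + 3 secs)
    print(extend_deadline(arrivals, 3000, 300000))  # 30000 (about 30 secs)
```

With a 10 sec rebalance timeout, the reset variant also stops at T+10 even if
a member joins at T+9, which matches the cap described further down the thread.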


Thanks,
Damian

On Wed, 29 Mar 2017 at 23:18 Guozhang Wang <wangg...@gmail.com> wrote:

> Made another pass over the KIP wiki, overall LGTM. One quick question on
> the described logic: "they will be added to the group and the delay will be
> extended by min(remainingRebalanceTimeout,
> group.initial.rebalance.delay.ms)"
> though:
>
> From your previous email I thought you are "resetting the clock" when a new
> consumer join group request is received, but it seems to be different. So
> suppose the rebalance timeout is very large so it won't be hit generally
> (default is 5 min), and delay is set to 3 secs, if the group has 10 members
> and we received all their join group requests at roughly the same time, or
> say they arrived within 1 sec, then "resetting clock" will cause the whole
> delay to be no more than 1 + 3 = 4 secs; while extending it will cause it
> to be 1 + 3 * 10 = 31 secs?
>
>
>
> Guozhang
>
>
> On Wed, Mar 29, 2017 at 3:04 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Thanks Damian!
> >
> > On Wed, Mar 29, 2017 at 1:27 AM, Damian Guy <damian....@gmail.com>
> wrote:
> >
> >> Thanks everyone for the discussion, very helpful. I've updated the KIP
> to
> >> make the delay such that it is extended as new members join the group
> and
> >> that it never exceeds the group's rebalance timeout.
> >>
> >> If everyone is ok with this I'll kick off the voting thread.
> >>
> >> Thanks again,
> >> Damian
> >>
> >> On Tue, 28 Mar 2017 at 23:18 Becket Qin <becket....@gmail.com> wrote:
> >>
> >> > I think separating leave/join makes sense. The scenario I can think of
> >> for
> >> > delaying a rebalance on LeaveGroupRequest is rolling bounce of a
> >> service.
> >> > But that scenario could be tricky because there may be mixture of
> >> joining
> >> > and leaving. What happens if a consumer left the group right after
> >> another
> >> > consumer joins the group? Which delay should be applied?
> >> >
> >> > Jason, if I understand correctly, the actual delay of the FIRST
> >> rebalance
> >> > for each group could be anywhere between
> group.initial.rebalance.delay.
> >> ms
> >> > and
> >> > the rebalance timeout, depending on how many times the delay is
> applied.
> >> > For example, if the delay is set to 3 seconds and rebalance timeout is
> >> set
> >> > to 10 seconds. At time T a consumer joins the group, the targeting
> >> > rebalance point would be T+3 if no other consumer joins. If another
> >> > consumer joins the group at T+2 then the targeting delay point would
> >> become
> >> > T+5, etc. However, no matter how many times the delay was extended, at
> >> T+10
> >> > the rebalance will kick off even if at T+9 a new consumer joined the
> >> group.
> >> >
> >> > I also agree that we should set the default delay to some meaningful
> >> value
> >> > instead of setting it to 0.
> >> >
> >> > Thanks,
> >> >
> >> > Jiangjie (Becket) Qin
> >> >
> >> > On Tue, Mar 28, 2017 at 12:32 PM, Jason Gustafson <ja...@confluent.io
> >
> >> > wrote:
> >> >
> >> > > Hey Damian,
> >> > >
> >> > > Thanks for the KIP. I think the proposal makes sense as a workaround
> >> > maybe
> >> > > for some advanced users. However, I'm not sure we can depend on
> >> average
> >> > > users knowing that the config exists, much less setting it to
> >> something
> >> > > that makes sense. It's kind of a trend in streams that I'm not too
> >> > thrilled
> >> > > about to try and control these rebalances through careful tuning of
> >> > various
> >> > > timeouts. For example, the patch to avoid sending LeaveGroup depends
> >> on
> >> > the
> >> > > session timeout being set at least as long as the time for an
> average
> >> > > rolling restart. If the expectation is that these settings are only
> >> > needed
> >> > > for advanced users, it may be sufficient, but if the problems are
> >> > affecting
> >> > > average users, it seems less than ideal. That said, if we can get
> some
> >> > real
> >> > > benefit from low-hanging fruit like this, then it's probably
> >> worthwhile.
> >> > >
> >> > > This relates to the choice of default value, by the way. If we use 0
> >> as
> >> > the
> >> > > default, my guess is that most users won't change it and the benefit
> >> > could
> >> > > be marginal. The choice of 3 seconds that you've documented seems
> >> fine to
> >> > > me. It matches the default consumer heartbeat interval, which
> controls
> >> > > typical rebalance latency, so there's some consistency there.
> >> > >
> >> > > Also, one minor comment: I guess the actual delay for each group
> will
> >> be
> >> > > the minimum of the group's rebalance timeout and
> >> > > group.initial.rebalance.delay.ms. Is that right?
> >> > >
> >> > > -Jason
> >> > >
> >> > > On Tue, Mar 28, 2017 at 8:29 AM, Damian Guy <damian....@gmail.com>
> >> > wrote:
> >> > >
> >> > > > @Ismael - yeah sure we can reduce the default, though i'm not sure
> >> what
> >> > > the
> >> > > > "right" default would be.
> >> > > >
> >> > > > On Tue, 28 Mar 2017 at 15:40 Ismael Juma <ism...@juma.me.uk>
> wrote:
> >> > > >
> >> > > > > Is 3 seconds the right default if the timer gets reset after
> each
> >> > > > consumer
> >> > > > > joins? Maybe we can lower the default value given the new
> >> approach.
> >> > > > >
> >> > > > > Ismael
> >> > > > >
> >> > > > > On Tue, Mar 28, 2017 at 9:53 AM, Damian Guy <
> damian....@gmail.com
> >> >
> >> > > > wrote:
> >> > > > >
> >> > > > > > All,
> >> > > > > > I'd like to get this back to the original discussion about
> >> Delaying
> >> > > > > initial
> >> > > > > > consumer group rebalance.
> >> > > > > > I think i'm leaning towards sticking with the broker config
> and
> >> > > > changing
> >> > > > > > the delay so that the timer starts again when a new consumer
> >> joins
> >> > > the
> >> > > > > > group. What are people's thoughts on that?
> >> > > > > >
> >> > > > > > Doing something similar on leave is valid, but i'd prefer to
> >> > consider
> >> > > > it
> >> > > > > > separately from this.
> >> > > > > >
> >> > > > > > Thanks,
> >> > > > > > Damian
> >> > > > > >
> >> > > > > > On Tue, 28 Mar 2017 at 09:48 Damian Guy <damian....@gmail.com
> >
> >> > > wrote:
> >> > > > > >
> >> > > > > > > Matthias,
> >> > > > > > >
> >> > > > > > > Yes i know.
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > > Damian
> >> > > > > > >
> >> > > > > > > On Mon, 27 Mar 2017 at 18:17 Matthias J. Sax <
> >> > > matth...@confluent.io>
> >> > > > > > > wrote:
> >> > > > > > >
> >> > > > > > > Damian,
> >> > > > > > >
> >> > > > > > > about "rebalance immediately" on timeout -- I guess, that's
> a
> >> > > > different
> >> > > > > > > case as no LeaveGroupRequest will be sent. Thus, the broker
> >> > should
> >> > > be
> >> > > > > > > able to distinguish both cases easily, and apply the delay
> >> only
> >> > if
> >> > > it
> >> > > > > > > received the LeaveGroupRequest but not if a consumer times
> >> out.
> >> > > > > > >
> >> > > > > > > Does this make sense?
> >> > > > > > >
> >> > > > > > > -Matthias
> >> > > > > > >
> >> > > > > > > On 3/27/17 1:56 AM, Damian Guy wrote:
> >> > > > > > > > @Becket
> >> > > > > > > >
> >> > > > > > > > Thanks for the feedback. Yes, i like the idea of extending
> >> the
> >> > > > delay
> >> > > > > as
> >> > > > > > > > each new consumer joins the group. Though, i think this
> >> could
> >> > be
> >> > > > done
> >> > > > > > > with
> >> > > > > > > > either a consumer or broker side config. But i get your
> >> point
> >> > > that
> >> > > > > some
> >> > > > > > > > consumers in the group can be misconfigured.
> >> > > > > > > >
> >> > > > > > > > @Matthias & @Eno - yes we could probably do something
> >> similar
> >> > if
> >> > > > the
> >> > > > > > > member
> >> > > > > > > > has sent the LeaveGroupRequest. I'm not sure it would be
> >> valid
> >> > if
> >> > > > the
> >> > > > > > > > member crashed, hence session.timeout would come into
> play,
> >> > we'd
> >> > > > > > probably
> >> > > > > > > > want to rebalance immediately. I'd be interested in
> hearing
> >> > > > thoughts
> >> > > > > > from
> >> > > > > > > > other core kafka folks on this one.
> >> > > > > > > >
> >> > > > > > > > Thanks,
> >> > > > > > > > Damian
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > On Fri, 24 Mar 2017 at 23:01 Becket Qin <
> >> becket....@gmail.com>
> >> > > > > wrote:
> >> > > > > > > >
> >> > > > > > > >> Hi Matthias,
> >> > > > > > > >>
> >> > > > > > > >> Yes, that was what I was thinking. We will keep delaying it
> >> until
> >> > > > > either
> >> > > > > > > >> reaching the rebalance timeout or no new consumer joins
> in
> >> > that
> >> > > > > small
> >> > > > > > > delay
> >> > > > > > > >> which is configured on the broker side.
> >> > > > > > > >>
> >> > > > > > > >> Thanks,
> >> > > > > > > >>
> >> > > > > > > >> Jiangjie (Becket) Qin
> >> > > > > > > >>
> >> > > > > > > >> On Fri, Mar 24, 2017 at 1:39 PM, Matthias J. Sax <
> >> > > > > > matth...@confluent.io
> >> > > > > > > >
> >> > > > > > > >> wrote:
> >> > > > > > > >>
> >> > > > > > > >>> @Becket:
> >> > > > > > > >>>
> >> > > > > > > >>> I am not sure, if I understand this correctly. Instead
> of
> >> > > > applying
> >> > > > > a
> >> > > > > > > >>> fixed delay, that starts when the first consumer of an
> >> > (empty)
> >> > > > > group
> >> > > > > > > >>> joins, you suggest to re-trigger/re-set the delay each
> >> time a
> >> > > new
> >> > > > > > > >>> consumer joins?
> >> > > > > > > >>>
> >> > > > > > > >>> This sounds like a good strategy to me, if the config is
> on
> >> > the
> >> > > > > broker
> >> > > > > > > >> side.
> >> > > > > > > >>>
> >> > > > > > > >>> @Eno:
> >> > > > > > > >>>
> >> > > > > > > >>> I think that's a valid point and I like this idea!
> >> > > > > > > >>>
> >> > > > > > > >>>
> >> > > > > > > >>> -Matthias
> >> > > > > > > >>>
> >> > > > > > > >>>
> >> > > > > > > >>> On 3/24/17 1:23 PM, Eno Thereska wrote:
> >> > > > > > > >>>> Thanks Damian,
> >> > > > > > > >>>>
> >> > > > > > > >>>> This KIP deals with the initial phase only. What about
> >> the
> >> > > cases
> >> > > > > > when
> >> > > > > > > >>> several consumers leave a group? Won't there be several
> >> > > expensive
> >> > > > > > > >>> rebalances then as well? I'm wondering if it makes sense
> >> for
> >> > > the
> >> > > > > > delay
> >> > > > > > > to
> >> > > > > > > >>> hold anytime the "set" of consumers in a group changes,
> >> be it
> >> > > > > > addition
> >> > > > > > > to
> >> > > > > > > >>> the group or removal from group.
> >> > > > > > > >>>>
> >> > > > > > > >>>> Thanks
> >> > > > > > > >>>> Eno
> >> > > > > > > >>>>
> >> > > > > > > >>>>
> >> > > > > > > >>>>> On 24 Mar 2017, at 20:04, Becket Qin <
> >> becket....@gmail.com
> >> > >
> >> > > > > wrote:
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> Thanks for the KIP, Damian.
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> My two cents on this. It seems there are two things
> >> worth
> >> > > > > thinking
> >> > > > > > > >> here:
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> 1. Better rebalance timing. We will try to rebalance
> >> only
> >> > > when
> >> > > > > all
> >> > > > > > > the
> >> > > > > > > >>>>> consumers in a group have joined. The challenge would
> be
> >> > > > someone
> >> > > > > > has
> >> > > > > > > >> to
> >> > > > > > > >>>>> define what does ALL consumers mean, it could either
> be
> >> a
> >> > > time
> >> > > > or
> >> > > > > > > >>> number of
> >> > > > > > > >>>>> consumers, etc.
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> 2. Avoid frequent rebalance. For example, if there are
> >> 100
> >> > > > > > consumers
> >> > > > > > > >> in
> >> > > > > > > >>> a
> >> > > > > > > >>>>> group, today, in the worst case, we may end up with
> 100
> >> > > > > rebalances
> >> > > > > > > >> even
> >> > > > > > > >>> if
> >> > > > > > > >>>>> all the consumers joined the group in a reasonably
> small
> >> > > amount
> >> > > > > of
> >> > > > > > > >> time.
> >> > > > > > > >>>>> Frequent rebalance is also a bad thing for brokers.
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> Having a client side configuration may solve problem 1
> >> > better
> >> > > > > > because
> >> > > > > > > >>> each
> >> > > > > > > >>>>> consumer group can potentially configure their own
> >> timing.
> >> > > > > However,
> >> > > > > > > it
> >> > > > > > > >>> does
> >> > > > > > > >>>>> not really prevent frequent rebalance in general
> because
> >> > some
> >> > > > of
> >> > > > > > the
> >> > > > > > > >>>>> consumers can be misconfigured. (This may have
> >> something to
> >> > > do
> >> > > > > with
> >> > > > > > > >>> KIP-124
> >> > > > > > > >>>>> as well. But if quota is applied on the
> >> JoinGroup/SyncGroup
> >> > > > > request
> >> > > > > > > it
> >> > > > > > > >>> may
> >> > > > > > > >>>>> cause some unwanted cascading effects.)
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> Having a broker side configuration may result in less
> >> > > > flexibility
> >> > > > > > for
> >> > > > > > > >>> each
> >> > > > > > > >>>>> consumer group, but it can prevent frequent rebalance
> >> > > better. I
> >> > > > > > think
> >> > > > > > > >>> with
> >> > > > > > > >>>>> some reasonable design, the rebalance timing issue can
> >> be
> >> > > > > resolved
> >> > > > > > on
> >> > > > > > > >>> the
> >> > > > > > > >>>>> broker side as well. Matthias had a good point on
> >> extending
> >> > > the
> >> > > > > > delay
> >> > > > > > > >>> when
> >> > > > > > > >>>>> a new consumer joins a group (we actually did
> something
> >> > > similar
> >> > > > > to
> >> > > > > > > >> batch
> >> > > > > > > >>>>> ISR change propagation). For example, let's say on the
> >> > broker
> >> > > > > side,
> >> > > > > > > we
> >> > > > > > > >>> will
> >> > > > > > > >>>>> always delay 2 seconds each time we see a new consumer
> >> > > joining
> >> > > > a
> >> > > > > > > >>> consumer
> >> > > > > > > >>>>> group. This would probably work for most of the
> consumer
> >> > > groups
> >> > > > > and
> >> > > > > > > >> will
> >> > > > > > > >>>>> also limit the rebalance frequency to protect the
> >> brokers.
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> I am not sure about the streams use case here, but if
> >> > > something
> >> > > > > > like
> >> > > > > > > 2
> >> > > > > > > >>>>> seconds of delay is acceptable for streams, I would
> >> prefer
> >> > > > adding
> >> > > > > > the
> >> > > > > > > >>>>> configuration to the broker so that we can address
> both
> >> > > > problems.
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> Thanks,
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> Jiangjie (Becket) Qin
> >> > > > > > > >>>>>
> >> > > > > > > >>>>>
> >> > > > > > > >>>>> On Fri, Mar 24, 2017 at 5:30 AM, Damian Guy <
> >> > > > > damian....@gmail.com>
> >> > > > > > > >>> wrote:
> >> > > > > > > >>>>>
> >> > > > > > > >>>>>> Thanks for the feedback.
> >> > > > > > > >>>>>>
> >> > > > > > > >>>>>> Ewen: I'm happy to make it a client side config.
> Other
> >> > than
> >> > > > the
> >> > > > > > > >>> protocol
> >> > > > > > > >>>>>> bump i think the effort is almost the same.
> Personally
> >> i
> >> > see
> >> > > > no
> >> > > > > > > other
> >> > > > > > > >>>>>> issues, but based on discussions with others this is
> >> what
> >> > we
> >> > > > > came
> >> > > > > > up
> >> > > > > > > >>> with.
> >> > > > > > > >>>>>>
> >> > > > > > > >>>>>> True, it can probably be tested easily via an
> >> integration
> >> > > > test.
> >> > > > > > > >>>>>>
> >> > > > > > > >>>>>> Matthias: Yes i agree, the delay could be extended as
> >> each
> >> > > new
> >> > > > > > > member
> >> > > > > > > >>> joins
> >> > > > > > > >>>>>> the group.
> >> > > > > > > >>>>>>
> >> > > > > > > >>>>>> Thanks,
> >> > > > > > > >>>>>> Damian
> >> > > > > > > >>>>>>
> >> > > > > > > >>>>>> On Fri, 24 Mar 2017 at 05:14 Ewen Cheslack-Postava <
> >> > > > > > > >> e...@confluent.io>
> >> > > > > > > >>>>>> wrote:
> >> > > > > > > >>>>>>
> >> > > > > > > >>>>>>> I have the same initial response as Ismael re:
> broker
> >> vs
> >> > > > > consumer
> >> > > > > > > >>>>>> settings.
> >> > > > > > > >>>>>>> The global setting seems questionable.
> >> > > > > > > >>>>>>>
> >> > > > > > > >>>>>>> Could we maybe summarize what the impact of making
> >> this a
> >> > > > > client
> >> > > > > > > >>> config
> >> > > > > > > >>>>>>> would be? Protocol bump is obvious, but is there any
> >> > other
> >> > > > > > > >> significant
> >> > > > > > > >>>>>>> issue? For the protocol bump in particular, I think
> >> this
> >> > > > change
> >> > > > > > is
> >> > > > > > > >>>>>>> currently really critical for streams; it will be
> >> > valuable
> >> > > > > > > >> elsewhere,
> >> > > > > > > >>> but
> >> > > > > > > >>>>>>> the immediate demand is streams, so a protocol bump
> >> while
> >> > > > being
> >> > > > > > > >>> backwards
> >> > > > > > > >>>>>>> compatible wouldn't affect any other clients. Is
> this
> >> > still
> >> > > > > > > actually
> >> > > > > > > >>>>>>> compatible with different clients given that they
> >> would
> >> > now
> >> > > > > > expect
> >> > > > > > > >>>>>>> different timeouts? (I think it's strictly
> compatible
> >> if
> >> > > you
> >> > > > > wait
> >> > > > > > > >> for
> >> > > > > > > >>>>>>> responses, but if you enforce any client side
> >> timeouts,
> >> > I'm
> >> > > > not
> >> > > > > > so
> >> > > > > > > >>> sure.)
> >> > > > > > > >>>>>>>
> >> > > > > > > >>>>>>> re: test plan, I'm sure this will come as a
> surprise,
> >> but
> >> > > is
> >> > > > > the
> >> > > > > > > >>> system
> >> > > > > > > >>>>>>> test even necessary? Validating # of rebalances
> seems
> >> > messy
> >> > > > as
> >> > > > > > > other
> >> > > > > > > >>>>>> things
> >> > > > > > > >>>>>>> can cause rebalances (though admittedly not in a
> >> "clean"
> >> > > > case).
> >> > > > > > But
> >> > > > > > > >>>>>> really
> >> > > > > > > >>>>>>> it seems like an integration test could validate
> this
> >> by
> >> > > > making
> >> > > > > > > sure
> >> > > > > > > >>>>>> only 1
> >> > > > > > > >>>>>>> rebalance occurred when 2 members joined with a
> >> > sufficient
> >> > > > time
> >> > > > > > > gap.
> >> > > > > > > >>>>>>>
> >> > > > > > > >>>>>>> -Ewen
> >> > > > > > > >>>>>>>
> >> > > > > > > >>>>>>> On Thu, Mar 23, 2017 at 3:53 PM, Matthias J. Sax <
> >> > > > > > > >>> matth...@confluent.io>
> >> > > > > > > >>>>>>> wrote:
> >> > > > > > > >>>>>>>
> >> > > > > > > >>>>>>>> Thanks for the KIP Damian!
> >> > > > > > > >>>>>>>>
> >> > > > > > > >>>>>>>> My two cents:
> >> > > > > > > >>>>>>>>
> >> > > > > > > >>>>>>>> - we should have an explicit parameter for this --
> >> > > implicit
> >> > > > > > > setting
> >> > > > > > > >>>>>> are
> >> > > > > > > >>>>>>>> always tricky (the "importance" of this parameter
> >> would
> >> > be
> >> > > > > LOW)
> >> > > > > > > >>>>>>>>
> >> > > > > > > >>>>>>>> - the config should be different for each consumer
> >> > group:
> >> > > > > > > >>>>>>>>   * assume you have a stateless app, you want to
> >> > rebalance
> >> > > > > > > >>> immediately
> >> > > > > > > >>>>>>>>   * if you start-up in a virtualized environment
> >> using
> >> > > some
> >> > > > > > tools
> >> > > > > > > >>> like
> >> > > > > > > >>>>>>>> Mesos you might need a different value than on bare
> >> > metal
> >> > > > (no
> >> > > > > VM
> >> > > > > > > to
> >> > > > > > > >>> be
> >> > > > > > > >>>>>>>> started)
> >> > > > > > > >>>>>>>>   * it also depends, how many consumer instances
> you
> >> > > expect
> >> > > > --
> >> > > > > > > it's
> >> > > > > > > >>>>>>>> harder to start up 100 instances in 3 seconds than
> 5
> >> > > > > > > >>>>>>>>
> >> > > > > > > >>>>>>>> - the default value should be zero
> >> > > > > > > >>>>>>>>
> >> > > > > > > >>>>>>>>
> >> > > > > > > >>>>>>>> One more thought: what about scaling scenarios? If
> a
> >> > > > consumer
> >> > > > > > > group
> >> > > > > > > >>> has
> >> > > > > > > >>>>>>>> 10 instances and should be scaled up to 20, it
> would
> >> > make
> >> > > > > sense
> >> > > > > > to
> >> > > > > > > >> do
> >> > > > > > > >>>>>>>> this with a single rebalance, too. Thus, I am
> >> wondering,
> >> > > if
> >> > > > it
> >> > > > > > > >> would
> >> > > > > > > >>>>>>>> make sense to apply this delay each time a new
> >> consumer
> >> > > > joins
> >> > > > > > > >> group,
> >> > > > > > > >>>>>>>> even if the group is not empty?
> >> > > > > > > >>>>>>>>
> >> > > > > > > >>>>>>>>
> >> > > > > > > >>>>>>>> -Matthias
> >> > > > > > > >>>>>>>>
> >> > > > > > > >>>>>>>>
> >> > > > > > > >>>>>>>> On 3/23/17 10:19 AM, Damian Guy wrote:
> >> > > > > > > >>>>>>>>> Thanks Guozhang - i think another problem with
> this
> >> is
> >> > > that
> >> > > > > is
> >> > > > > > > >>>>>>>> overloading
> >> > > > > > > >>>>>>>>> session.timeout.ms to mean multiple things. I'm
> not
> >> > sure
> >> > > > > that
> >> > > > > > is
> >> > > > > > > >> a
> >> > > > > > > >>>>>>> good
> >> > > > > > > >>>>>>>>> thing.
> >> > > > > > > >>>>>>>>>
> >> > > > > > > >>>>>>>>> On Thu, 23 Mar 2017 at 17:14 Guozhang Wang <
> >> > > > > wangg...@gmail.com
> >> > > > > > >
> >> > > > > > > >>>>>> wrote:
> >> > > > > > > >>>>>>>>>
> >> > > > > > > >>>>>>>>>> The downside of it, though, is that although it
> >> > "hides"
> >> > > > this
> >> > > > > > > from
> >> > > > > > > >>>>>> most
> >> > > > > > > >>>>>>>> of
> >> > > > > > > >>>>>>>>>> the users needing to be aware of it, by default
> >> > session
> >> > > > > > timeout
> >> > > > > > > >>> i.e.
> >> > > > > > > >>>>>>> the
> >> > > > > > > >>>>>>>>>> rebalance timeout is 10 seconds which could be
> >> arguably
> >> > too
> >> > > > > long.
> >> > > > > > > >>>>>>>>>>
> >> > > > > > > >>>>>>>>>>
> >> > > > > > > >>>>>>>>>> Guozhang
> >> > > > > > > >>>>>>>>>>
> >> > > > > > > >>>>>>>>>> On Thu, Mar 23, 2017 at 10:12 AM, Guozhang Wang <
> >> > > > > > > >>> wangg...@gmail.com
> >> > > > > > > >>>>>>>
> >> > > > > > > >>>>>>>>>> wrote:
> >> > > > > > > >>>>>>>>>>
> >> > > > > > > >>>>>>>>>>> Just throwing another alternative idea here: we
> >> can
> >> > > > > consider
> >> > > > > > > >> using
> >> > > > > > > >>>>>>> the
> >> > > > > > > >>>>>>>>>>> rebalance timeout value which is already
> included
> >> in
> >> > > the
> >> > > > > join
> >> > > > > > > >>>>>> request
> >> > > > > > > >>>>>>>>>>> protocol (and on the current Java client it is
> >> always
> >> > > > > written
> >> > > > > > > as
> >> > > > > > > >>>>>> the
> >> > > > > > > >>>>>>>>>>> session timeout value), that the first member
> >> joining
> >> > > > will
> >> > > > > > > >> always
> >> > > > > > > >>>>>>> force
> >> > > > > > > >>>>>>>>>> the
> >> > > > > > > >>>>>>>>>>> coordinator to wait that long. By doing this we
> do
> >> > not
> >> > > > need
> >> > > > > > to
> >> > > > > > > >>> bump
> >> > > > > > > >>>>>>> up
> >> > > > > > > >>>>>>>>>> the
> >> > > > > > > >>>>>>>>>>> protocol either.
> >> > > > > > > >>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>> Guozhang
> >> > > > > > > >>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>> On Thu, Mar 23, 2017 at 5:49 AM, Damian Guy <
> >> > > > > > > >> damian....@gmail.com
> >> > > > > > > >>>>
> >> > > > > > > >>>>>>>>>> wrote:
> >> > > > > > > >>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>> Hi Ismael,
> >> > > > > > > >>>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>> Mostly to avoid the protocol bump.
> >> > > > > > > >>>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>> I agree that it may be difficult to choose the
> >> right
> >> > > > delay
> >> > > > > > for
> >> > > > > > > >>> all
> >> > > > > > > >>>>>>>>>>>> consumer
> >> > > > > > > >>>>>>>>>>>> groups, but we wanted to make this something
> that
> >> > most
> >> > > > > users
> >> > > > > > > >>> don't
> >> > > > > > > >>>>>>>>>> really
> >> > > > > > > >>>>>>>>>>>> need to think about, i.e., a small enough
> default
> >> > > delay
> >> > > > > that
> >> > > > > > > >>> works
> >> > > > > > > >>>>>>> in
> >> > > > > > > >>>>>>>>>> the
> >> > > > > > > >>>>>>>>>>>> majority of cases. However it would be much
> more
> >> > > > flexible
> >> > > > > > as a
> >> > > > > > > >>>>>>>> consumer
> >> > > > > > > >>>>>>>>>>>> config, which i'm happy to pursue if this
> change
> >> is
> >> > > > worthy
> >> > > > > > of
> >> > > > > > > a
> >> > > > > > > >>>>>>>> protocol
> >> > > > > > > >>>>>>>>>>>> bump.
> >> > > > > > > >>>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>> Thanks,
> >> > > > > > > >>>>>>>>>>>> Damian
> >> > > > > > > >>>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>> On Thu, 23 Mar 2017 at 12:35 Ismael Juma <
> >> > > > > ism...@juma.me.uk
> >> > > > > > >
> >> > > > > > > >>>>>> wrote:
> >> > > > > > > >>>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>>> Thanks for the KIP, Damian. It makes sense to
> >> avoid
> >> > > > > > multiple
> >> > > > > > > >>>>>>>>>> rebalances
> >> > > > > > > >>>>>>>>>>>>> during start-up. One issue with having this
> as a
> >> > > broker
> >> > > > > > > config
> >> > > > > > > >>> is
> >> > > > > > > >>>>>>>> that
> >> > > > > > > >>>>>>>>>>>> it
> >> > > > > > > >>>>>>>>>>>>> may be difficult to choose the right delay for
> >> all
> >> > > > > consumer
> >> > > > > > > >>>>>> groups.
> >> > > > > > > >>>>>>>>>> Can
> >> > > > > > > >>>>>>>>>>>> you
> >> > > > > > > >>>>>>>>>>>>> elaborate a little more on why the first
> >> > alternative
> >> > > > > (add a
> >> > > > > > > >>>>>>> consumer
> >> > > > > > > >>>>>>>>>>>>> config) was rejected? We bump protocol
> versions
> >> > > > regularly
> >> > > > > > > >> (when
> >> > > > > > > >>>>>> it
> >> > > > > > > >>>>>>>>>> makes
> >> > > > > > > >>>>>>>>>>>>> sense), so it would be good to get a bit more
> >> > detail.
> >> > > > > > > >>>>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>>> Thanks,
> >> > > > > > > >>>>>>>>>>>>> Ismael
> >> > > > > > > >>>>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>>> On Thu, Mar 23, 2017 at 12:24 PM, Damian Guy <
> >> > > > > > > >>>>>> damian....@gmail.com
> >> > > > > > > >>>>>>>>
> >> > > > > > > >>>>>>>>>>>> wrote:
> >> > > > > > > >>>>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>>>> Hi All,
> >> > > > > > > >>>>>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>>>> I've prepared a KIP to add a configurable
> >> delay to
> >> > > the
> >> > > > > > > >> initial
> >> > > > > > > >>>>>>>>>>>> consumer
> >> > > > > > > >>>>>>>>>>>>>> group rebalance.
> >> > > > > > > >>>>>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>>>> Please have a look here:
> >> > > > > > > >>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance
> >> > > > > > > >>>>>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>>>> Thanks,
> >> > > > > > > >>>>>>>>>>>>>> Damian
> >> > > > > > > >>>>>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>>>> BTW, i apologize if this appears twice. Seems
> >> the
> >> > > > first
> >> > > > > > one
> >> > > > > > > >> may
> >> > > > > > > >>>>>>> have
> >> > > > > > > >>>>>>>>>>>> not
> >> > > > > > > >>>>>>>>>>>>>> made it.
> >> > > > > > > >>>>>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>> --
> >> > > > > > > >>>>>>>>>>> -- Guozhang
> >> > > > > > > >>>>>>>>>>>
> >> > > > > > > >>>>>>>>>>
> >> > > > > > > >>>>>>>>>>
> >> > > > > > > >>>>>>>>>>
> >> > > > > > > >>>>>>>>>> --
> >> > > > > > > >>>>>>>>>> -- Guozhang
> >> > > > > > > >>>>>>>>>>
> >> > > > > > > >>>>>>>>>
> >> > > > > > > >>>>>>>>
> >> > > > > > > >>>>>>>>
> >> > > > > > > >>>>>>>
> >> > > > > > > >>>>>>
> >> > > > > > > >>>>
> >> > > > > > > >>>
> >> > > > > > > >>>
> >> > > > > > > >>
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
> >
> > --
> > -- Guozhang
> >
>
>
>
> --
> -- Guozhang
>
