Made another pass over the KIP wiki, and overall it LGTM. One quick question
on the described logic, though: "they will be added to the group and the delay
will be extended by min(remainingRebalanceTimeout, group.initial.rebalance.delay.ms)".

From your previous email I thought you were "resetting the clock" whenever a
new consumer's join group request is received, but this seems to be different.
Suppose the rebalance timeout is very large, so that it generally won't be hit
(the default is 5 min), and the delay is set to 3 secs. If the group has 10
members and we receive all their join group requests at roughly the same time,
say within 1 sec, then "resetting the clock" will cause the whole delay to be
no more than 1 + 3 = 4 secs, while extending it will cause it to be
1 + 3 * 10 = 31 secs?
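
To make the two readings concrete, here is a minimal Python sketch. It is not
the KIP's or the broker's actual implementation: the function names, the use
of milliseconds, and the reading of remainingRebalanceTimeout as "time left
between the current deadline and first join + rebalance timeout" are all
assumptions made for illustration.

```python
def reset_clock_delay_ms(arrivals_ms, delay_ms, rebalance_timeout_ms):
    """Reading 1 (assumed): every join restarts the delay timer.

    Returns how long after the first join the rebalance would fire.
    """
    start = arrivals_ms[0]
    hard_limit = start + rebalance_timeout_ms  # never wait past this point
    deadline = min(start + delay_ms, hard_limit)
    for t in arrivals_ms[1:]:
        if t >= deadline:
            break  # delay already expired; this join belongs to a later rebalance
        deadline = min(t + delay_ms, hard_limit)
    return deadline - start


def extend_delay_ms(arrivals_ms, delay_ms, rebalance_timeout_ms):
    """Reading 2 (assumed): every join adds min(remaining, delay) to the deadline."""
    start = arrivals_ms[0]
    hard_limit = start + rebalance_timeout_ms
    deadline = min(start + delay_ms, hard_limit)
    for t in arrivals_ms[1:]:
        if t >= deadline:
            break
        remaining = hard_limit - deadline  # hypothetical remainingRebalanceTimeout
        deadline += min(remaining, delay_ms)
    return deadline - start


# Ten members joining evenly spread over 1 sec, delay 3 secs, timeout 5 min.
arrivals = [i * 1000 // 9 for i in range(10)]
print(reset_clock_delay_ms(arrivals, 3000, 300000))  # 4000
print(extend_delay_ms(arrivals, 3000, 300000))       # 30000
```

Under these assumptions the "reset" reading fires 4 secs after the first join,
while the "extend" reading fires after about 30 secs, which is the same order
as the rough 31-sec estimate above.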



Guozhang


On Wed, Mar 29, 2017 at 3:04 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> Thanks Damian!
>
> On Wed, Mar 29, 2017 at 1:27 AM, Damian Guy <damian....@gmail.com> wrote:
>
>> Thanks everyone for the discussion, very helpful. I've updated the KIP to
>> make the delay such that it is extended as new members join the group and
>> that it never exceeds the group's rebalance timeout.
>>
>> If everyone is ok with this I'll kick off the voting thread.
>>
>> Thanks again,
>> Damian
>>
>> On Tue, 28 Mar 2017 at 23:18 Becket Qin <becket....@gmail.com> wrote:
>>
>> > I think separating leave/join makes sense. The scenario I can think of for
>> > delaying a rebalance on LeaveGroupRequest is a rolling bounce of a service.
>> > But that scenario could be tricky because there may be a mixture of joining
>> > and leaving. What happens if a consumer leaves the group right after another
>> > consumer joins the group? Which delay should be applied?
>> >
>> > Jason, if I understand correctly, the actual delay of the FIRST rebalance
>> > for each group could be anywhere between group.initial.rebalance.delay.ms
>> > and the rebalance timeout, depending on how many times the delay is applied.
>> > For example, suppose the delay is set to 3 seconds and the rebalance timeout
>> > is set to 10 seconds. At time T a consumer joins the group, and the target
>> > rebalance point would be T+3 if no other consumer joins. If another
>> > consumer joins the group at T+2 then the target rebalance point would become
>> > T+5, etc. However, no matter how many times the delay was extended, at T+10
>> > the rebalance will kick off even if at T+9 a new consumer joined the group.
>> >
>> > I also agree that we should set the default delay to some meaningful value
>> > instead of setting it to 0.
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> > On Tue, Mar 28, 2017 at 12:32 PM, Jason Gustafson <ja...@confluent.io>
>> > wrote:
>> >
>> > > Hey Damian,
>> > >
>> > > Thanks for the KIP. I think the proposal makes sense as a workaround maybe
>> > > for some advanced users. However, I'm not sure we can depend on average
>> > > users knowing that the config exists, much less setting it to something
>> > > that makes sense. It's kind of a trend in streams that I'm not too thrilled
>> > > about to try and control these rebalances through careful tuning of various
>> > > timeouts. For example, the patch to avoid sending LeaveGroup depends on the
>> > > session timeout being set at least as long as the time for an average
>> > > rolling restart. If the expectation is that these settings are only needed
>> > > for advanced users, it may be sufficient, but if the problems are affecting
>> > > average users, it seems less than ideal. That said, if we can get some real
>> > > benefit from low-hanging fruit like this, then it's probably worthwhile.
>> > >
>> > > This relates to the choice of default value, by the way. If we use 0 as the
>> > > default, my guess is that most users won't change it and the benefit could
>> > > be marginal. The choice of 3 seconds that you've documented seems fine to
>> > > me. It matches the default consumer heartbeat interval, which controls
>> > > typical rebalance latency, so there's some consistency there.
>> > >
>> > > Also, one minor comment: I guess the actual delay for each group will be
>> > > the minimum of the group's rebalance timeout and
>> > > group.initial.rebalance.delay.ms. Is that right?
>> > >
>> > > -Jason
>> > >
>> > > On Tue, Mar 28, 2017 at 8:29 AM, Damian Guy <damian....@gmail.com>
>> > wrote:
>> > >
>> > > > @Ismael - yeah sure we can reduce the default, though i'm not sure what the
>> > > > "right" default would be.
>> > > >
>> > > > On Tue, 28 Mar 2017 at 15:40 Ismael Juma <ism...@juma.me.uk> wrote:
>> > > >
>> > > > > Is 3 seconds the right default if the timer gets reset after each consumer
>> > > > > joins? Maybe we can lower the default value given the new approach.
>> > > > >
>> > > > > Ismael
>> > > > >
>> > > > > On Tue, Mar 28, 2017 at 9:53 AM, Damian Guy <damian....@gmail.com
>> >
>> > > > wrote:
>> > > > >
>> > > > > > All,
>> > > > > > I'd like to get this back to the original discussion about Delaying
>> > > > > > initial consumer group rebalance.
>> > > > > > I think i'm leaning towards sticking with the broker config and changing
>> > > > > > the delay so that the timer starts again when a new consumer joins the
>> > > > > > group. What are people's thoughts on that?
>> > > > > >
>> > > > > > Doing something similar on leave is valid, but i'd prefer to consider it
>> > > > > > separately from this.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Damian
>> > > > > >
>> > > > > > On Tue, 28 Mar 2017 at 09:48 Damian Guy <damian....@gmail.com>
>> > > wrote:
>> > > > > >
>> > > > > > > Matthias,
>> > > > > > >
>> > > > > > > Yes i know.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > Damian
>> > > > > > >
>> > > > > > > On Mon, 27 Mar 2017 at 18:17 Matthias J. Sax <
>> > > matth...@confluent.io>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > Damian,
>> > > > > > >
>> > > > > > > about "rebalance immediately" on timeout -- I guess, that's a
>> > > > different
>> > > > > > > case as no LeaveGroupRequest will be sent. Thus, the broker
>> > should
>> > > be
>> > > > > > > able to distinguish both cases easily, and apply the delay
>> only
>> > if
>> > > it
>> > > > > > > received the LeaveGroupRequest but not if a consumer times
>> out.
>> > > > > > >
>> > > > > > > Does this make sense?
>> > > > > > >
>> > > > > > > -Matthias
>> > > > > > >
>> > > > > > > On 3/27/17 1:56 AM, Damian Guy wrote:
>> > > > > > > > @Becket
>> > > > > > > >
>> > > > > > > > Thanks for the feedback. Yes, i like the idea of extending
>> the
>> > > > delay
>> > > > > as
>> > > > > > > > each new consumer joins the group. Though, i think this
>> could
>> > be
>> > > > done
>> > > > > > > with
>> > > > > > > > either a consumer or broker side config. But i get your
>> point
>> > > that
>> > > > > some
>> > > > > > > > consumers in the group can be misconfigured.
>> > > > > > > >
>> > > > > > > > @Matthias & @Eno - yes we could probably do something
>> similar
>> > if
>> > > > the
>> > > > > > > member
>> > > > > > > > has sent the LeaveGroupRequest. I'm not sure it would be
>> valid
>> > if
>> > > > the
>> > > > > > > > member crashed, hence session.timeout would come into play,
>> > we'd
>> > > > > > probably
>> > > > > > > > want to rebalance immediately. I'd be interested in hearing
>> > > > thoughts
>> > > > > > from
>> > > > > > > > other core kafka folks on this one.
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > > Damian
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Fri, 24 Mar 2017 at 23:01 Becket Qin <
>> becket....@gmail.com>
>> > > > > wrote:
>> > > > > > > >
>> > > > > > > >> Hi Matthias,
>> > > > > > > >>
>> > > > > > > >> Yes, that was what I was thinking. We will keep delaying it until
>> > > > > > > >> either the rebalance timeout is reached or no new consumer joins
>> > > > > > > >> within that small delay, which is configured on the broker side.
>> > > > > > > >>
>> > > > > > > >> Thanks,
>> > > > > > > >>
>> > > > > > > >> Jiangjie (Becket) Qin
>> > > > > > > >>
>> > > > > > > >> On Fri, Mar 24, 2017 at 1:39 PM, Matthias J. Sax <
>> > > > > > matth...@confluent.io
>> > > > > > > >
>> > > > > > > >> wrote:
>> > > > > > > >>
>> > > > > > > >>> @Becket:
>> > > > > > > >>>
>> > > > > > > >>> I am not sure, if I understand this correctly. Instead of
>> > > > applying
>> > > > > a
>> > > > > > > >>> fixed delay, that starts when the first consumer of an
>> > (empty)
>> > > > > group
>> > > > > > > >>> joins, you suggest to re-trigger/re-set the delay each
>> time a
>> > > new
>> > > > > > > >>> consumer joins?
>> > > > > > > >>>
>> > > > > > > >>> This sounds like a good strategy to me, if the config is on the
>> > > > > > > >>> broker side.
>> > > > > > > >>>
>> > > > > > > >>> @Eno:
>> > > > > > > >>>
>> > > > > > > >>> I think that's a valid point and I like this idea!
>> > > > > > > >>>
>> > > > > > > >>>
>> > > > > > > >>> -Matthias
>> > > > > > > >>>
>> > > > > > > >>>
>> > > > > > > >>> On 3/24/17 1:23 PM, Eno Thereska wrote:
>> > > > > > > >>>> Thanks Damian,
>> > > > > > > >>>>
>> > > > > > > >>>> This KIP deals with the initial phase only. What about
>> the
>> > > cases
>> > > > > > when
>> > > > > > > >>> several consumers leave a group? Won't there be several
>> > > expensive
>> > > > > > > >>> rebalances then as well? I'm wondering if it makes sense
>> for
>> > > the
>> > > > > > delay
>> > > > > > > to
>> > > > > > > >>> hold anytime the "set" of consumers in a group changes,
>> be it
>> > > > > > addition
>> > > > > > > to
>> > > > > > > >>> the group or removal from group.
>> > > > > > > >>>>
>> > > > > > > >>>> Thanks
>> > > > > > > >>>> Eno
>> > > > > > > >>>>
>> > > > > > > >>>>
>> > > > > > > >>>>> On 24 Mar 2017, at 20:04, Becket Qin <
>> becket....@gmail.com
>> > >
>> > > > > wrote:
>> > > > > > > >>>>>
>> > > > > > > >>>>> Thanks for the KIP, Damian.
>> > > > > > > >>>>>
>> > > > > > > >>>>> My two cents on this. It seems there are two things
>> worth
>> > > > > thinking
>> > > > > > > >> here:
>> > > > > > > >>>>>
>> > > > > > > >>>>> 1. Better rebalance timing. We will try to rebalance
>> only
>> > > when
>> > > > > all
>> > > > > > > the
>> > > > > > > >>>>> consumers in a group have joined. The challenge would be
>> > > > someone
>> > > > > > has
>> > > > > > > >> to
>> > > > > > > >>>>> define what does ALL consumers mean, it could either be
>> a
>> > > time
>> > > > or
>> > > > > > > >>> number of
>> > > > > > > >>>>> consumers, etc.
>> > > > > > > >>>>>
>> > > > > > > >>>>> 2. Avoid frequent rebalance. For example, if there are
>> 100
>> > > > > > consumers
>> > > > > > > >> in
>> > > > > > > >>> a
>> > > > > > > >>>>> group, today, in the worst case, we may end up with 100
>> > > > > rebalances
>> > > > > > > >> even
>> > > > > > > >>> if
>> > > > > > > >>>>> all the consumers joined the group in a reasonably small
>> > > amount
>> > > > > of
>> > > > > > > >> time.
>> > > > > > > >>>>> Frequent rebalance is also a bad thing for brokers.
>> > > > > > > >>>>>
>> > > > > > > >>>>> Having a client side configuration may solve problem 1
>> > better
>> > > > > > because
>> > > > > > > >>> each
>> > > > > > > >>>>> consumer group can potentially configure their own
>> timing.
>> > > > > However,
>> > > > > > > it
>> > > > > > > >>> does
>> > > > > > > >>>>> not really prevent frequent rebalance in general because
>> > some
>> > > > of
>> > > > > > the
>> > > > > > > >>>>> consumers can be misconfigured. (This may have
>> something to
>> > > do
>> > > > > with
>> > > > > > > >>> KIP-124
>> > > > > > > >>>>> as well. But if quota is applied on the
>> JoinGroup/SyncGroup
>> > > > > request
>> > > > > > > it
>> > > > > > > >>> may
>> > > > > > > >>>>> cause some unwanted cascading effects.)
>> > > > > > > >>>>>
>> > > > > > > >>>>> Having a broker side configuration may result in less
>> > > > flexibility
>> > > > > > for
>> > > > > > > >>> each
>> > > > > > > >>>>> consumer group, but it can prevent frequent rebalance
>> > > better. I
>> > > > > > think
>> > > > > > > >>> with
>> > > > > > > >>>>> some reasonable design, the rebalance timing issue can
>> be
>> > > > > resolved
>> > > > > > on
>> > > > > > > >>> the
>> > > > > > > >>>>> broker side as well. Matthias had a good point on
>> extending
>> > > the
>> > > > > > delay
>> > > > > > > >>> when
>> > > > > > > >>>>> a new consumer joins a group (we actually did something
>> > > similar
>> > > > > to
>> > > > > > > >> batch
>> > > > > > > >>>>> ISR change propagation). For example, let's say on the
>> > broker
>> > > > > side,
>> > > > > > > we
>> > > > > > > >>> will
>> > > > > > > >>>>> always delay 2 seconds each time we see a new consumer
>> > > joining
>> > > > a
>> > > > > > > >>> consumer
>> > > > > > > >>>>> group. This would probably work for most of the consumer
>> > > groups
>> > > > > and
>> > > > > > > >> will
>> > > > > > > >>>>> also limit the rebalance frequency to protect the
>> brokers.
>> > > > > > > >>>>>
>> > > > > > > >>>>> I am not sure about the streams use case here, but if
>> > > something
>> > > > > > like
>> > > > > > > 2
>> > > > > > > >>>>> seconds of delay is acceptable for streams, I would
>> prefer
>> > > > adding
>> > > > > > the
>> > > > > > > >>>>> configuration to the broker so that we can address both
>> > > > problems.
>> > > > > > > >>>>>
>> > > > > > > >>>>> Thanks,
>> > > > > > > >>>>>
>> > > > > > > >>>>> Jiangjie (Becket) Qin
>> > > > > > > >>>>>
>> > > > > > > >>>>>
>> > > > > > > >>>>> On Fri, Mar 24, 2017 at 5:30 AM, Damian Guy <
>> > > > > damian....@gmail.com>
>> > > > > > > >>> wrote:
>> > > > > > > >>>>>
>> > > > > > > >>>>>> Thanks for the feedback.
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> Ewen: I'm happy to make it a client side config. Other
>> > than
>> > > > the
>> > > > > > > >>> protocol
>> > > > > > > >>>>>> bump i think the effort is almost the same. Personally
>> i
>> > see
>> > > > no
>> > > > > > > other
>> > > > > > > >>>>>> issues, but based on discussions with others this is
>> what
>> > we
>> > > > > came
>> > > > > > up
>> > > > > > > >>> with.
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> True, it can probably be tested easily via an
>> integration
>> > > > test.
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> Matthias: Yes i agree, the delay could be extended as
>> each
>> > > new
>> > > > > > > member
>> > > > > > > >>> joins
>> > > > > > > >>>>>> the group.
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> Thanks,
>> > > > > > > >>>>>> Damian
>> > > > > > > >>>>>>
>> > > > > > > >>>>>> On Fri, 24 Mar 2017 at 05:14 Ewen Cheslack-Postava <
>> > > > > > > >> e...@confluent.io>
>> > > > > > > >>>>>> wrote:
>> > > > > > > >>>>>>
>> > > > > > > >>>>>>> I have the same initial response as Ismael re: broker
>> vs
>> > > > > consumer
>> > > > > > > >>>>>> settings.
>> > > > > > > >>>>>>> The global setting seems questionable.
>> > > > > > > >>>>>>>
>> > > > > > > >>>>>>> Could we maybe summarize what the impact of making
>> this a
>> > > > > client
>> > > > > > > >>> config
>> > > > > > > >>>>>>> would be? Protocol bump is obvious, but is there any
>> > other
>> > > > > > > >> significant
>> > > > > > > >>>>>>> issue? For the protocol bump in particular, I think
>> this
>> > > > change
>> > > > > > is
>> > > > > > > >>>>>>> currently really critical for streams; it will be
>> > valuable
>> > > > > > > >> elsewhere,
>> > > > > > > >>> but
>> > > > > > > >>>>>>> the immediate demand is streams, so a protocol bump
>> while
>> > > > being
>> > > > > > > >>> backwards
>> > > > > > > >>>>>>> compatible wouldn't affect any other clients. Is this
>> > still
>> > > > > > > actually
>> > > > > > > >>>>>>> compatible with different clients given that they
>> would
>> > now
>> > > > > > expect
>> > > > > > > >>>>>>> different timeouts? (I think it's strictly compatible
>> if
>> > > you
>> > > > > wait
>> > > > > > > >> for
>> > > > > > > >>>>>>> responses, but if you enforce any client side
>> timeouts,
>> > I'm
>> > > > not
>> > > > > > so
>> > > > > > > >>> sure.)
>> > > > > > > >>>>>>>
>> > > > > > > >>>>>>> re: test plan, I'm sure this will come as a surprise,
>> but
>> > > is
>> > > > > the
>> > > > > > > >>> system
>> > > > > > > >>>>>>> test even necessary? Validating # of rebalances seems
>> > messy
>> > > > as
>> > > > > > > other
>> > > > > > > >>>>>> things
>> > > > > > > >>>>>>> can cause rebalances (though admittedly not in a
>> "clean"
>> > > > case).
>> > > > > > But
>> > > > > > > >>>>>> really
>> > > > > > > >>>>>>> it seems like an integration test could validate this
>> by
>> > > > making
>> > > > > > > sure
>> > > > > > > >>>>>> only 1
>> > > > > > > >>>>>>> rebalance occurred when 2 members joined with a
>> > sufficient
>> > > > time
>> > > > > > > gap.
>> > > > > > > >>>>>>>
>> > > > > > > >>>>>>> -Ewen
>> > > > > > > >>>>>>>
>> > > > > > > >>>>>>> On Thu, Mar 23, 2017 at 3:53 PM, Matthias J. Sax <
>> > > > > > > >>> matth...@confluent.io>
>> > > > > > > >>>>>>> wrote:
>> > > > > > > >>>>>>>
>> > > > > > > >>>>>>>> Thanks for the KIP Damian!
>> > > > > > > >>>>>>>>
>> > > > > > > >>>>>>>> My two cents:
>> > > > > > > >>>>>>>>
>> > > > > > > >>>>>>>> - we should have an explicit parameter for this -- implicit settings
>> > > > > > > >>>>>>>> are always tricky (the "importance" of this parameter would be LOW)
>> > > > > > > >>>>>>>>
>> > > > > > > >>>>>>>> - the config should be different for each consumer
>> > group:
>> > > > > > > >>>>>>>>   * assume you have a stateless app, you want to
>> > rebalance
>> > > > > > > >>> immediately
>> > > > > > > >>>>>>>>   * if you start up in a virtualized environment
>> using
>> > > some
>> > > > > > tools
>> > > > > > > >>> like
>> > > > > > > >>>>>>>> Mesos you might need a different value than on bare
>> > metal
>> > > > (no
>> > > > > VM
>> > > > > > > to
>> > > > > > > >>> be
>> > > > > > > >>>>>>>> started)
>> > > > > > > >>>>>>>>   * it also depends on how many consumer instances you
>> > > expect
>> > > > --
>> > > > > > > it's
>> > > > > > > >>>>>>>> harder to start up 100 instances in 3 seconds than 5
>> > > > > > > >>>>>>>>
>> > > > > > > >>>>>>>> - the default value should be zero
>> > > > > > > >>>>>>>>
>> > > > > > > >>>>>>>>
>> > > > > > > >>>>>>>> One more thought: what about scaling scenarios? If a
>> > > > consumer
>> > > > > > > group
>> > > > > > > >>> has
>> > > > > > > >>>>>>>> 10 instances and should be scaled up to 20, it would
>> > make
>> > > > > sense
>> > > > > > to
>> > > > > > > >> do
>> > > > > > > >>>>>>>> this with a single rebalance, too. Thus, I am
>> wondering,
>> > > if
>> > > > it
>> > > > > > > >> would
>> > > > > > > >>>>>>>> make sense to apply this delay each time a new
>> consumer
>> > > > joins
>> > > > > > > >> group,
>> > > > > > > >>>>>>>> even if the group is not empty?
>> > > > > > > >>>>>>>>
>> > > > > > > >>>>>>>>
>> > > > > > > >>>>>>>> -Matthias
>> > > > > > > >>>>>>>>
>> > > > > > > >>>>>>>>
>> > > > > > > >>>>>>>> On 3/23/17 10:19 AM, Damian Guy wrote:
>> > > > > > > >>>>>>>>> Thanks Guozhang - i think another problem with this is that it is
>> > > > > > > >>>>>>>>> overloading session.timeout.ms to mean multiple things. I'm not sure
>> > > > > > > >>>>>>>>> that is a good thing.
>> > > > > > > >>>>>>>>>
>> > > > > > > >>>>>>>>> On Thu, 23 Mar 2017 at 17:14 Guozhang Wang <
>> > > > > wangg...@gmail.com
>> > > > > > >
>> > > > > > > >>>>>> wrote:
>> > > > > > > >>>>>>>>>
>> > > > > > > >>>>>>>>>> The downside of it, though, is that although it "hides" this from
>> > > > > > > >>>>>>>>>> most of the users, so they do not need to be aware of it, by default
>> > > > > > > >>>>>>>>>> the session timeout, i.e. the rebalance timeout, is 10 seconds,
>> > > > > > > >>>>>>>>>> which could arguably be too long.
>> > > > > > > >>>>>>>>>>
>> > > > > > > >>>>>>>>>>
>> > > > > > > >>>>>>>>>> Guozhang
>> > > > > > > >>>>>>>>>>
>> > > > > > > >>>>>>>>>> On Thu, Mar 23, 2017 at 10:12 AM, Guozhang Wang <
>> > > > > > > >>> wangg...@gmail.com
>> > > > > > > >>>>>>>
>> > > > > > > >>>>>>>>>> wrote:
>> > > > > > > >>>>>>>>>>
>> > > > > > > >>>>>>>>>>> Just throwing another alternative idea here: we
>> can
>> > > > > consider
>> > > > > > > >> using
>> > > > > > > >>>>>>> the
>> > > > > > > >>>>>>>>>>> rebalance timeout value which is already included
>> in
>> > > the
>> > > > > join
>> > > > > > > >>>>>> request
>> > > > > > > >>>>>>>>>>> protocol (and on the current Java client it is
>> always
>> > > > > written
>> > > > > > > as
>> > > > > > > >>>>>> the
>> > > > > > > >>>>>>>>>>> session timeout value), that the first member
>> joining
>> > > > will
>> > > > > > > >> always
>> > > > > > > >>>>>>> force
>> > > > > > > >>>>>>>>>> the
>> > > > > > > >>>>>>>>>>> coordinator to wait that long. By doing this we do
>> > not
>> > > > need
>> > > > > > to
>> > > > > > > >>> bump
>> > > > > > > >>>>>>> up
>> > > > > > > >>>>>>>>>> the
>> > > > > > > >>>>>>>>>>> protocol either.
>> > > > > > > >>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>
>> > > > > > > >>>>>>>>>>> Guozhang
>> > > > > > > >>>>>>>>>>>
>> > > > > > > >>>>>>>>>>> On Thu, Mar 23, 2017 at 5:49 AM, Damian Guy <
>> > > > > > > >> damian....@gmail.com
>> > > > > > > >>>>
>> > > > > > > >>>>>>>>>> wrote:
>> > > > > > > >>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>> Hi Ismael,
>> > > > > > > >>>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>> Mostly to avoid the protocol bump.
>> > > > > > > >>>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>> I agree that it may be difficult to choose the
>> right
>> > > > delay
>> > > > > > for
>> > > > > > > >>> all
>> > > > > > > >>>>>>>>>>>> consumer
>> > > > > > > >>>>>>>>>>>> groups, but we wanted to make this something that
>> > most
>> > > > > users
>> > > > > > > >>> don't
>> > > > > > > >>>>>>>>>> really
>> > > > > > > >>>>>>>>>>>> need to think about, i.e., a small enough default
>> > > delay
>> > > > > that
>> > > > > > > >>> works
>> > > > > > > >>>>>>> in
>> > > > > > > >>>>>>>>>> the
>> > > > > > > >>>>>>>>>>>> majority of cases. However it would be much more
>> > > > flexible
>> > > > > > as a
>> > > > > > > >>>>>>>> consumer
>> > > > > > > >>>>>>>>>>>> config, which i'm happy to pursue if this change
>> is
>> > > > worthy
>> > > > > > of
>> > > > > > > a
>> > > > > > > >>>>>>>> protocol
>> > > > > > > >>>>>>>>>>>> bump.
>> > > > > > > >>>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>> Thanks,
>> > > > > > > >>>>>>>>>>>> Damian
>> > > > > > > >>>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>> On Thu, 23 Mar 2017 at 12:35 Ismael Juma <
>> > > > > ism...@juma.me.uk
>> > > > > > >
>> > > > > > > >>>>>> wrote:
>> > > > > > > >>>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>>> Thanks for the KIP, Damian. It makes sense to
>> avoid
>> > > > > > multiple
>> > > > > > > >>>>>>>>>> rebalances
>> > > > > > > >>>>>>>>>>>>> during start-up. One issue with having this as a
>> > > broker
>> > > > > > > config
>> > > > > > > >>> is
>> > > > > > > >>>>>>>> that
>> > > > > > > >>>>>>>>>>>> it
>> > > > > > > >>>>>>>>>>>>> may be difficult to choose the right delay for
>> all
>> > > > > consumer
>> > > > > > > >>>>>> groups.
>> > > > > > > >>>>>>>>>> Can
>> > > > > > > >>>>>>>>>>>> you
>> > > > > > > >>>>>>>>>>>>> elaborate a little more on why the first
>> > alternative
>> > > > > (add a
>> > > > > > > >>>>>>> consumer
>> > > > > > > >>>>>>>>>>>>> config) was rejected? We bump protocol versions
>> > > > regularly
>> > > > > > > >> (when
>> > > > > > > >>>>>> it
>> > > > > > > >>>>>>>>>> makes
>> > > > > > > >>>>>>>>>>>>> sense), so it would be good to get a bit more
>> > detail.
>> > > > > > > >>>>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>>> Thanks,
>> > > > > > > >>>>>>>>>>>>> Ismael
>> > > > > > > >>>>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>>> On Thu, Mar 23, 2017 at 12:24 PM, Damian Guy <
>> > > > > > > >>>>>> damian....@gmail.com
>> > > > > > > >>>>>>>>
>> > > > > > > >>>>>>>>>>>> wrote:
>> > > > > > > >>>>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>>>> Hi All,
>> > > > > > > >>>>>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>>>> I've prepared a KIP to add a configurable
>> delay to
>> > > the
>> > > > > > > >> initial
>> > > > > > > >>>>>>>>>>>> consumer
>> > > > > > > >>>>>>>>>>>>>> group rebalance.
>> > > > > > > >>>>>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>>>> Please have a look here:
>> > > > > > > >>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance
>> > > > > > > >>>>>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>>>> Thanks,
>> > > > > > > >>>>>>>>>>>>>> Damian
>> > > > > > > >>>>>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>>>> BTW, i apologize if this appears twice. Seems
>> the
>> > > > first
>> > > > > > one
>> > > > > > > >> may
>> > > > > > > >>>>>>> have
>> > > > > > > >>>>>>>>>>>> not
>> > > > > > > >>>>>>>>>>>>>> made it.
>> > > > > > > >>>>>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>
>> > > > > > > >>>>>>>>>>>
>> > > > > > > >>>>>>>>>>> --
>> > > > > > > >>>>>>>>>>> -- Guozhang
>> > > > > > > >>>>>>>>>>>
>> > > > > > > >>>>>>>>>>
>> > > > > > > >>>>>>>>>>
>> > > > > > > >>>>>>>>>>
>> > > > > > > >>>>>>>>>> --
>> > > > > > > >>>>>>>>>> -- Guozhang
>> > > > > > > >>>>>>>>>>
>> > > > > > > >>>>>>>>>
>> > > > > > > >>>>>>>>
>> > > > > > > >>>>>>>>
>> > > > > > > >>>>>>>
>> > > > > > > >>>>>>
>> > > > > > > >>>>
>> > > > > > > >>>
>> > > > > > > >>>
>> > > > > > > >>
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
>
> --
> -- Guozhang
>



-- 
-- Guozhang
