@Ismael - yes, sure, we can reduce the default, though I'm not sure what the
"right" default would be.
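
To make the trade-off concrete, here is a rough sketch of the behaviour being discussed: the delay restarts each time a new consumer joins, and (as Becket described) the rebalance timeout caps the total wait. This is not broker code. The class, method and parameter names are made up for illustration, and it assumes join times are given in ascending order.

```java
import java.util.List;

public class InitialRebalanceDelay {

    /**
     * Hypothetical helper: returns the time (ms) at which the first rebalance
     * would fire, given the times at which members join an initially empty group.
     * Assumes joinTimesMs is sorted in ascending order.
     */
    static long rebalanceTime(List<Long> joinTimesMs, long delayMs, long rebalanceTimeoutMs) {
        if (joinTimesMs.isEmpty()) {
            throw new IllegalArgumentException("need at least one join");
        }
        long first = joinTimesMs.get(0);
        // Upper bound: never wait longer than the rebalance timeout in total.
        long hardDeadline = first + rebalanceTimeoutMs;
        long deadline = Math.min(first + delayMs, hardDeadline);
        for (long join : joinTimesMs.subList(1, joinTimesMs.size())) {
            if (join >= deadline) {
                break; // timer already expired; this member would cause a later rebalance
            }
            // A new member arrived in time: restart the delay, still capped.
            deadline = Math.min(join + delayMs, hardDeadline);
        }
        return deadline;
    }

    public static void main(String[] args) {
        // Three members joining within 2.5s, 3s delay, 10s rebalance timeout.
        System.out.println(rebalanceTime(List.of(0L, 1000L, 2500L), 3000, 10000)); // prints 5500
    }
}
```

With a 3 second delay, a slow trickle of joins keeps pushing the rebalance out until the timeout cap is hit, which is why a smaller default may make more sense once the timer resets on every join.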

On Tue, 28 Mar 2017 at 15:40 Ismael Juma <ism...@juma.me.uk> wrote:

> Is 3 seconds the right default if the timer gets reset after each consumer
> joins? Maybe we can lower the default value given the new approach.
>
> Ismael
>
> On Tue, Mar 28, 2017 at 9:53 AM, Damian Guy <damian....@gmail.com> wrote:
>
> > All,
> > I'd like to get this back to the original discussion about Delaying
> > initial consumer group rebalance.
> > I think I'm leaning towards sticking with the broker config and changing
> > the delay so that the timer starts again when a new consumer joins the
> > group. What are people's thoughts on that?
> >
> > Doing something similar on leave is valid, but I'd prefer to consider it
> > separately from this.
> >
> > Thanks,
> > Damian
> >
> > On Tue, 28 Mar 2017 at 09:48 Damian Guy <damian....@gmail.com> wrote:
> >
> > > Matthias,
> > >
> > > Yes, I know.
> > >
> > > Thanks,
> > > Damian
> > >
> > > On Mon, 27 Mar 2017 at 18:17 Matthias J. Sax <matth...@confluent.io>
> > > wrote:
> > >
> > > Damian,
> > >
> > > about "rebalance immediately" on timeout -- I guess, that's a different
> > > case as no LeaveGroupRequest will be sent. Thus, the broker should be
> > > able to distinguish both cases easily, and apply the delay only if it
> > > received the LeaveGroupRequest but not if a consumer times out.
> > >
> > > Does this make sense?
> > >
> > > -Matthias
> > >
> > > On 3/27/17 1:56 AM, Damian Guy wrote:
> > > > @Becket
> > > >
> > > > > Thanks for the feedback. Yes, I like the idea of extending the delay
> > > > > as each new consumer joins the group. Though, I think this could be
> > > > > done with either a consumer or broker side config. But I get your
> > > > > point that some consumers in the group can be misconfigured.
> > > > >
> > > > > @Matthias & @Eno - yes we could probably do something similar if the
> > > > > member has sent the LeaveGroupRequest. I'm not sure it would be valid
> > > > > if the member crashed; in that case session.timeout would come into
> > > > > play and we'd probably want to rebalance immediately. I'd be
> > > > > interested in hearing thoughts from other core Kafka folks on this one.
> > > >
> > > > Thanks,
> > > > Damian
> > > >
> > > >
> > > >
> > > > > On Fri, 24 Mar 2017 at 23:01 Becket Qin <becket....@gmail.com> wrote:
> > > >
> > > >> Hi Matthias,
> > > >>
> > > > >> Yes, that was what I was thinking. We will keep delaying it until
> > > > >> either the rebalance timeout is reached or no new consumer joins
> > > > >> within that small delay, which is configured on the broker side.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jiangjie (Becket) Qin
> > > >>
> > > > >> On Fri, Mar 24, 2017 at 1:39 PM, Matthias J. Sax <matth...@confluent.io>
> > > > >> wrote:
> > > >>
> > > >>> @Becket:
> > > >>>
> > > > >>> I am not sure if I understand this correctly. Instead of applying a
> > > > >>> fixed delay that starts when the first consumer of an (empty) group
> > > > >>> joins, you suggest to re-trigger/re-set the delay each time a new
> > > > >>> consumer joins?
> > > > >>>
> > > > >>> This sounds like a good strategy to me, if the config is on the
> > > > >>> broker side.
> > > >>>
> > > >>> @Eno:
> > > >>>
> > > >>> I think that's a valid point and I like this idea!
> > > >>>
> > > >>>
> > > >>> -Matthias
> > > >>>
> > > >>>
> > > >>> On 3/24/17 1:23 PM, Eno Thereska wrote:
> > > >>>> Thanks Damian,
> > > >>>>
> > > > >>>> This KIP deals with the initial phase only. What about the cases
> > > > >>>> when several consumers leave a group? Won't there be several
> > > > >>>> expensive rebalances then as well? I'm wondering if it makes sense
> > > > >>>> for the delay to hold anytime the "set" of consumers in a group
> > > > >>>> changes, be it addition to the group or removal from the group.
> > > >>>>
> > > >>>> Thanks
> > > >>>> Eno
> > > >>>>
> > > >>>>
> > > > >>>>> On 24 Mar 2017, at 20:04, Becket Qin <becket....@gmail.com> wrote:
> > > >>>>>
> > > >>>>> Thanks for the KIP, Damian.
> > > >>>>>
> > > > >>>>> My two cents on this. It seems there are two things worth thinking
> > > > >>>>> about here:
> > > > >>>>>
> > > > >>>>> 1. Better rebalance timing. We will try to rebalance only when all
> > > > >>>>> the consumers in a group have joined. The challenge would be that
> > > > >>>>> someone has to define what ALL consumers means; it could either be
> > > > >>>>> a time or a number of consumers, etc.
> > > > >>>>>
> > > > >>>>> 2. Avoid frequent rebalance. For example, if there are 100
> > > > >>>>> consumers in a group, today, in the worst case, we may end up with
> > > > >>>>> 100 rebalances even if all the consumers joined the group in a
> > > > >>>>> reasonably small amount of time. Frequent rebalance is also a bad
> > > > >>>>> thing for brokers.
> > > >>>>>
> > > > >>>>> Having a client side configuration may solve problem 1 better
> > > > >>>>> because each consumer group can potentially configure their own
> > > > >>>>> timing. However, it does not really prevent frequent rebalance in
> > > > >>>>> general because some of the consumers can be misconfigured. (This
> > > > >>>>> may have something to do with KIP-124 as well. But if quota is
> > > > >>>>> applied on the JoinGroup/SyncGroup request it may cause some
> > > > >>>>> unwanted cascading effects.)
> > > >>>>>
> > > > >>>>> Having a broker side configuration may result in less flexibility
> > > > >>>>> for each consumer group, but it can prevent frequent rebalance
> > > > >>>>> better. I think with some reasonable design, the rebalance timing
> > > > >>>>> issue can be resolved on the broker side as well. Matthias had a
> > > > >>>>> good point on extending the delay when a new consumer joins a
> > > > >>>>> group (we actually did something similar to batch ISR change
> > > > >>>>> propagation). For example, let's say on the broker side, we will
> > > > >>>>> always delay 2 seconds each time we see a new consumer joining a
> > > > >>>>> consumer group. This would probably work for most of the consumer
> > > > >>>>> groups and will also limit the rebalance frequency to protect the
> > > > >>>>> brokers.
> > > >>>>>
> > > > >>>>> I am not sure about the streams use case here, but if something
> > > > >>>>> like 2 seconds of delay is acceptable for streams, I would prefer
> > > > >>>>> adding the configuration to the broker so that we can address both
> > > > >>>>> problems.
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>>
> > > >>>>> Jiangjie (Becket) Qin
> > > >>>>>
> > > >>>>>
> > > > >>>>> On Fri, Mar 24, 2017 at 5:30 AM, Damian Guy <damian....@gmail.com>
> > > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Thanks for the feedback.
> > > >>>>>>
> > > > >>>>>> Ewen: I'm happy to make it a client side config. Other than the
> > > > >>>>>> protocol bump I think the effort is almost the same. Personally I
> > > > >>>>>> see no other issues, but based on discussions with others this is
> > > > >>>>>> what we came up with.
> > > > >>>>>>
> > > > >>>>>> True, it can probably be tested easily via an integration test.
> > > > >>>>>>
> > > > >>>>>> Matthias: Yes I agree, the delay could be extended as each new
> > > > >>>>>> member joins the group.
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>> Damian
> > > >>>>>>
> > > > >>>>>> On Fri, 24 Mar 2017 at 05:14 Ewen Cheslack-Postava <e...@confluent.io>
> > > > >>>>>> wrote:
> > > >>>>>>
> > > > >>>>>>> I have the same initial response as Ismael re: broker vs consumer
> > > > >>>>>>> settings. The global setting seems questionable.
> > > > >>>>>>>
> > > > >>>>>>> Could we maybe summarize what the impact of making this a client
> > > > >>>>>>> config would be? Protocol bump is obvious, but is there any other
> > > > >>>>>>> significant issue? For the protocol bump in particular, I think
> > > > >>>>>>> this change is currently really critical for streams; it will be
> > > > >>>>>>> valuable elsewhere, but the immediate demand is streams, so a
> > > > >>>>>>> protocol bump while being backwards compatible wouldn't affect
> > > > >>>>>>> any other clients. Is this still actually compatible with
> > > > >>>>>>> different clients given that they would now expect different
> > > > >>>>>>> timeouts? (I think it's strictly compatible if you wait for
> > > > >>>>>>> responses, but if you enforce any client side timeouts, I'm not
> > > > >>>>>>> so sure.)
> > > > >>>>>>>
> > > > >>>>>>> re: test plan, I'm sure this will come as a surprise, but is the
> > > > >>>>>>> system test even necessary? Validating # of rebalances seems
> > > > >>>>>>> messy as other things can cause rebalances (though admittedly not
> > > > >>>>>>> in a "clean" case). But really it seems like an integration test
> > > > >>>>>>> could validate this by making sure only 1 rebalance occurred when
> > > > >>>>>>> 2 members joined with a sufficient time gap.
> > > >>>>>>>
> > > >>>>>>> -Ewen
> > > >>>>>>>
> > > > >>>>>>> On Thu, Mar 23, 2017 at 3:53 PM, Matthias J. Sax <matth...@confluent.io>
> > > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Thanks for the KIP Damian!
> > > >>>>>>>>
> > > >>>>>>>> My two cents:
> > > >>>>>>>>
> > > > >>>>>>>> - we should have an explicit parameter for this -- implicit
> > > > >>>>>>>> settings are always tricky (the "importance" of this parameter
> > > > >>>>>>>> would be LOW)
> > > > >>>>>>>>
> > > > >>>>>>>> - the config should be different for each consumer group:
> > > > >>>>>>>>   * assume you have a stateless app, you want to rebalance
> > > > >>>>>>>> immediately
> > > > >>>>>>>>   * if you start up in a virtualized environment using some
> > > > >>>>>>>> tools like Mesos you might need a different value than on bare
> > > > >>>>>>>> metal (no VM to be started)
> > > > >>>>>>>>   * it also depends on how many consumer instances you expect --
> > > > >>>>>>>> it's harder to start up 100 instances in 3 seconds than 5
> > > > >>>>>>>>
> > > > >>>>>>>> - the default value should be zero
> > > >>>>>>>>
> > > >>>>>>>>
> > > > >>>>>>>> One more thought: what about scaling scenarios? If a consumer
> > > > >>>>>>>> group has 10 instances and should be scaled up to 20, it would
> > > > >>>>>>>> make sense to do this with a single rebalance, too. Thus, I am
> > > > >>>>>>>> wondering if it would make sense to apply this delay each time a
> > > > >>>>>>>> new consumer joins the group, even if the group is not empty?
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> -Matthias
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On 3/23/17 10:19 AM, Damian Guy wrote:
> > > > >>>>>>>>> Thanks Guozhang - I think another problem with this is that it
> > > > >>>>>>>>> is overloading session.timeout.ms to mean multiple things. I'm
> > > > >>>>>>>>> not sure that is a good thing.
> > > >>>>>>>>>
> > > > >>>>>>>>> On Thu, 23 Mar 2017 at 17:14 Guozhang Wang <wangg...@gmail.com>
> > > > >>>>>>>>> wrote:
> > > >>>>>>>>>
> > > > >>>>>>>>>> The downside of it, though, is that although it "hides" this
> > > > >>>>>>>>>> from most users, who no longer need to be aware of it, the
> > > > >>>>>>>>>> default session timeout (i.e. the rebalance timeout) is 10
> > > > >>>>>>>>>> seconds, which could arguably be too long.
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> Guozhang
> > > >>>>>>>>>>
> > > > >>>>>>>>>> On Thu, Mar 23, 2017 at 10:12 AM, Guozhang Wang <wangg...@gmail.com>
> > > > >>>>>>>>>> wrote:
> > > >>>>>>>>>>
> > > > >>>>>>>>>>> Just throwing another alternative idea here: we can consider
> > > > >>>>>>>>>>> using the rebalance timeout value which is already included
> > > > >>>>>>>>>>> in the join request protocol (and on the current Java client
> > > > >>>>>>>>>>> it is always written as the session timeout value), so that
> > > > >>>>>>>>>>> the first member joining will always force the coordinator to
> > > > >>>>>>>>>>> wait that long. By doing this we do not need to bump up the
> > > > >>>>>>>>>>> protocol either.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Guozhang
> > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Thu, Mar 23, 2017 at 5:49 AM, Damian Guy <damian....@gmail.com>
> > > > >>>>>>>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> Hi Ismael,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Mostly to avoid the protocol bump.
> > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> I agree that it may be difficult to choose the right delay
> > > > >>>>>>>>>>>> for all consumer groups, but we wanted to make this
> > > > >>>>>>>>>>>> something that most users don't really need to think about,
> > > > >>>>>>>>>>>> i.e., a small enough default delay that works in the
> > > > >>>>>>>>>>>> majority of cases. However it would be much more flexible as
> > > > >>>>>>>>>>>> a consumer config, which I'm happy to pursue if this change
> > > > >>>>>>>>>>>> is worthy of a protocol bump.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>> Damian
> > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> On Thu, 23 Mar 2017 at 12:35 Ismael Juma <ism...@juma.me.uk>
> > > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Thanks for the KIP, Damian. It makes sense to avoid
> > > > >>>>>>>>>>>>> multiple rebalances during start-up. One issue with having
> > > > >>>>>>>>>>>>> this as a broker config is that it may be difficult to
> > > > >>>>>>>>>>>>> choose the right delay for all consumer groups. Can you
> > > > >>>>>>>>>>>>> elaborate a little more on why the first alternative (add
> > > > >>>>>>>>>>>>> a consumer config) was rejected? We bump protocol versions
> > > > >>>>>>>>>>>>> regularly (when it makes sense), so it would be good to
> > > > >>>>>>>>>>>>> get a bit more detail.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>> Ismael
> > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Thu, Mar 23, 2017 at 12:24 PM, Damian Guy <damian....@gmail.com>
> > > > >>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Hi All,
> > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> I've prepared a KIP to add a configurable delay to the
> > > > >>>>>>>>>>>>>> initial consumer group rebalance.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Please have a look here:
> > > > >>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>> Damian
> > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> BTW, I apologize if this appears twice. It seems the
> > > > >>>>>>>>>>>>>> first one may not have made it.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> --
> > > >>>>>>>>>>> -- Guozhang
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> --
> > > >>>>>>>>>> -- Guozhang
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>
> > > >>>
> > > >>>
> > > >>
> > > >
> > >
> > >
> >
>
