Re: [Discuss] KIP-389: Enforce group.max.size to cap member metadata growth

Stanislav Kozlovski Wed, 12 Dec 2018 06:13:22 -0800

Hey Jason,

Yes, that is what I meant by
> Given those constraints, I think that we can simply mark the group as
> `PreparingRebalance` with a rebalanceTimeout of the server setting
> `group.max.session.timeout.ms`. That's a bit long by default (5 minutes)
> but I can't seem to come up with a better alternative.
So the rebalance completes either when that timeout expires or when all
members have called joinGroup, yes.



On Tue, Dec 11, 2018 at 8:14 PM Boyang Chen <[email protected]> wrote:

> Hey Jason,
>
> I think this is the correct understanding. One more question is whether
> you feel we should enforce the group size cap statically or at runtime?
>
> Boyang
> ________________________________
> From: Jason Gustafson <[email protected]>
> Sent: Tuesday, December 11, 2018 3:24 AM
> To: dev
> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> metadata growth
>
> Hey Stanislav,
>
> Just to clarify, I think what you're suggesting is something like this in
> order to gracefully shrink the group:
>
> 1. Transition the group to PREPARING_REBALANCE. No members are kicked out.
> 2. Continue to allow offset commits and heartbeats for all current members.
> 3. Allow the first n members that send JoinGroup to stay in the group, but
> wait for the JoinGroup (or session timeout) from all active members before
> finishing the rebalance.
>
> So basically we try to give the current members an opportunity to finish
> work, but we prevent some of them from rejoining after the rebalance
> completes. It sounds reasonable if I've understood correctly.
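The three steps above can be modelled as a small state machine. This is a minimal sketch assuming a toy in-memory coordinator; the class `ShrinkingGroup`, its methods and the string state labels are hypothetical and are not Kafka's `GroupCoordinator` API.

```python
from dataclasses import dataclass, field


@dataclass
class ShrinkingGroup:
    """Toy model of the graceful-shrink flow (hypothetical, not Kafka code)."""
    members: set                                   # member ids before the shrink
    max_size: int                                  # the group.max.size cap
    state: str = "Stable"
    joined: list = field(default_factory=list)     # admitted, in arrival order
    rejected: set = field(default_factory=set)     # arrived after the cap was hit
    pending: set = field(default_factory=set)      # not yet heard from

    def start_shrink(self):
        # Step 1: move to PreparingRebalance without kicking anyone out.
        self.state = "PreparingRebalance"
        self.pending = set(self.members)

    def heartbeat_or_commit_allowed(self, member_id):
        # Step 2: every current member may still heartbeat and commit offsets.
        return member_id in self.members

    def join(self, member_id):
        # Step 3: the first max_size JoinGroup senders stay, later ones do not.
        self.pending.discard(member_id)
        if len(self.joined) < self.max_size:
            self.joined.append(member_id)
            admitted = True
        else:
            self.rejected.add(member_id)
            admitted = False
        self._maybe_complete()
        return admitted

    def session_timeout(self, member_id):
        # A member that never sends JoinGroup is dropped when its session expires.
        self.pending.discard(member_id)
        self._maybe_complete()

    def _maybe_complete(self):
        # Finish only after every active member has joined or timed out.
        if self.state == "PreparingRebalance" and not self.pending:
            self.members = set(self.joined)
            self.state = "CompletingRebalance"
```

For example, with members `{"a", "b", "c"}` and `max_size=2`, the first two joiners are kept and `c` is rejected, but the rebalance only completes once `c` has also been heard from.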
>
> Thanks,
> Jason
>
>
>
> On Fri, Dec 7, 2018 at 6:47 AM Boyang Chen <[email protected]> wrote:
>
> > Yep, LGTM on my side. Thanks Stanislav!
> > ________________________________
> > From: Stanislav Kozlovski <[email protected]>
> > Sent: Friday, December 7, 2018 8:51 PM
> > To: [email protected]
> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > metadata growth
> >
> > Hi,
> >
> > We discussed this offline with Boyang and figured that it's best to not
> > wait on the Cooperative Rebalancing proposal. Our thinking is that we can
> > just force a rebalance from the broker, allowing consumers to commit
> > offsets if their rebalanceListener is configured correctly.
> > When rebalancing improvements are implemented, we assume that they would
> > improve KIP-389's behavior as well as the normal rebalance scenarios.
> >
> > On Wed, Dec 5, 2018 at 12:09 PM Boyang Chen <[email protected]> wrote:
> >
> > > Hey Stanislav,
> > >
> > > thanks for the question! `Trivial rebalance` means "we don't start
> > > reassignment right now, but you need to know it's coming soon
> > > and you should start preparation".
> > >
> > > An example KStream use case is that before actually starting to shrink
> > > the consumer group, we need to
> > > 1. partition the consumer group into two subgroups, where one will be
> > > offline soon and the other will keep serving;
> > > 2. make sure the states associated with near-future offline consumers
> > > are successfully replicated on the serving ones.
> > >
> > > As I have mentioned, shrinking the consumer group is pretty much
> > > equivalent to scaling the group down, so we could think of this
> > > as an add-on use case for cluster scaling. So my understanding is that
> > > KIP-389 could be sequenced within our cooperative rebalancing
> > > <https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies>
> > > proposal.
> > >
> > > Let me know if this makes sense.
> > >
> > > Best,
> > > Boyang
> > > ________________________________
> > > From: Stanislav Kozlovski <[email protected]>
> > > Sent: Wednesday, December 5, 2018 5:52 PM
> > > To: [email protected]
> > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > > metadata growth
> > >
> > > Hey Boyang,
> > >
> > > I think we still need to take care of group shrinkage because even if
> > > users change the config value, we cannot guarantee that all consumer
> > > groups would have been manually shrunk.
> > >
> > > Regarding 2., I agree that forcefully triggering a rebalance might be
> > > the most intuitive way to handle the situation.
> > > What does a "trivial rebalance" mean? Sorry, I'm not familiar with the
> > > term.
> > > I was thinking that maybe we could force a rebalance, which would cause
> > > consumers to commit their offsets (given their rebalanceListener is
> > > configured correctly) and subsequently reject some of the incoming
> > > `joinGroup` requests. Does that sound like it would work?
> > >
> > > On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <[email protected]> wrote:
> > >
> > > > Hey Stanislav,
> > > >
> > > > I read the latest KIP and saw that we already changed the default
> > > > value to -1. Do we still need to take care of the consumer group
> > > > shrinking when doing the upgrade?
> > > >
> > > > However, this is an interesting topic that is worth discussing.
> > > > Although a rolling upgrade is fine, `consumer.group.max.size` could
> > > > always conflict with the current consumer group size, which means we
> > > > need to adhere to one source of truth.
> > > >
> > > > 1. Choose the current group size, which means we never interrupt the
> > > > consumer group until it transitions to PREPARE_REBALANCE. We keep
> > > > track of how many JoinGroup requests we have seen so far during
> > > > PREPARE_REBALANCE. After reaching the consumer cap, we start to
> > > > inform over-provisioned consumers that they should send a
> > > > LeaveGroupRequest and fail themselves. Or, with what Mayuresh
> > > > proposed in KIP-345, we could mark extra members as hot backups and
> > > > rebalance without them.
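Option 1 boils down to counting JoinGroup requests in arrival order and treating everyone past the cap differently. A minimal sketch, assuming a hypothetical helper `split_joiners` that either parks the extras as hot backups (the KIP-345 idea) or tells them to leave; none of these names are Kafka APIs.

```python
from typing import Iterable, List, Tuple


def split_joiners(join_order: Iterable[str], cap: int,
                  keep_backups: bool) -> Tuple[List[str], List[str], List[str]]:
    """Partition JoinGroup senders seen during PREPARE_REBALANCE.

    Returns (active, hot_backups, told_to_leave). Hypothetical helper.
    """
    active: List[str] = []
    backups: List[str] = []
    leaving: List[str] = []
    seen = 0  # JoinGroup requests received so far in this rebalance
    for member_id in join_order:
        seen += 1
        if seen <= cap:
            active.append(member_id)       # within the cap: stays in the group
        elif keep_backups:
            backups.append(member_id)      # over the cap: parked as hot backup
        else:
            leaving.append(member_id)      # over the cap: asked to LeaveGroup
    return active, backups, leaving
```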
> > > >
> > > > 2. Choose `consumer.group.max.size`. I feel incremental rebalancing
> > > > (you proposed) could be of help here.
> > > > When a new cap is enforced, the leader should be notified. If the
> > > > current group size is already over the limit, the leader shall
> > > > trigger a trivial rebalance to shuffle some topic partitions and let
> > > > a subset of consumers prepare the ownership transition. Once they are
> > > > ready, we trigger a real rebalance to remove over-provisioned
> > > > consumers. It is pretty much equivalent to "how do we scale down the
> > > > consumer group without interrupting the current processing".
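The two-phase idea in option 2 can be illustrated with a planner that decides, ahead of the real rebalance, which members retire and where their partitions will move. This is a sketch under stated assumptions: `plan_scale_down` is hypothetical, retiring members are picked by sorted id, and moved partitions are spread round-robin; a real assignor would choose differently.

```python
from typing import Dict, List, Tuple


def plan_scale_down(assignment: Dict[str, List[int]],
                    cap: int) -> Tuple[List[str], List[int], Dict[str, List[int]]]:
    """Plan a scale-down in two phases (hypothetical illustration).

    Phase 1 ("trivial rebalance"): tell survivors which partitions they
    will take over so they can prepare, e.g. replicate state.
    Phase 2 ("real rebalance"): apply the returned assignment.
    """
    if cap < 1:
        raise ValueError("cap must be at least 1")
    members = sorted(assignment)                   # deterministic choice for the sketch
    survivors, retiring = members[:cap], members[cap:]
    # Partitions currently owned by retiring members must move.
    to_move = sorted(p for m in retiring for p in assignment[m])
    new_assignment = {m: list(assignment[m]) for m in survivors}
    # Spread the moved partitions round-robin over the survivors.
    for i, partition in enumerate(to_move):
        new_assignment[survivors[i % len(survivors)]].append(partition)
    return retiring, to_move, new_assignment
```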
> > > >
> > > > I personally feel inclined toward 2 because we could kill two birds
> > > > with one stone in a generic way. What do you think?
> > > >
> > > > Boyang
> > > > ________________________________
> > > > From: Stanislav Kozlovski <[email protected]>
> > > > Sent: Monday, December 3, 2018 8:35 PM
> > > > To: [email protected]
> > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > > > metadata growth
> > > >
> > > > Hi Jason,
> > > >
> > > > > 2. Do you think we should make this a dynamic config?
> > > > I'm not sure. Looking at the config from the perspective of a
> > > > prescriptive config, we may get away with not updating it dynamically.
> > > > But in my opinion, it always makes sense to have a config be
> > > > dynamically configurable. As long as we limit it to being a
> > > > cluster-wide config, we should be fine.
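What "dynamic" could mean in practice: the coordinator reads the current cap on every JoinGroup instead of caching it at startup, so an update applies without a broker restart. A minimal sketch assuming a hypothetical holder class `DynamicGroupMaxSize` and the -1 "disabled" value discussed in this thread; Kafka's real dynamic-config machinery is not modelled.

```python
import threading


class DynamicGroupMaxSize:
    """Cluster-wide, dynamically updatable cap (hypothetical sketch)."""

    def __init__(self, initial: int):
        self._lock = threading.Lock()
        self._value = initial

    def update(self, new_value: int) -> None:
        # -1 disables the cap; any other value must be a positive size.
        if new_value != -1 and new_value < 1:
            raise ValueError("cap must be -1 (disabled) or >= 1")
        with self._lock:
            self._value = new_value

    def allows(self, current_size: int) -> bool:
        # Called on each JoinGroup from a new member, so updates apply at once.
        with self._lock:
            value = self._value
        return value == -1 or current_size < value
```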
> > > >
> > > > > 1. I think it would be helpful to clarify the details on how the
> > > > > coordinator will shrink the group. It will need to choose which
> > > > > members to remove. Are we going to give current members an
> > > > > opportunity to commit offsets before kicking them from the group?
> > > >
> > > > This turns out to be somewhat tricky. I think that we may not be able
> > > > to guarantee that consumers don't process a message twice.
> > > > My initial approach was to do as much as we could to let consumers
> > > > commit offsets.
> > > >
> > > > I was thinking that when we mark a group to be shrunk, we could keep
> > > > a map of consumer_id->boolean indicating whether they have committed
> > > > offsets. I then thought we could delay the rebalance until every
> > > > consumer commits (or some time passes).
> > > > In the meantime, we would block all incoming fetch calls (by either
> > > > returning empty records or a retriable error) and we would continue
> > > > to accept offset commits (even twice for a single consumer).
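The consumer_id->boolean idea described above can be sketched as a tracker that answers "may we rebalance yet?". Assumptions: the class `ShrinkCommitTracker` is hypothetical, blocked fetches are modelled as empty record lists, and the clock is injectable so the timeout path can be exercised.

```python
import time
from typing import Callable, Dict, Iterable, List


class ShrinkCommitTracker:
    """Delay a shrink until all members commit or a deadline passes (sketch)."""

    def __init__(self, member_ids: Iterable[str], timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.committed: Dict[str, bool] = {m: False for m in member_ids}
        self.clock = clock
        self.deadline = clock() + timeout_s

    def on_fetch(self, member_id: str) -> List[bytes]:
        # While the shrink is pending, fetches are answered with no records.
        return []

    def on_offset_commit(self, member_id: str) -> None:
        # Commits are always accepted, even a second one from the same member.
        if member_id in self.committed:
            self.committed[member_id] = True

    def ready_to_rebalance(self) -> bool:
        # Proceed once everyone has committed, or when the deadline passes.
        return all(self.committed.values()) or self.clock() >= self.deadline
```

As the next message points out, this check alone does not close the race between asynchronous commits and fetches.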
> > > >
> > > > I see two problems with this approach:
> > > > * We have async offset commits, which implies that we can receive
> > > > fetch requests before the offset commit request has been handled,
> > > > i.e. if a consumer sends fetchReq A, offsetCommit B, fetchReq C, the
> > > > broker may receive A, C, B. This means we could have saved the offsets
> > > > for B but rebalanced before the offsetCommit for the offsets processed
> > > > in C comes in.
> > > > * KIP-392: Allow consumers to fetch from closest replica
> > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica>
> > > > would make it significantly harder to block poll() calls on consumers
> > > > whose groups are being shrunk. Even if we implemented a solution, the
> > > > same race condition noted above, and probably others, seems to apply.
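The A, C, B reordering in the first bullet can be reproduced with a tiny simulation. This sketch assumes a simplified model (hypothetical function `uncommitted_after_rebalance`): the consumer processes every record it receives, and the broker rebalances as soon as it handles the member's offset commit.

```python
from typing import List, Tuple

# (kind, value): a "fetch" hands `value` records to the consumer; a "commit"
# carries the number of records the consumer has processed and committed.
Request = Tuple[str, int]


def uncommitted_after_rebalance(arrival_order: List[Request]) -> int:
    """Records already handed out but not covered by the commit that the
    broker saw before rebalancing; the next owner will reprocess them."""
    fetched_upto = 0
    for kind, value in arrival_order:
        if kind == "fetch":
            fetched_upto += value
        elif kind == "commit":
            # The broker treats the member as committed and rebalances here.
            return fetched_upto - value
    return fetched_upto
```

With `A = ("fetch", 10)`, `B = ("commit", 10)` and `C = ("fetch", 5)`, the send order A, B, C leaves nothing uncommitted, while the arrival order A, C, B leaves the 5 records from C to be processed twice.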
> > > >
> > > >
> > > > Given those constraints, I think that we can simply mark the group as
> > > > `PreparingRebalance` with a rebalanceTimeout of the server setting
> > > > `group.max.session.timeout.ms`. That's a bit long by default (5
> > > > minutes) but I can't seem to come up with a better alternative.
> > > >
> > > > I'm interested in hearing your thoughts.
> > > >
> > > > Thanks,
> > > > Stanislav
> > > >
> > > > On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <[email protected]> wrote:
> > > >
> > > > > Hey Stanislav,
> > > > >
> > > > > > What do you think about the use case I mentioned in my previous
> > > > > > reply about a more resilient self-service Kafka? I believe the
> > > > > > benefit there is bigger.
> > > > >
> > > > >
> > > > > I see this config as analogous to the open file limit. Probably this
> > > > > limit was intended to be prescriptive at some point about what was
> > > > > deemed a reasonable number of open files for an application. But
> > > > > mostly people treat it as an annoyance which they have to work
> > > > > around. If it happens to be hit, usually you just increase it
> > > > > because it is not tied to an actual resource constraint. However,
> > > > > occasionally hitting the limit does indicate an application bug
> > > > > such as a leak, so I wouldn't say it is useless. Similarly, the
> > > > > issue in KAFKA-7610 was a consumer leak and having this limit would
> > > > > have allowed the problem to be detected before it impacted the
> > > > > cluster. To me, that's the main benefit. It's possible that it could
> > > > > be used prescriptively to prevent poor usage of groups, but like the
> > > > > open file limit, I suspect administrators will just set it large
> > > > > enough that users are unlikely to complain.
> > > > >
> > > > > Anyway, just a couple of additional questions:
> > > > >
> > > > > 1. I think it would be helpful to clarify the details on how the
> > > > > coordinator will shrink the group. It will need to choose which
> > > > > members to remove. Are we going to give current members an
> > > > > opportunity to commit offsets before kicking them from the group?
> > > > >
> > > > > 2. Do you think we should make this a dynamic config?
> > > > >
> > > > > Thanks,
> > > > > Jason
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski <[email protected]> wrote:
> > > > >
> > > > > > Hi Jason,
> > > > > >
> > > > > > You raise some very valid points.
> > > > > >
> > > > > > > The benefit of this KIP is probably limited to preventing
> > > > > > > "runaway" consumer groups due to leaks or some other application
> > > > > > > bug
> > > > > > What do you think about the use case I mentioned in my previous
> > > > > > reply about a more resilient self-service Kafka? I believe the
> > > > > > benefit there is bigger.
> > > > > >
> > > > > > * Default value
> > > > > > You're right, we probably do need to be conservative. Big consumer
> > > > > > groups are considered an anti-pattern and my goal was to also hint
> > > > > > at this through the config's default. Regardless, it is better not
> > > > > > to have the potential to break applications with an upgrade.
> > > > > > Choosing between a big default like 5000 or an opt-in option, I
> > > > > > think we should go with the *disabled default option* (-1).
> > > > > > The only benefit we would get from a big default of 5000 is default
> > > > > > protection against buggy/malicious applications that hit the
> > > > > > KAFKA-7610 issue.
> > > > > > While this KIP was spawned from that issue, I believe its value is
> > > > > > enabling the possibility of protection and helping move towards a
> > > > > > more self-service Kafka. I also think that a default value of 5000
> > > > > > might be misleading to users and lead them to think that big
> > > > > > consumer groups (> 250) are a good thing.
> > > > > >
> > > > > > The good news is that KAFKA-7610 should be fully resolved and the
> > > > > > rebalance protocol should, in general, be more solid after the
> > > > > > planned improvements in KIP-345 and KIP-394.
> > > > > >
> > > > > > * Handling bigger groups during upgrade
> > > > > > I now see that we store the state of consumer groups in the log
> > > > > > and why a rebalance isn't expected during a rolling upgrade.
> > > > > > Since we're going with a default value that leaves max.size
> > > > > > disabled, I believe we can afford to be more strict here.
> > > > > > During state reloading of a new Coordinator with a defined
> > > > > > group.max.size config, I believe we should *force* rebalances for
> > > > > > groups that exceed the configured size. Then, only some consumers
> > > > > > will be able to join and the max size invariant will be satisfied.
> > > > > >
> > > > > > I updated the KIP with a migration plan, rejected alternatives and
> > > > > > the new default value.
> > > > > >
> > > > > > Thanks,
> > > > > > Stanislav
> > > > > >
> > > > > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson <[email protected]> wrote:
> > > > > >
> > > > > > > Hey Stanislav,
> > > > > > >
> > > > > > > > Clients will then find that coordinator
> > > > > > > > and send `joinGroup` on it, effectively rebuilding the group,
> > > > > > > > since the cache of active consumers is not stored outside the
> > > > > > > > Coordinator's memory.
> > > > > > > > (please do say if that is incorrect)
> > > > > > >
> > > > > > >
> > > > > > > Groups do not typically rebalance after a coordinator change. You
> > > > > > > could potentially force a rebalance if the group is too big and
> > > > > > > kick out the slowest members or something. A more graceful
> > > > > > > solution is probably to just accept the current size and prevent
> > > > > > > it from getting bigger. We could potentially log a warning.
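The graceful alternative (keep an oversized group as-is, stop it from growing, warn) might look like this after a coordinator loads a group. A minimal sketch assuming a hypothetical `CappedGroup` class and logger name; it is not how Kafka's coordinator is structured.

```python
import logging
from typing import Set

log = logging.getLogger("coordinator-sketch")


class CappedGroup:
    """Grandfather an oversized group instead of forcing a rebalance (sketch)."""

    def __init__(self, group_id: str, loaded_members: Set[str], max_size: int):
        self.group_id = group_id
        self.members = set(loaded_members)
        self.max_size = max_size
        if len(self.members) > max_size:
            log.warning("group %s has %d members, above group.max.size=%d; "
                        "keeping them but rejecting new members",
                        group_id, len(self.members), max_size)

    def join(self, member_id: str) -> bool:
        if member_id in self.members:
            return True                  # existing members may always rejoin
        if len(self.members) >= self.max_size:
            return False                 # never let the group grow past the cap
        self.members.add(member_id)
        return True
```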
> > > > > > >
> > > > > > > > My thinking is that we should abstract away from conserving
> > > > > > > > resources and focus on giving control to the broker. The issue
> > > > > > > > that spawned this KIP was a memory problem but I feel this
> > > > > > > > change is useful in a more general way.
> > > > > > >
> > > > > > >
> > > > > > > So you probably already know why I'm asking about this. For
> > > > > > > consumer groups anyway, resource usage would typically be
> > > > > > > proportional to the number of partitions that a group is reading
> > > > > > > from and not the number of members. For example, consider the
> > > > > > > memory use in the offsets cache. The benefit of this KIP is
> > > > > > > probably limited to preventing "runaway" consumer groups due to
> > > > > > > leaks or some other application bug. That still seems useful
> > > > > > > though.
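A toy model of that point: in a leak, members pile up while the number of committed offsets stays tied to the partition count. This sketch (hypothetical `simulate_leak`, not measured Kafka figures) assumes one offsets-cache entry per partition and one metadata entry per admitted member, and shows how a cap turns unbounded growth into visible rejections.

```python
from typing import Dict, Optional


def simulate_leak(partitions: int, joins: int,
                  max_size: Optional[int]) -> Dict[str, int]:
    """Every join adds a member that never leaves (a leaked consumer)."""
    members = 0
    rejected = 0
    for _ in range(joins):
        if max_size is not None and members >= max_size:
            rejected += 1        # the cap surfaces the leak as join errors
        else:
            members += 1         # member metadata keeps growing
    # Offsets-cache entries follow partitions, not members.
    return {"offset_entries": partitions, "members": members, "rejected": rejected}
```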
> > > > > > >
> > > > > > > > I completely agree with this and I *ask everybody to chime in
> > > > > > > > with opinions on a sensible default value*.
> > > > > > >
> > > > > > >
> > > > > > > I think we would have to be very conservative. The group protocol
> > > > > > > is generic in some sense, so there may be use cases we don't know
> > > > > > > of where larger groups are reasonable. Probably we should make
> > > > > > > this an opt-in feature so that we do not risk breaking anyone's
> > > > > > > application after an upgrade. Either that, or use a very high
> > > > > > > default like 5,000.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jason
> > > > > > >
> > > > > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hey Jason and Boyang, those were important comments.
> > > > > > > >
> > > > > > > > > One suggestion I have is that it would be helpful to put your
> > > > > > > > > reasoning on deciding the current default value. For example,
> > > > > > > > > in certain use cases at Pinterest we are very likely to have
> > > > > > > > > more consumers than 250 when we configure 8 stream instances
> > > > > > > > > with 32 threads.
> > > > > > > > > For the effectiveness of this KIP, we should encourage people
> > > > > > > > > to discuss their opinions on the default setting and ideally
> > > > > > > > > reach a consensus.
> > > > > > > >
> > > > > > > > I completely agree with this and I *ask everybody to chime in
> > > > > > > > with opinions on a sensible default value*.
> > > > > > > > My thought process was that in the current model rebalances in
> > > > > > > > large groups are more costly. I imagine most Kafka users' use
> > > > > > > > cases do not require more than 250 consumers.
> > > > > > > > Boyang, you say that you are "likely to have... when we..." - do
> > > > > > > > you have systems running with so many consumers in a group or
> > > > > > > > are you planning to? I guess what I'm asking is whether this has
> > > > > > > > been tested in production with the current rebalance model
> > > > > > > > (ignoring KIP-345).
> > > > > > > >
> > > > > > > > > Can you clarify the compatibility impact here? What
> > > > > > > > > will happen to groups that are already larger than the max size?
> > > > > > > > This is a very important question.
> > > > > > > > From my current understanding, when a coordinator broker gets
> > > > > > > > shut down during a cluster rolling upgrade, a replica will take
> > > > > > > > leadership of the `__consumer_offsets` partition. Clients will
> > > > > > > > then find that coordinator and send `joinGroup` on it,
> > > > > > > > effectively rebuilding the group, since the cache of active
> > > > > > > > consumers is not stored outside the Coordinator's memory.
> > > > > > > > (please do say if that is incorrect)
> > > > > > > > Then, I believe that working as if this is a new group is a
> > > > > > > > reasonable approach. Namely, fail joinGroups when the max.size
> > > > > > > > is exceeded.
> > > > > > > > What do you guys think about this? (I'll update the KIP after we
> > > > > > > > settle on a solution)
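"Fail joinGroups when the max.size is exceeded" amounts to one check in the JoinGroup path. A minimal sketch assuming a hypothetical `handle_join` function that returns illustrative error strings, and treating -1 as "disabled" as proposed elsewhere in this thread; it does not reproduce Kafka's request handling.

```python
from typing import Set

DISABLED = -1  # the opt-in "no cap" value proposed in this thread


def handle_join(members: Set[str], member_id: str, max_size: int) -> str:
    """Return an illustrative error string for a JoinGroup request."""
    if member_id in members:
        return "NONE"                        # rejoining members are unaffected
    if max_size != DISABLED and len(members) >= max_size:
        return "GROUP_MAX_SIZE_REACHED"      # hypothetical error label
    members.add(member_id)
    return "NONE"
```

Because the group is rebuilt from incoming JoinGroup requests after a failover, the same check would apply whether the group is new or being reconstructed.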
> > > > > > > >
> > > > > > > > > Also, just to be clear, the resource we are trying to conserve
> > > > > > > > > here is what? Memory?
> > > > > > > > My thinking is that we should abstract away from conserving
> > > > > > > > resources and focus on giving control to the broker. The issue
> > > > > > > > that spawned this KIP was a memory problem but I feel this
> > > > > > > > change is useful in a more general way. It limits the control
> > > > > > > > clients have on the cluster and helps Kafka become a more
> > > > > > > > self-serving system. Admin/Ops teams can better control the
> > > > > > > > impact application developers can have on a Kafka cluster with
> > > > > > > > this change.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Stanislav
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi Stanislav,
> > > > > > > > >
> > > > > > > > > Thanks for the KIP. Can you clarify the compatibility impact
> > > > > > > > > here? What will happen to groups that are already larger than
> > > > > > > > > the max size? Also, just to be clear, the resource we are
> > > > > > > > > trying to conserve here is what? Memory?
> > > > > > > > >
> > > > > > > > > -Jason
> > > > > > > > >
> > > > > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Thanks Stanislav for the update! One suggestion I have is that
> > > > > > > > > > it would be helpful to put your reasoning on deciding the
> > > > > > > > > > current default value. For example, in certain use cases at
> > > > > > > > > > Pinterest we are very likely to have more consumers than 250
> > > > > > > > > > when we configure 8 stream instances with 32 threads.
> > > > > > > > > >
> > > > > > > > > > For the effectiveness of this KIP, we should encourage people
> > > > > > > > > > to discuss their opinions on the default setting and ideally
> > > > > > > > > > reach a consensus.
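The Pinterest example is simple arithmetic: in Kafka Streams each stream thread runs its own main consumer, and all of them join the group named by `application.id`, so 8 instances with 32 threads give 8 × 32 = 256 group members, just over a cap of 250. The helper names below are hypothetical.

```python
def streams_group_size(instances: int, threads_per_instance: int) -> int:
    # One main consumer per stream thread, all in the same consumer group.
    return instances * threads_per_instance


def max_threads_per_instance(cap: int, instances: int) -> int:
    # Largest uniform thread count that keeps the group within the cap.
    return cap // instances
```

With a cap of 250 and 8 instances, each instance could run at most 31 threads (248 consumers in total).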
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > >
> > > > > > > > > > Boyang
> > > > > > > > > >
> > > > > > > > > > ________________________________
> > > > > > > > > > From: Stanislav Kozlovski <[email protected]>
> > > > > > > > > > Sent: Monday, November 26, 2018 6:14 PM
> > > > > > > > > > To: [email protected]
> > > > > > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to
> > cap
> > > > > > member
> > > > > > > > > > metadata growth
> > > > > > > > > >
> > > > > > > > > > Hey everybody,
> > > > > > > > > >
> > > > > > > > > > It's been a week since this KIP was proposed and there hasn't
> > > > > > > > > > been much discussion. I assume that this is a straightforward
> > > > > > > > > > change and I will open a voting thread in the next couple of
> > > > > > > > > > days if nobody has anything to suggest.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Stanislav
> > > > > > > > > >
> > > > > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Greetings everybody,
> > > > > > > > > > >
> > > > > > > > > > > I have enriched the KIP a bit with a bigger Motivation section
> > > > > > > > > > > and also renamed it.
> > > > > > > > > > > KIP:
> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > > > > > > > > > >
> > > > > > > > > > > I'm looking forward to discussions around it.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Stanislav
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > >> Hey there everybody,
> > > > > > > > > > >>
> > > > > > > > > > >> Thanks for the introduction, Boyang. I appreciate the effort
> > > > > > > > > > >> you are putting into improving consumer behavior in Kafka.
> > > > > > > > > > >>
> > > > > > > > > > >> @Matt
> > > > > > > > > > >> I also believe the default value is high. In my opinion, we
> > > > > > > > > > >> should aim for a default cap around 250. This is because in
> > > > > > > > > > >> the current model any consumer rebalance is disruptive to
> > > > > > > > > > >> every consumer. The bigger the group, the longer this period
> > > > > > > > > > >> of disruption.
> > > > > > > > > > >>
> > > > > > > > > > >> If you have such a large consumer group, chances are that
> > > > > > > > > > >> your client-side logic could be structured better and that
> > > > > > > > > > >> you are not using the high number of consumers to achieve
> > > > > > > > > > >> high throughput.
> > > > > > > > > > >> 250 can still be considered a high upper bound; I believe in
> > > > > > > > > > >> practice users should aim not to go over 100 consumers per
> > > > > > > > > > >> consumer group.
> > > > > > > > > > >>
> > > > > > > > > > >> Regarding the cap being global/per-broker, I think that we
> > > > > > > > > > >> should consider whether we want it to be global or
> > > > > > > > > > >> *per-topic*. For the time being, I believe that having it
> > > > > > > > > > >> per-topic with a global default might be the best situation.
> > > > > > > > > > >> Having it global only seems a bit restrictive to me and it
> > > > > > > > > > >> never hurts to support more fine-grained configurability
> > > > > > > > > > >> (given it's the same config, not a new one being introduced).
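A per-topic cap with a global default needs a resolution rule for groups that subscribe to several topics. The sketch below assumes, purely for illustration, that the strictest cap among a group's topics applies; `effective_cap`, `topic_overrides` and the default of 250 are hypothetical.

```python
from typing import Dict, Iterable

GLOBAL_DEFAULT = 250  # hypothetical cluster-wide default


def effective_cap(subscribed_topics: Iterable[str],
                  topic_overrides: Dict[str, int],
                  global_default: int = GLOBAL_DEFAULT) -> int:
    """Per-topic override with a global fallback (hypothetical rule)."""
    # Each topic uses its override if present, otherwise the global default.
    caps = [topic_overrides.get(t, global_default) for t in subscribed_topics]
    # Assumption: the strictest cap wins; no topics means the global default.
    return min(caps, default=global_default)
```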
> > > > > > > > > > >>
> > > > > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <[email protected]> wrote:
> > > > > > > > > > >>
> > > > > > > > > > >>> Thanks Matt for the suggestion! I'm still open to any
> > > > > > > > > > >>> suggestion to change the default value. Meanwhile, I just
> > > > > > > > > > >>> want to point out that this value is just a last line of
> > > > > > > > > > >>> defense, not a real scenario we would expect.
> > > > > > > > > > >>>
> > > > > > > > > > >>>
> > > > > > > > > > >>> In the meantime, I discussed with Stanislav and he will be
> > > > > > > > > > >>> driving the KIP-389 effort from now on. Stanislav proposed
> > > > > > > > > > >>> the idea in the first place and had already come up with a
> > > > > > > > > > >>> draft design, while I will keep focusing on the KIP-345
> > > > > > > > > > >>> effort to ensure solving the edge case described in the JIRA
> > > > > > > > > > >>> <https://issues.apache.org/jira/browse/KAFKA-7610>.
> > > > > > > > > > >>>
> > > > > > > > > > >>>
> > > > > > > > > > >>> Thank you Stanislav for making this happen!
> > > > > > > > > > >>>
> > > > > > > > > > >>>
> > > > > > > > > > >>> Boyang
> > > > > > > > > > >>>
> > > > > > > > > > >>> ________________________________
> > > > > > > > > > >>> From: Matt Farmer <[email protected]>
> > > > > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > > > > > > > > > >>> To: [email protected]
> > > > > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap
> > > > > > > > > > >>> member metadata growth
> > > > > > > > > > >>>
> > > > > > > > > > >>> Thanks for the KIP.
> > > > > > > > > > >>>
> > > > > > > > > > >>> Will this cap be a global cap across the entire cluster or
> > > > > > > > > > >>> per broker?
> > > > > > > > > > >>>
> > > > > > > > > > >>> Either way the default value seems a bit high to me, but that
> > > > > > > > > > >>> could just be from my own usage patterns. I'd have probably
> > > > > > > > > > >>> started with 500 or 1k but could be easily convinced that's
> > > > > > > > > > >>> wrong.
> > > > > > > > > > >>>
> > > > > > > > > > >>> Thanks,
> > > > > > > > > > >>> Matt
> > > > > > > > > > >>>
> > > > > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <[email protected]> wrote:
> > > > > > > > > > >>>
> > > > > > > > > > >>> > Hey folks,
> > > > > > > > > > >>> >
> > > > > > > > > > >>> >
> > > > > > > > > > >>> > I would like to start a discussion on KIP-389:
> > > > > > > > > > >>> >
> > > > > > > > > > >>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > > > > > > > > > >>> >
> > > > > > > > > > >>> > This is a pretty simple change to cap the consumer group
> > > > > > > > > > >>> > size for broker stability. Please share your valuable
> > > > > > > > > > >>> > feedback when you have time.
> > > > > > > > > > >>> >
> > > > > > > > > > >>> >
> > > > > > > > > > >>> > Thank you!
> > > > > > > > > > >>> >
> > > > > > > > > > >>>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >> --
> > > > > > > > > > >> Best,
> > > > > > > > > > >> Stanislav
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Best,
> > > > > > > > > > > Stanislav
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best,
> > > > > > > > > > Stanislav
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best,
> > > > > > > > Stanislav
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>


-- 
Best,
Stanislav
