Hey Jason and Boyang, those were important comments.

> One suggestion I have is that it would be helpful to put your reasoning on
> deciding the current default value. For example, in certain use cases at
> Pinterest we are very likely to have more consumers than 250 when we
> configure 8 stream instances with 32 threads.
> For the effectiveness of this KIP, we should encourage people to discuss
> their opinions on the default setting and ideally reach a consensus.

I completely agree with this and I *ask everybody to chime in with opinions
on a sensible default value*.
My thought process was that in the current model, rebalances in large groups
are more costly. I imagine most use cases for most Kafka users do not require
more than 250 consumers.
Boyang, you say that you are "likely to have... when we..." - do you have
systems running with so many consumers in a group or are you planning to? I
guess what I'm asking is whether this has been tested in production with the
current rebalance model (ignoring KIP-345).

> Can you clarify the compatibility impact here? What
> will happen to groups that are already larger than the max size?

This is a very important question.
From my current understanding, when a coordinator broker gets shut down
during a cluster rolling upgrade, a replica will take leadership of the
`__consumer_offsets` partition. Clients will then find that coordinator and
send `joinGroup` to it, effectively rebuilding the group, since the cache of
active consumers is not stored outside the Coordinator's memory. (please do
say if that is incorrect)
Then, I believe that working as if this is a new group is a reasonable
approach. Namely, fail joinGroups when the max.size is exceeded.
What do you guys think about this? (I'll update the KIP after we settle on a
solution)

> Also, just to be clear, the resource we are trying to conserve here is
> what? Memory?

My thinking is that we should abstract away from conserving resources and
focus on giving control to the broker. The issue that spawned this KIP was a
memory problem but I feel this change is useful in a more general way. It
limits the control clients have over the cluster and helps Kafka become a more
self-serving system. Admin/Ops teams can better control the impact
application developers can have on a Kafka cluster with this change.

Best,
Stanislav

On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson <ja...@confluent.io> wrote:

> Hi Stanislav,
>
> Thanks for the KIP.
> Can you clarify the compatibility impact here? What will happen to groups
> that are already larger than the max size? Also, just to be clear, the
> resource we are trying to conserve here is what? Memory?
>
> -Jason
>
> On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen <bche...@outlook.com> wrote:
>
> > Thanks Stanislav for the update! One suggestion I have is that it would be
> > helpful to put your reasoning on deciding the current default value. For
> > example, in certain use cases at Pinterest we are very likely to have more
> > consumers than 250 when we configure 8 stream instances with 32 threads.
> >
> > For the effectiveness of this KIP, we should encourage people to discuss
> > their opinions on the default setting and ideally reach a consensus.
> >
> > Best,
> >
> > Boyang
> >
> > ________________________________
> > From: Stanislav Kozlovski <stanis...@confluent.io>
> > Sent: Monday, November 26, 2018 6:14 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > metadata growth
> >
> > Hey everybody,
> >
> > It's been a week since this KIP and not much discussion has been made.
> > I assume that this is a straightforward change and I will open a voting
> > thread in the next couple of days if nobody has anything to suggest.
> >
> > Best,
> > Stanislav
> >
> > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski <
> > stanis...@confluent.io>
> > wrote:
> >
> > > Greetings everybody,
> > >
> > > I have enriched the KIP a bit with a bigger Motivation section and also
> > > renamed it.
> > > KIP:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> > >
> > > I'm looking forward to discussions around it.
> > >
> > > Best,
> > > Stanislav
> > >
> > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski <
> > > stanis...@confluent.io> wrote:
> > >
> > >> Hey there everybody,
> > >>
> > >> Thanks for the introduction Boyang. I appreciate the effort you are
> > >> putting into improving consumer behavior in Kafka.
> > >>
> > >> @Matt
> > >> I also believe the default value is high. In my opinion, we should aim
> > >> for a default cap around 250. This is because in the current model any
> > >> consumer rebalance is disruptive to every consumer. The bigger the group,
> > >> the longer this period of disruption.
> > >>
> > >> If you have such a large consumer group, chances are that your
> > >> client-side logic could be structured better and that you are not using
> > >> the high number of consumers to achieve high throughput.
> > >> 250 can still be considered a high upper bound; I believe in practice
> > >> users should aim to not go over 100 consumers per consumer group.
> > >>
> > >> In regards to the cap being global/per-broker, I think that we should
> > >> consider whether we want it to be global or *per-topic*. For the time
> > >> being, I believe that having it per-topic with a global default might be
> > >> the best situation. Having it global only seems a bit restricting to me
> > >> and it never hurts to support more fine-grained configurability (given
> > >> it's the same config, not a new one being introduced).
> > >>
> > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen <bche...@outlook.com>
> > >> wrote:
> > >>
> > >>> Thanks Matt for the suggestion! I'm still open to any suggestion to
> > >>> change the default value. Meanwhile I just want to point out that this
> > >>> value is just a last line of defense, not a real scenario we would
> > >>> expect.
> > >>>
> > >>> In the meanwhile, I discussed with Stanislav and he would be driving the
> > >>> 389 effort from now on. Stanislav proposed the idea in the first place
> > >>> and had already come up with a draft design, while I will keep focusing
> > >>> on the KIP-345 effort to ensure solving the edge case described in the
> > >>> JIRA <https://issues.apache.org/jira/browse/KAFKA-7610>.
> > >>>
> > >>> Thank you Stanislav for making this happen!
> > >>>
> > >>> Boyang
> > >>>
> > >>> ________________________________
> > >>> From: Matt Farmer <m...@frmr.me>
> > >>> Sent: Tuesday, November 20, 2018 10:24 AM
> > >>> To: dev@kafka.apache.org
> > >>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member
> > >>> metadata growth
> > >>>
> > >>> Thanks for the KIP.
> > >>>
> > >>> Will this cap be a global cap across the entire cluster or per broker?
> > >>>
> > >>> Either way the default value seems a bit high to me, but that could just
> > >>> be from my own usage patterns. I’d have probably started with 500 or 1k
> > >>> but could be easily convinced that’s wrong.
> > >>>
> > >>> Thanks,
> > >>> Matt
> > >>>
> > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen <bche...@outlook.com>
> > >>> wrote:
> > >>>
> > >>> > Hey folks,
> > >>> >
> > >>> > I would like to start a discussion on KIP-389:
> > >>> >
> > >>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Enforce+group.max.size+to+cap+member+metadata+growth
> > >>> >
> > >>> > This is a pretty simple change to cap the consumer group size for
> > >>> > broker stability. Give me your valuable feedback when you have time.
> > >>> >
> > >>> > Thank you!
> > >>> >
> > >>
> > >> --
> > >> Best,
> > >> Stanislav
> > >
> > > --
> > > Best,
> > > Stanislav
> >
> > --
> > Best,
> > Stanislav

--
Best,
Stanislav
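
P.S. To make the compatibility proposal at the top of this message concrete,
here is a minimal Python sketch (not Kafka code). It assumes, as stated above
and still open to correction, that a coordinator taking over after a failover
starts with an empty member cache, so the group is rebuilt from rejoining
clients and any join beyond the cap is failed. `GroupCoordinatorSketch`,
`GroupMaxSizeReached`, `join_group`, `fail_over` and `rejoin_after_failover`
are all hypothetical names made up for illustration; only the cap of 250 and
the 8 x 32 example come from this thread.

```python
from dataclasses import dataclass, field


class GroupMaxSizeReached(Exception):
    """Hypothetical error for a join rejected because the group is full."""


@dataclass
class GroupCoordinatorSketch:
    # Stand-in for the broker-side cap discussed in this thread.
    group_max_size: int = 250
    # In-memory member cache: group id -> set of member ids.
    groups: dict = field(default_factory=dict)

    def join_group(self, group_id: str, member_id: str) -> int:
        """Admit a member unless doing so would grow the group past the cap."""
        members = self.groups.setdefault(group_id, set())
        # A member that is already known is rejoining, not growing the group.
        if member_id not in members and len(members) >= self.group_max_size:
            raise GroupMaxSizeReached(
                f"group {group_id!r} already has {len(members)} members"
            )
        members.add(member_id)
        return len(members)

    def fail_over(self) -> None:
        # Assumption from the message above: the new coordinator has no
        # cached members, so every group is treated as a new group.
        self.groups.clear()


def rejoin_after_failover(coordinator, group_id, member_ids):
    """Simulate all clients rejoining after a coordinator change."""
    coordinator.fail_over()
    accepted, rejected = [], []
    for member_id in member_ids:
        try:
            coordinator.join_group(group_id, member_id)
            accepted.append(member_id)
        except GroupMaxSizeReached:
            rejected.append(member_id)
    return accepted, rejected
```

With Boyang's example of 8 stream instances with 32 threads each (256
members) and a cap of 250, the first 250 rejoining members are accepted and
the remaining 6 have their joins failed. Which 6 lose out depends purely on
rejoin order, which is one of the things worth settling before the KIP is
updated.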