Thanks Matthias for bringing up this awesome proposal! I shall take a deeper look and make a comparison between the two proposals.

Meanwhile, for scale-down specifically in stateful streaming, we could introduce a new status called "learner": newly started hosts would first catch up with the progress of their assigned tasks before a rebalance is triggered, so we wouldn't see a sudden dip in progress. However, this idea is built on top of the success of KIP-345. A rough sketch of the idea follows below.
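To make the "learner" idea a bit more concrete, here is a minimal, purely illustrative sketch. The LEARNER/STABLE states, the lag threshold, and all names below are my own assumptions for this email and are not part of KIP-345 or the incremental cooperative rebalancing design. The point is simply that a newly started host is not counted as a full member (and therefore does not trigger a rebalance) until the restore lag of its assigned tasks drops below some threshold:

// Purely hypothetical sketch -- the LEARNER state and the lag threshold are
// illustrations for this email, not part of KIP-345.
public class LearnerCatchUpSketch {

    enum MemberState { LEARNER, STABLE }

    // assumed threshold: how far behind the changelog end offset a learner
    // may be before we consider it caught up
    static final long MAX_CATCH_UP_LAG = 1_000L;

    // A newly started host joins as LEARNER and restores the state of its
    // assigned tasks in the background; only once it is close enough to the
    // changelog end offset do we promote it and trigger the single rebalance
    // that hands the tasks over, so progress never dips sharply.
    static MemberState maybePromote(long changelogEndOffset, long restoredOffset) {
        long lag = changelogEndOffset - restoredOffset;
        return lag <= MAX_CATCH_UP_LAG ? MemberState.STABLE : MemberState.LEARNER;
    }

    public static void main(String[] args) {
        System.out.println(maybePromote(50_000L, 10_000L)); // LEARNER: still catching up
        System.out.println(maybePromote(50_000L, 49_500L)); // STABLE: ready, safe to rebalance
    }
}

Whether that check lives on the broker or on the client (the learner reporting "ready" in its join request) is a detail we can sort out later; the sketch is only meant to show the catch-up gate.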
________________________________
From: Matthias J. Sax <matth...@confluent.io>
Sent: Wednesday, November 7, 2018 7:02 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

Hey,

there was quite a pause on this KIP discussion and in the meantime, a new design for incremental cooperative rebalancing was suggested:

https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies

We should make sure that the proposal and this KIP align to each other.

Thoughts?

-Matthias

On 11/5/18 7:31 PM, Boyang Chen wrote:
> Hey Mike,
>
> thanks for the feedback, the two questions are very thoughtful!
>
>> 1) I am a little confused about the distinction for the leader. If the
>> consumer node that was assigned leader does a bounce (goes down and quickly
>> comes up) to update application code, will a rebalance be triggered? I do
>> not think a bounce of the leader should trigger a rebalance.
>
> For Q1 my intention was to minimize the change within one KIP, since the
> leader rejoining case could be addressed separately.
>
>> 2) The timeout for shrink up makes a lot of sense and allows to gracefully
>> increase the number of nodes in the cluster. I think we need to support
>> graceful shrink down as well. If I set the registration timeout to 5 minutes
>> to handle rolling restarts or intermittent failures without shuffling
>> state, I don't want to wait 5 minutes in order for the group to rebalance if
>> I am intentionally removing a node from the cluster. I am not sure the best
>> way to do this. One idea I had was adding the ability for a CLI or Admin
>> API to force a rebalance of the group. This would allow for an admin to
>> trigger the rebalance manually without waiting the entire registration
>> timeout on shrink down. What do you think?
>
> For 2) my understanding is that the scale-down case is better addressed by a
> CLI tool than by code logic, since only by human evaluation can we decide
> whether it is the "right timing" -- the time when all the scaling-down
> consumers are offline -- to kick in a rebalance. Unless we introduce another
> term on the coordinator which indicates the target consumer group size, the
> broker will find it hard to decide when to start a rebalance. So far I prefer
> to hold the implementation for that, but agree we could discuss whether we
> want to introduce an admin API in this KIP or a separate one.
>
> Thanks again for the proposed ideas!
>
> Boyang
>
> ________________________________
> From: Mike Freyberger <mike.freyber...@xandr.com>
> Sent: Monday, November 5, 2018 6:13 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id
>
> Boyang,
>
> Thanks for updating the KIP. It's shaping up well. Two things:
>
> 1) I am a little confused about the distinction for the leader. If the
> consumer node that was assigned leader does a bounce (goes down and quickly
> comes up) to update application code, will a rebalance be triggered? I do not
> think a bounce of the leader should trigger a rebalance.
>
> 2) The timeout for shrink up makes a lot of sense and allows to gracefully
> increase the number of nodes in the cluster. I think we need to support
> graceful shrink down as well. If I set the registration timeout to 5 minutes
> to handle rolling restarts or intermittent failures without shuffling state,
> I don't want to wait 5 minutes in order for the group to rebalance if I am
> intentionally removing a node from the cluster. I am not sure the best way to
> do this. One idea I had was adding the ability for a CLI or Admin API to
> force a rebalance of the group. This would allow for an admin to trigger the
> rebalance manually without waiting the entire registration timeout on shrink
> down. What do you think?
>
> Mike
>
> On 10/30/18, 1:55 AM, "Boyang Chen" <bche...@outlook.com> wrote:
>
>     Btw, I updated KIP 345 based on my understanding. Feel free to take
>     another look:
>
>     https://cwiki.apache.org/confluence/display/KAFKA/KIP-345%3A+Introduce+static+membership+protocol+to+reduce+consumer+rebalances
>
> ________________________________
> From: Boyang Chen <bche...@outlook.com>
> Sent: Monday, October 29, 2018 12:34 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id
>
> Thanks everyone for the input on this thread! (Sorry it's been a while.) I
> feel that we are very close to the final solution.
>
> Hey Jason and Mike, I have two quick questions on the new features here:
>
>     1. So our proposal is that when we add a new static member into the
>     group (scale up), we will not trigger a rebalance until the "registration
>     timeout" (i.e. the member has been offline for too long)? What about the
>     leader's rejoin request -- I think we should still trigger a rebalance
>     when that happens, since the consumer group may have new topics to consume?
>     2. I'm not very clear on the scale-up scenario in static membership
>     here. Should we fall back to dynamic membership while adding/removing hosts
>     (by setting member.name = null), or do we still want to add instances with
>     `member.name` so that we eventually expand/shrink the static membership? I
>     personally feel the easier solution is to spin up new members and wait until
>     either the same "registration timeout" or a "scale up timeout" before
>     starting the rebalance. What do you think?
>
> Meanwhile I will go ahead to make changes to the KIP with our newly
> discussed items and details.
Really excited to see the design has become more > solid. > > Best, > Boyang > > ________________________________ > From: Jason Gustafson <ja...@confluent.io> > Sent: Saturday, August 25, 2018 6:04 AM > To: dev > Subject: Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by > specifying member id > > Hey Mike, > > Yeah, that's a good point. A long "registration timeout" may not be a > great > idea. Perhaps in practice you'd set it long enough to be able to detect a > failure and provision a new instance. Maybe on the order of 10 minutes is > more reasonable. > > In any case, it's probably a good idea to have an administrative way to > force deregistration. One option is to extend the DeleteGroups API with a > list of members names. > > -Jason > > > > On Fri, Aug 24, 2018 at 2:21 PM, Mike Freyberger > <mfreyber...@appnexus.com> > wrote: > > > Jason, > > > > Regarding step 4 in your proposal which suggests beginning a long timer > > (30 minutes) when a static member leaves the group, would there also be > the > > ability for an admin to force a static membership expiration? > > > > I’m thinking that during particular types of outages or upgrades users > > would want forcefully remove a static member from the group. > > > > So the user would shut the consumer down normally, which wouldn’t > trigger > > a rebalance. Then the user could use an admin CLI tool to force remove > that > > consumer from the group, so the TopicPartitions that were previously > owned > > by that consumer can be released. > > > > At a high level, we need consumer groups to gracefully handle > intermittent > > failures and permanent failures. Currently, the consumer group protocol > > handles permanent failures well, but does not handle intermittent > failures > > well (it creates unnecessary rebalances). I want to make sure the > overall > > solution here handles both intermittent failures and permanent failures, > > rather than sacrificing support for permanent failures in order to > provide > > support for intermittent failures. > > > > Mike > > > > Sent from my iPhone > > > > > On Aug 24, 2018, at 3:03 PM, Jason Gustafson <ja...@confluent.io> > wrote: > > > > > > Hey Guozhang, > > > > > > Responses below: > > > > > > Originally I was trying to kill more birds with one stone with > KIP-345, > > >> e.g. to fix the multi-rebalance issue on starting up / shutting down > a > > >> multi-instance client (mentioned as case 1)/2) in my early email), > and > > >> hence proposing to have a pure static-membership protocol. But > thinking > > >> twice about it I now feel it may be too ambitious and worth fixing in > > >> another KIP. > > > > > > > > > I was considering an extension to support pre-initialization of the > > static > > > members of the group, but I agree we should probably leave this > problem > > for > > > future work. > > > > > > 1. How this longish static member expiration timeout defined? Is it > via a > > >> broker, hence global config, or via a client config which can be > > >> communicated to broker via JoinGroupRequest? > > > > > > > > > I am not too sure. I tend to lean toward server-side configs because > they > > > are easier to evolve. If we have to add something to the protocol, > then > > > we'll be stuck with it forever. > > > > > > 2. 
Assuming that for static members, LEAVE_GROUP request will not > > trigger a > > >> rebalance immediately either, similar to session timeout, but only > the > > >> longer member expiration timeout, can we remove the internal " > > >> internal.leave.group.on.close" config, which is a quick walk-around > > then? > > > > > > > > > Yeah, I hope we can ultimately get rid of it, but we may need it for > > > compatibility with older brokers. A related question is what should be > > the > > > behavior of the consumer if `member.name` is provided but the broker > > does > > > not support it? We could either fail or silently downgrade to dynamic > > > membership. > > > > > > -Jason > > > > > > > > >> On Fri, Aug 24, 2018 at 11:44 AM, Guozhang Wang <wangg...@gmail.com> > > wrote: > > >> > > >> Hey Jason, > > >> > > >> I like your idea to simplify the upgrade protocol to allow co-exist > of > > >> static and dynamic members. Admittedly it may make the > coordinator-side > > >> logic a bit more complex, but I think it worth doing it. > > >> > > >> Originally I was trying to kill more birds with one stone with > KIP-345, > > >> e.g. to fix the multi-rebalance issue on starting up / shutting down > a > > >> multi-instance client (mentioned as case 1)/2) in my early email), > and > > >> hence proposing to have a pure static-membership protocol. But > thinking > > >> twice about it I now feel it may be too ambitious and worth fixing in > > >> another KIP. With that, I think what you've proposed here is a good > way > > to > > >> go for KIP-345 itself. > > >> > > >> Note there are a few details in your proposal we'd still need to > figure > > >> out: > > >> > > >> 1. How this longish static member expiration timeout defined? Is it > via > > a > > >> broker, hence global config, or via a client config which can be > > >> communicated to broker via JoinGroupRequest? > > >> > > >> 2. Assuming that for static members, LEAVE_GROUP request will not > > trigger a > > >> rebalance immediately either, similar to session timeout, but only > the > > >> longer member expiration timeout, can we remove the internal " > > >> internal.leave.group.on.close" config, which is a quick walk-around > > then? > > >> > > >> > > >> > > >> Guozhang > > >> > > >> > > >> On Fri, Aug 24, 2018 at 11:14 AM, Jason Gustafson > <ja...@confluent.io> > > >> wrote: > > >> > > >>> Hey All, > > >>> > > >>> Nice to see some solid progress on this. It sounds like one of the > > >>> complications is allowing static and dynamic registration to > coexist. > > I'm > > >>> wondering if we can do something like the following: > > >>> > > >>> 1. Statically registered members (those joining the group with a > > >> non-null ` > > >>> member.name`) maintain a session with the coordinator just like > > dynamic > > >>> members. > > >>> 2. If a session is active for a static member when a rebalance > begins, > > >> then > > >>> basically we'll keep the current behavior. The rebalance will await > the > > >>> static member joining the group. > > >>> 3. If a static member does not have an active session, then the > > >> coordinator > > >>> will not wait for it to join, but will still include it in the > > rebalance. > > >>> The coordinator will forward the cached subscription information to > the > > >>> leader and will cache the assignment after the rebalance completes. > > (Note > > >>> that we still have the generationId to fence offset commits from a > > static > > >>> zombie if the assignment changes.) > > >>> 4. 
When a static member leaves the group or has its session expire, > no > > >>> rebalance is triggered. Instead, we can begin a timer to expire the > > >> static > > >>> registration. This would be a longish timeout (like 30 minutes say). > > >>> > > >>> So basically static members participate in all rebalances regardless > > >>> whether they have an active session. In a given rebalance, some of > the > > >>> members may be static and some dynamic. The group leader can > > >> differentiate > > >>> the two based on the presence of the `member.name` (we have to add > > this > > >> to > > >>> the JoinGroupResponse). Generally speaking, we would choose leaders > > >>> preferentially from the active members that support the latest > > JoinGroup > > >>> protocol and are using static membership. If we have to choose a > leader > > >>> with an old version, however, it would see all members in the group > > >> (static > > >>> or dynamic) as dynamic members and perform the assignment as usual. > > >>> > > >>> Would that work? > > >>> > > >>> -Jason > > >>> > > >>> > > >>> On Thu, Aug 23, 2018 at 5:26 PM, Guozhang Wang <wangg...@gmail.com> > > >> wrote: > > >>> > > >>>> Hello Boyang, > > >>>> > > >>>> Thanks for the updated proposal, a few questions: > > >>>> > > >>>> 1. Where will "change-group-timeout" be communicated to the broker? > > >> Will > > >>>> that be a new field in the JoinGroupRequest, or are we going to > > >>> piggy-back > > >>>> on the existing session-timeout field (assuming that the original > > value > > >>>> will not be used anywhere in the static membership any more)? > > >>>> > > >>>> 2. "However, if the consumer takes longer than session timeout to > > >> return, > > >>>> we shall still trigger rebalance but it could still try to catch > > >>>> `change-group-timeout`.": what does this mean? I thought your > proposal > > >> is > > >>>> that for static memberships, the broker will NOT trigger rebalance > > even > > >>>> after session-timeout has been detected, but only that after > > >>>> change-group-timeout > > >>>> which is supposed to be longer than session-timeout to be defined? > > >>>> > > >>>> 3. "A join group request with member.name set will be treated as > > >>>> `static-membership` strategy", in this case, how would the switch > from > > >>>> dynamic to static happen, since whoever changed the member.name to > > >>>> not-null > > >>>> will be rejected, right? > > >>>> > > >>>> 4. "just erase the cached mapping, and wait for session timeout to > > >>> trigger > > >>>> rebalance should be sufficient." this is also a bit unclear to me: > who > > >>> will > > >>>> erase the cached mapping? Since it is on the broker-side I assume > that > > >>>> broker has to do it. Are you suggesting to use a new request for > it? > > >>>> > > >>>> 5. "Halfway switch": following 3) above, if your proposal is > basically > > >> to > > >>>> let "first join-request wins", and the strategy will stay as is > until > > >> all > > >>>> members are gone, then this will also not happen since whoever used > > >>>> different strategy as the first guy who sends join-group request > will > > >> be > > >>>> rejected right? > > >>>> > > >>>> > > >>>> Guozhang > > >>>> > > >>>> > > >>>> On Tue, Aug 21, 2018 at 9:28 AM, John Roesler <j...@confluent.io> > > >> wrote: > > >>>> > > >>>>> This sounds good to me! 
> > >>>>> > > >>>>> Thanks for the time you've spent on it, > > >>>>> -John > > >>>>> > > >>>>> On Tue, Aug 21, 2018 at 12:13 AM Boyang Chen <bche...@outlook.com> > > >>>> wrote: > > >>>>> > > >>>>>> Thanks Matthias for the input. Sorry I was busy recently and > > >> haven't > > >>>> got > > >>>>>> time to update this thread. To summarize what we come up so far, > > >> here > > >>>> is > > >>>>> a > > >>>>>> draft updated plan: > > >>>>>> > > >>>>>> > > >>>>>> Introduce a new config called `member.name` which is supposed to > > >> be > > >>>>>> provided uniquely by the consumer client. The broker will > maintain > > >> a > > >>>>> cache > > >>>>>> with [key:member.name, value:member.id]. A join group request > with > > >>>>>> member.name set will be treated as `static-membership` strategy, > > >> and > > >>>>> will > > >>>>>> reject any join group request without member.name. So this > > >>>> coordination > > >>>>>> change will be differentiated from the `dynamic-membership` > > >> protocol > > >>> we > > >>>>>> currently have. > > >>>>>> > > >>>>>> > > >>>>>> When handling static join group request: > > >>>>>> > > >>>>>> 1. The broker will check the membership to see whether this is > > >> a > > >>>> new > > >>>>>> member. If new, broker allocate a unique member id, cache the > > >> mapping > > >>>> and > > >>>>>> move to rebalance stage. > > >>>>>> 2. Following 1, if this is an existing member, broker will not > > >>>> change > > >>>>>> group state, and return its cached member.id and current > > >> assignment. > > >>>>>> (unless this is leader, we shall trigger rebalance) > > >>>>>> 3. Although Guozhang has mentioned we could rejoin with pair > > >>> member > > >>>>>> name and id, I think for join group request it is ok to leave > > >> member > > >>> id > > >>>>>> blank as member name is the unique identifier. In commit offset > > >>> request > > >>>>> we > > >>>>>> *must* have both. > > >>>>>> > > >>>>>> > > >>>>>> When handling commit offset request, if enabled with static > > >>> membership, > > >>>>>> each time the commit request must have both member.name and > > >>> member.id > > >>>> to > > >>>>>> be identified as a `certificated member`. If not, this means > there > > >>> are > > >>>>>> duplicate consumer members with same member name and the request > > >> will > > >>>> be > > >>>>>> rejected to guarantee consumption uniqueness. > > >>>>>> > > >>>>>> > > >>>>>> When rolling restart/shutting down gracefully, the client will > > >> send a > > >>>>>> leave group request (static membership mode). In static > membership, > > >>> we > > >>>>> will > > >>>>>> also define `change-group-timeout` to hold on rebalance provided > by > > >>>>> leader. > > >>>>>> So we will wait for all the members to rejoin the group and do > > >>> exactly > > >>>>> one > > >>>>>> rebalance since all members are expected to rejoin within > timeout. > > >> If > > >>>>>> consumer crashes, the join group request from the restarted > > >> consumer > > >>>> will > > >>>>>> be recognized as an existing member and be handled as above > > >> condition > > >>>> 1; > > >>>>>> However, if the consumer takes longer than session timeout to > > >> return, > > >>>> we > > >>>>>> shall still trigger rebalance but it could still try to catch > > >>>>>> `change-group-timeout`. If it failed to catch second timeout, its > > >>>> cached > > >>>>>> state on broker will be garbage collected and trigger a new > > >> rebalance > > >>>>> when > > >>>>>> it finally joins. 
> > >>>>>> > > >>>>>> > > >>>>>> And consider the switch between dynamic to static membership. > > >>>>>> > > >>>>>> 1. Dynamic to static: the first joiner shall revise the > > >> membership > > >>>> to > > >>>>>> static and wait for all the current members to restart, since > their > > >>>>>> membership is still dynamic. Here our assumption is that the > > >> restart > > >>>>>> process shouldn't take a long time, as long restart is breaking > the > > >>>>>> `rebalance timeout` in whatever membership protocol we are using. > > >>>> Before > > >>>>>> restart, all dynamic member join requests will be rejected. > > >>>>>> 2. Static to dynamic: this is more like a downgrade which > should > > >>> be > > >>>>>> smooth: just erase the cached mapping, and wait for session > timeout > > >>> to > > >>>>>> trigger rebalance should be sufficient. (Fallback to current > > >>> behavior) > > >>>>>> 3. Halfway switch: a corner case is like some clients keep > > >> dynamic > > >>>>>> membership while some keep static membership. This will cause the > > >>> group > > >>>>>> rebalance forever without progress because dynamic/static states > > >> are > > >>>>>> bouncing each other. This could guarantee that we will not make > the > > >>>>>> consumer group work in a wrong state by having half static and > half > > >>>>> dynamic. > > >>>>>> > > >>>>>> To guarantee correctness, we will also push the member name/id > pair > > >>> to > > >>>>>> _consumed_offsets topic (as Matthias pointed out) and upgrade the > > >> API > > >>>>>> version, these details will be further discussed back in the KIP. > > >>>>>> > > >>>>>> > > >>>>>> Are there any concern for this high level proposal? Just want to > > >>>>> reiterate > > >>>>>> on the core idea of the KIP: "If the broker recognize this > consumer > > >>> as > > >>>> an > > >>>>>> existing member, it shouldn't trigger rebalance". > > >>>>>> > > >>>>>> Thanks a lot for everyone's input! I feel this proposal is much > > >> more > > >>>>>> robust than previous one! > > >>>>>> > > >>>>>> > > >>>>>> Best, > > >>>>>> > > >>>>>> Boyang > > >>>>>> > > >>>>>> ________________________________ > > >>>>>> From: Matthias J. Sax <matth...@confluent.io> > > >>>>>> Sent: Friday, August 10, 2018 2:24 AM > > >>>>>> To: dev@kafka.apache.org > > >>>>>> Subject: Re: [DISCUSS] KIP-345: Reduce multiple consumer > rebalances > > >>> by > > >>>>>> specifying member id > > >>>>>> > > >>>>>> Hi, > > >>>>>> > > >>>>>> thanks for the detailed discussion. I learned a lot about > internals > > >>>> again > > >>>>>> :) > > >>>>>> > > >>>>>> I like the idea or a user config `member.name` and to keep ` > > >>> member.id` > > >>>>>> internal. Also agree with Guozhang, that reusing `client.id` > might > > >>> not > > >>>>>> be a good idea. > > >>>>>> > > >>>>>> To clarify the algorithm, each time we generate a new > `member.id`, > > >>> we > > >>>>>> also need to update the "group membership" information (ie, > mapping > > >>>>>> [member.id, Assignment]), right? Ie, the new `member.id` replaces > > >>> the > > >>>>>> old entry in the cache. > > >>>>>> > > >>>>>> I also think, we need to preserve the `member.name -> member.id` > > >>>> mapping > > >>>>>> in the `__consumer_offset` topic. The KIP should mention this > IMHO. > > >>>>>> > > >>>>>> For changing the default value of config `leave.group.on.close`. > I > > >>>> agree > > >>>>>> with John, that we should not change the default config, because > it > > >>>>>> would impact all consumer groups with dynamic assignment. 
> However, > > >> I > > >>>>>> think we can document, that if static assignment is used (ie, > > >>>>>> `member.name` is configured) we never send a LeaveGroupRequest > > >>>>>> regardless of the config. Note, that the config is internal, so > not > > >>>> sure > > >>>>>> how to document this in detail. We should not expose the internal > > >>>> config > > >>>>>> in the docs. > > >>>>>> > > >>>>>> About upgrading: why do we need have two rolling bounces and > encode > > >>>>>> "static" vs "dynamic" in the JoinGroupRequest? > > >>>>>> > > >>>>>> If we upgrade an existing consumer group from dynamic to static, > I > > >>>> don't > > >>>>>> see any reason why both should not work together and single > rolling > > >>>>>> bounce would not be sufficient? If we bounce the first consumer > and > > >>>>>> switch from dynamic to static, it sends a `member.name` and the > > >>> broker > > >>>>>> registers the [member.name, member.id] in the cache. Why would > > >> this > > >>>>>> interfere with all other consumer that use dynamic assignment? > > >>>>>> > > >>>>>> Also, Guozhang mentioned that for all other request, we need to > > >> check > > >>>> if > > >>>>>> the mapping [member.name, member.id] contains the send > `member.id` > > >>> -- > > >>>> I > > >>>>>> don't think this is necessary -- it seems to be sufficient to > check > > >>> the > > >>>>>> `member.id` from the [member.id, Assignment] mapping as be do > > >> today > > >>> -- > > >>>>>> thus, checking `member.id` does not require any change IMHO. > > >>>>>> > > >>>>>> > > >>>>>> -Matthias > > >>>>>> > > >>>>>> > > >>>>>>> On 8/7/18 7:13 PM, Guozhang Wang wrote: > > >>>>>>> @James > > >>>>>>> > > >>>>>>> What you described is true: the transition from dynamic to > static > > >>>>>>> memberships are not thought through yet. But I do not think it > is > > >>> an > > >>>>>>> impossible problem: note that we indeed moved the offset commit > > >>> from > > >>>> ZK > > >>>>>> to > > >>>>>>> kafka coordinator in 0.8.2 :) The migration plan is to first to > > >>>>>>> double-commits on both zk and coordinator, and then do a second > > >>> round > > >>>>> to > > >>>>>>> turn the zk off. > > >>>>>>> > > >>>>>>> So just to throw a wild idea here: also following a > > >>>> two-rolling-bounce > > >>>>>>> manner, in the JoinGroupRequest we can set the flag to "static" > > >>> while > > >>>>>> keep > > >>>>>>> the registry-id field empty still, in this case, the coordinator > > >>>> still > > >>>>>>> follows the logic of "dynamic", accepting the request while > > >>> allowing > > >>>>> the > > >>>>>>> protocol to be set to "static"; after the first rolling bounce, > > >> the > > >>>>> group > > >>>>>>> protocol is already "static", then a second rolling bounce is > > >>>> triggered > > >>>>>> and > > >>>>>>> this time we set the registry-id. > > >>>>>>> > > >>>>>>> > > >>>>>>> Guozhang > > >>>>>>> > > >>>>>>> On Tue, Aug 7, 2018 at 1:19 AM, James Cheng < > > >> wushuja...@gmail.com> > > >>>>>> wrote: > > >>>>>>> > > >>>>>>>> Guozhang, in a previous message, you proposed said this: > > >>>>>>>> > > >>>>>>>>> On Jul 30, 2018, at 3:56 PM, Guozhang Wang <wangg...@gmail.com > > >>> > > >>>>> wrote: > > >>>>>>>>> > > >>>>>>>>> 1. We bump up the JoinGroupRequest with additional fields: > > >>>>>>>>> > > >>>>>>>>> 1.a) a flag indicating "static" or "dynamic" membership > > >>> protocols. > > >>>>>>>>> 1.b) with "static" membership, we also add the pre-defined > > >>> member > > >>>>> id. 
> > >>>>>>>>> 1.c) with "static" membership, we also add an optional > > >>>>>>>>> "group-change-timeout" value. > > >>>>>>>>> > > >>>>>>>>> 2. On the broker side, we enforce only one of the two > protocols > > >>> for > > >>>>> all > > >>>>>>>>> group members: we accept the protocol on the first joined > > >> member > > >>> of > > >>>>> the > > >>>>>>>>> group, and if later joining members indicate a different > > >>> membership > > >>>>>>>>> protocol, we reject it. If the group-change-timeout value was > > >>>>> different > > >>>>>>>> to > > >>>>>>>>> the first joined member, we reject it as well. > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> What will happen if we have an already-deployed application > that > > >>>> wants > > >>>>>> to > > >>>>>>>> switch to using static membership? Let’s say there are 10 > > >>> instances > > >>>> of > > >>>>>> it. > > >>>>>>>> As the instances go through a rolling restart, they will switch > > >>> from > > >>>>>>>> dynamic membership (the default?) to static membership. As each > > >>> one > > >>>>>> leaves > > >>>>>>>> the group and restarts, they will be rejected from the group > > >>>> (because > > >>>>>> the > > >>>>>>>> group is currently using dynamic membership). The group will > > >>> shrink > > >>>>> down > > >>>>>>>> until there is 1 node handling all the traffic. After that one > > >>>>> restarts, > > >>>>>>>> the group will switch over to static membership. > > >>>>>>>> > > >>>>>>>> Is that right? That means that the transition plan from dynamic > > >> to > > >>>>>> static > > >>>>>>>> membership isn’t very smooth. > > >>>>>>>> > > >>>>>>>> I’m not really sure what can be done in this case. This reminds > > >> me > > >>>> of > > >>>>>> the > > >>>>>>>> transition plans that were discussed for moving from > > >>> zookeeper-based > > >>>>>>>> consumers to kafka-coordinator-based consumers. That was also > > >>> hard, > > >>>>> and > > >>>>>>>> ultimately we decided not to build that. > > >>>>>>>> > > >>>>>>>> -James > > >>>>>>>> > > >>>>>>>> > > >>>>>>> > > >>>>>>> > > >>>>>> > > >>>>>> > > >>>>> > > >>>> > > >>>> > > >>>> > > >>>> -- > > >>>> -- Guozhang > > >>>> > > >>> > > >> > > >> > > >> > > >> -- > > >> -- Guozhang > > >> > > > >