Hi Luke,

Thanks for the explanation.

I don't see any description of how the broker decides whether to use the new version of ConsumerProtocolSubscription. This probably needs to be locked to a new IBP version.

One scenario that we need to consider is what happens during a rolling upgrade. If the coordinator moves back and forth between brokers with different IBPs, it seems that the same epoch numbers could be reused for a group, if things are done in the obvious manner (old IBP = don't read or write the epoch, new IBP = do).

best,
Colin

On Fri, Dec 3, 2021, at 18:46, Luke Chen wrote:
> Hi Colin,
> Thanks for your comment.
>
>> How are we going to avoid the situation where the broker restarts, and
>> the same generation number is reused?
>
> Actually, this KIP doesn't have anything to do with the brokers. The
> "generation" field I added, is in the subscription metadata, which will not
> be deserialized by brokers. The metadata is only deserialized by consumer
> lead. And for the consumer lead, the only thing the lead cared about, is
> the highest generation of the ownedPartitions among all the consumers. With
> the highest generation of the ownedPartitions, the consumer lead can
> distribute the partitions as sticky as possible, and most importantly,
> without errors.
>
> That is, after this KIP, if the broker restarts, and the same generation
> number is reused, it won't break current rebalance behavior. But it'll help
> the consumer lead do the sticky assignments correctly.
>
> Thank you.
> Luke
>
> On Fri, Dec 3, 2021 at 6:30 AM Colin McCabe <co...@cmccabe.xyz> wrote:
>
>> How are we going to avoid the situation where the broker restarts, and the
>> same generation number is reused?
>>
>> best,
>> Colin
>>
>> On Tue, Nov 30, 2021, at 16:36, Luke Chen wrote:
>> > Hi all,
>> >
>> > I'd like to start the vote for KIP-792: Add "generation" field into
>> > consumer protocol.
>> >
>> > The goal of this KIP is to allow the assignor/consumer coordinator to have
>> > a way to identify the out-of-date members/assignments, to avoid rebalance
>> > stuck issues in current protocol.
>> >
>> > Detailed description can be found here:
>> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336614
>> >
>> > Any feedback is welcome.
>> >
>> > Thank you.
>> > Luke
>>
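
To make the rolling-upgrade scenario from the top of this mail concrete, here is a minimal sketch of how an epoch could end up being reused. It is a toy model and not Kafka code: the class `EpochReuseSketch`, the method `simulate`, and the assumption that an old-IBP coordinator rewrites the group state without the epoch field are all hypothetical, chosen only to illustrate the "obvious manner" described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical toy model, NOT Kafka code.
class EpochReuseSketch {

    /**
     * Simulates one rebalance per coordinator term.
     * Each array entry is true if that term's coordinator runs the new IBP.
     * Returns the epochs handed out by the new-IBP coordinators, in order.
     */
    static List<Integer> simulate(boolean[] coordinatorUsesNewIbp) {
        // Persisted epoch; empty means the last writer stored no epoch.
        OptionalInt storedEpoch = OptionalInt.empty();
        List<Integer> issued = new ArrayList<>();

        for (boolean newIbp : coordinatorUsesNewIbp) {
            if (newIbp) {
                // New IBP: read the stored epoch (default 0), bump it, write it back.
                int next = storedEpoch.orElse(0) + 1;
                storedEpoch = OptionalInt.of(next);
                issued.add(next);
            } else {
                // Assumption: old IBP neither reads nor writes the epoch,
                // so the field is lost when it rewrites the group state.
                storedEpoch = OptionalInt.empty();
            }
        }
        return issued;
    }

    public static void main(String[] args) {
        // new, new, old, new: the last coordinator starts counting again.
        System.out.println(simulate(new boolean[] {true, true, false, true}));
        // prints [1, 2, 1]
    }
}
```

Under these assumptions, epoch 1 is issued twice for the same group. Whether the real protocol behaves this way depends on how the field is persisted and on how the version is tied to the IBP, which is the part the KIP would need to spell out.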