showuon commented on pull request #11451: URL: https://github.com/apache/kafka/pull/11451#issuecomment-964156875
@guozhangwang , when investigating the broken tests, I found my change will cause the "normal rebalance" slower. Here's why: Before my change, the rebalance with 2 consumers will be like this: 1. consumer A joined group G, and joinGroup completed 2. consumer B joined group G, the group G state change to `preparingRebalance` 3. consumer A send syncGroup, and got `REBALANCE_IN_PROGRESS` error 4. consumer A rejoin group, send joinGroup to group G 5. consumer B send joinGroup to group G 6. consumer A and B complete the joinGroup and syncGroup successfully It looks great. But after my change in this PR, it'll become (the change is highlighted in **bold**) 1. consumer A joined group G, and joinGroup completed 2. consumer B joined group G, the group G state change to `preparingRebalance` 3. consumer A send syncGroup, and got `REBALANCE_IN_PROGRESS` error **4. consumer A reset generation and state** **5. consumer A rejoin group, send joinGroup with a new member ID to group G** 6. consumer B send joinGroup to group G **7. waiting until rebalance timeout to kick out the old consumer A(old member ID)** 8. consumer A and B complete the joinGroup and syncGroup successfully That's why this change causes the rebalance slower. We can explicitly leave group when sync group with `REBALANCE_IN_PROGRESS`, but I think we have to fix it from the root! === Currently, all the issue we faced (i.e. KAFKA-12984, KAFKA-13406), is due to the `ownedPartitions` data in the subscription message is out-of-date, and we don't have a good way to identify it. So in KAFKA-12984, we have to put `generation` info into `userData` in Subscription message in `CooperativeStickyAssignor`. (And we have no way to workaround for custom cooperative assignors). But then, we found we forgot the cooperative assignment validation will also use the `ownedPartitions` data in KAFKA-13406, so we have to workaround it again. Therefore, I think we should add an additional field `generation` into Subscription message, to help `CooperativeStickyAssignor` and custom assignors leverage `ownedPartitions + generation` to do "correct" assignment. [KIP-792](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336614) is drafted. I'd like to get your feedback before I send it to dev group discussion. Thank you. So, back to your original comment about 2 places to fix: 1. Inside the `ConsumerCoordinator#subscriptions`, where we save the currently assigned partitions. 2. Inside the `Assignor#userData`, where for (cooperative) sticky assignor where we also encode the generation and the prev-assigned partitions as memberAssignment (note that in join-group request we do not encode the generation id). Well, KIP-792 is still focusing on 1) above. For 2), I've thought about it for some days, and I think we can ignore it, because in stickyAssignor (not cooperative one), we put both `ownedPartitions` and `generation` info into `userData`, which means, even the ownedPartition is out-of-date, we can still identify it. For custom assignors with old bytecode, I think they can use the same way to achieve the same goal. WDYT? Thank you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org