Hey Boyang, Thanks for the proposal! This is very useful. I have some comments below:
1) The motivation currently explicitly states that the goal is to improve performance for heavy state application. It seems that the motivation can be stronger with the following use-case. Currently for MirrorMaker cluster with e.g. 100 MirrorMaker processes, it will take a long time to rolling bounce the entire MirrorMaker cluster. Each MirrorMaker process restart will trigger a rebalance which currently pause the consumption of the all partitions of the MirrorMaker cluster. With the change stated in this patch, as long as a MirrorMaker can restart within the specified timeout (e.g. 2 minutes), then we only need constant number of rebalance (e.g. for leader restart) for the entire rolling bounce, which will significantly improves the availability of the MirrorMaker pipeline. In my opinion, the main benefit of the KIP is to avoid unnecessary rebalance if the consumer process can be restarted within soon, which helps performance even if overhead of state shuffling for a given process is small. 2) In order to simplify the KIP reading, can you follow the writeup style of other KIP (e.g. KIP-98) and list the interface change such as new configs (e.g. registration timeout), new request/response, new AdminClient API and new error code (e.g. DUPLICATE_STATIC_MEMBER)? Currently some of these are specified in the Proposed Change section which makes it a bit inconvenient to understand the new interface that will be exposed to user. Explanation of the current two-phase rebalance protocol probably can be moved out of public interface section. 3) There are currently two version of JoinGroupRequest in the KIP and only one of them has field memberId. This seems confusing. 4) It is mentioned in the KIP that "An admin API to force rebalance could be helpful here, but we will make a call once we finished the major implementation". So this seems to be still an open question in the current design. We probably want to agree on this before voting for the KIP. 5) The KIP currently adds new config MEMBER_NAME for consumer. Can you specify the name of the config key and the default config value? Possible default values include empty string or null (similar to transaction.id in producer config). 6) Regarding the use of the topic "static_member_map" to persist member name map, currently if consumer coordinator broker goes offline, rebalance is triggered and consumers will try connect to the new coordinator. If these consumers can connect to the new coordinator within max.poll.interval.ms which by default is 5 minutes, given that broker can use a deterministic algorithm to determine the partition -> member_name mapping, each consumer should get assigned the same set of partitions without requiring state shuffling. So it is not clear whether we have a strong use-case for this new logic. Can you help clarify what is the benefit of using topic "static_member_map" to persist member name map? 7) Regarding the introduction of the expensionTimeoutMs config, it is mentioned that "we are using expansion timeout to replace rebalance timeout, which is configured by max.poll.intervals from client side, and using registration timeout to replace session timeout". Currently the default max.poll.interval.ms is configured to be 5 minutes and there will be only one rebalance if all new consumers can join within 5 minutes. So it is not clear whether we have a strong use-case for this new config. Can you explain what is the benefit of introducing this new config? 8) It is mentioned that "To distinguish between previous version of protocol, we will also increase the join group request version to v4 when MEMBER_NAME is set" and "If the broker version is not the latest (< v4), the join group request shall be downgraded to v3 without setting the member Id". It is probably simpler to just say that this feature is enabled if JoinGroupRequest V4 is supported on both client and broker and MEMBER_NAME is configured with non-empty string. 9) It is mentioned that broker may return NO_STATIC_MEMBER_INFO_SET error in OffsetCommitResponse for "commit requests under static membership". Can you clarify how broker determines whether the commit request is under static membership? Thanks, Dong