Jason, Thanks for the writeup. A few comments below.
1. When there are multiple common protocols in the JoinGroupRequest, which one would the coordinator pick? 2. If the protocols don't agree, the group construction fails. What exactly does it mean? Do we send an error in every JoinGroupResponse and remove all members in the group in the coordinator? 3. Consumer embedded protocol: The proposal has two different formats of subscription depending on whether wildcards are used or not. This seems a bit complicated. Would it be better to always use the metadata hash? The clients know the subscribed topics already. This way, the client code behaves the same whether wildcards are used or not. Jiangjie, With respect to rebalance churns due to topics being created/deleted. With the new consumer, the rebalance can probably settle within 200ms when there is a topic change. So, as long as we are not changing topic more than 5 times per sec, there shouldn't be constant churns, right? Thanks, Jun On Tue, Aug 11, 2015 at 1:19 PM, Jason Gustafson <ja...@confluent.io> wrote: > Hi Kafka Devs, > > One of the nagging issues in the current design of the new consumer has > been the need to support a variety of assignment strategies. We've > encountered this in particular in the design of copycat and the processing > framework (KIP-28). From what I understand, Samza also has a number of use > cases with custom assignment needs. The new consumer protocol supports new > assignment strategies by hooking them into the broker. For many > environments, this is a major pain and in some cases, a non-starter. It > also challenges the validation that the coordinator can provide. For > example, some assignment strategies call for partitions to be assigned > multiple times, which means that the coordinator can only check that > partitions have been assigned at least once. > > To solve these issues, we'd like to propose moving assignment to the > client. I've written a wiki which outlines some protocol changes to achieve > this: > > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Client-side+Assignment+Proposal > . > To summarize briefly, instead of the coordinator assigning the partitions > itself, all subscriptions are forwarded to each member of the group which > then decides independently which partitions it should consume. The protocol > provides a mechanism for the coordinator to validate that all consumers use > the same assignment strategy, but it does not ensure that the resulting > assignment is "correct." This provides a powerful capability for users to > control the full data flow on the client side. They control how data is > written to partitions through the Partitioner interface and they control > how data is consumed through the assignment strategy, all without touching > the server. > > Of course nothing comes for free. In particular, this change removes the > ability of the coordinator to validate that commits are made by consumers > who were assigned the respective partition. This might not be too bad since > we retain the ability to validate the generation id, but it is a potential > concern. We have considered alternative protocols which add a second > round-trip to the protocol in order to give the coordinator the ability to > confirm the assignment. As mentioned above, the coordinator is somewhat > limited in what it can actually validate, but this would return its ability > to validate commits. The tradeoff is that it increases the protocol's > complexity which means more ways for the protocol to fail and consequently > more edge cases in the code. > > It also misses an opportunity to generalize the group membership protocol > for additional use cases. In fact, after you've gone to the trouble of > moving assignment to the client, the main thing that is left in this > protocol is basically a general group management capability. This is > exactly what is needed for a few cases that are currently under discussion > (e.g. copycat or single-writer producer). We've taken this further step in > the proposal and attempted to envision what that general protocol might > look like and how it could be used both by the consumer and for some of > these other cases. > > Anyway, since time is running out on the new consumer, we have perhaps one > last chance to consider a significant change in the protocol like this, so > have a look at the wiki and share your thoughts. I've no doubt that some > ideas seem clearer in my mind than they do on paper, so ask questions if > there is any confusion. > > Thanks! > Jason >