Hi Jason,

I appreciate your feedback. Please see my comments below, and advise if you have further suggestions.

Thanks.

Regards,
--Vahid

From: Jason Gustafson <ja...@confluent.io>
To: dev@kafka.apache.org
Date: 06/22/2016 04:41 PM
Subject: Re: [DISCUSS] KIP-54 Sticky Partition Assignment Strategy

Hey Vahid,

Thanks for the updates. I think the lack of comments on this KIP suggests that the motivation might need a little work. Here are the two main benefits of this assignor as I see them:

1. It can give a more balanced assignment when subscriptions do not match in a group (this is the same problem solved by KIP-49).
2. It potentially saves applications from having to clean up partition state when rebalancing, since partitions are more likely to stay assigned to the same consumer.

Does that seem right to you?

Yes, it does. You summarized it nicely. #1 is an advantage of this strategy compared to the existing round robin and fair strategies.

I think it's unclear how serious the first problem is. Providing better balance when subscriptions differ is nice, but are rolling updates the only scenario where this is encountered? Or are there more general use cases where differing subscriptions could persist for a longer duration? I'm also wondering if this assignor addresses the problem found in KAFKA-2019. It would be useful to confirm whether this problem still exists with the new consumer's round robin strategy and how (whether?) it is addressed by this assignor.

I'm not very clear on the first part of this paragraph. You could clarify it for me, but in general, balancing the partitions across the consumers in a group as evenly as possible would normally mean balancing the load within the cluster. That's something a user would want, compared to cases where the assignments, and therefore the load, could be quite unbalanced depending on the subscriptions. Having an optimal balance is definitely more reassuring than knowing partition assignments could get quite unbalanced. There is an example in the KIP that explains a simple use case that leads to an unbalanced assignment with round robin assignment.
This imbalance could become much more severe in real use cases with many more topics / partitions / consumers, and that's something we would ideally want to avoid, if possible.

Regarding KAFKA-2019, when I try the simple use case of
https://issues.apache.org/jira/browse/KAFKA-2019?focusedCommentId=14360892&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14360892
each of my consumers gets 3 partitions, which is not the same as what is mentioned in the comment. I might be missing something in the configuration (other than setting the strategy to 'roundrobin' and fetcher threads to '2'), or the issue may have been resolved already by some other patch. In any case, based on what I read in the JIRA, the issue stems from the multiple threads that each consumer may have, and from how the threads of each consumer are assigned partitions first, before partitions are assigned to other consumers' threads. Since the new consumer is single threaded, there is no such problem in its round robin strategy. It simply considers consumers one by one for each partition assignment, and when one consumer is assigned a partition, the next assignment starts by considering the next consumer in the list (and not the same consumer that was just assigned). This removes the possibility of the issue reported in KAFKA-2019 surfacing in the new consumer.

In the sticky strategy we do not have this issue either, since every time an assignment is about to happen we start with the consumer with the fewest assignments. So we will not have a scenario where a consumer is repeatedly assigned partitions as in KAFKA-2019 (unless that consumer is lagging behind other consumers in the number of partitions assigned).

The major selling point seems to be the second point. This is definitely nice to have, but would you expect a lot of value in practice since consumer groups are usually assumed to be stable? It might help to describe some specific use cases to help motivate the proposal.
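To make the round robin and "fewest assignments first" behaviors described above more concrete, here is a rough sketch. It is not the KIP-54 algorithm and not Kafka's assignor code. The class name, the group of three consumers, the topic names, and the "<topic>-<number>" partition naming are all made up for illustration. The second method also assumes that partitions with the fewest eligible consumers are handled first, which is my own simplification and is not stated in this thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; not the KIP-54 algorithm or Kafka's assignor code.
class AssignmentSketch {
    // Hypothetical group with differing subscriptions.
    static final List<String> CONSUMERS = List.of("C0", "C1", "C2");
    static final Map<String, Set<String>> SUBSCRIPTIONS = Map.of(
            "C0", Set.of("t0"),
            "C1", Set.of("t0", "t1"),
            "C2", Set.of("t0", "t1", "t2"));
    static final List<String> PARTITIONS =
            List.of("t0-0", "t1-0", "t1-1", "t2-0", "t2-1", "t2-2");

    static String topicOf(String partition) {
        return partition.substring(0, partition.lastIndexOf('-'));
    }

    static boolean subscribed(String consumer, String partition) {
        return SUBSCRIPTIONS.get(consumer).contains(topicOf(partition));
    }

    static Map<String, List<String>> emptyAssignment() {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String c : CONSUMERS) result.put(c, new ArrayList<>());
        return result;
    }

    // Round robin: for each partition, scan the consumer list starting right
    // after the consumer that received the previous partition, and give the
    // partition to the first consumer subscribed to its topic.
    static Map<String, List<String>> roundRobin() {
        Map<String, List<String>> result = emptyAssignment();
        int cursor = 0;
        for (String p : PARTITIONS) {
            for (int i = 0; i < CONSUMERS.size(); i++) {
                int idx = (cursor + i) % CONSUMERS.size();
                String c = CONSUMERS.get(idx);
                if (subscribed(c, p)) {
                    result.get(c).add(p);
                    cursor = idx + 1; // next scan starts with the next consumer
                    break;
                }
            }
        }
        return result;
    }

    // Number of consumers that could take this partition.
    static long eligible(String partition) {
        return CONSUMERS.stream().filter(c -> subscribed(c, partition)).count();
    }

    // "Fewest assignments first": each partition goes to the eligible consumer
    // that currently holds the fewest partitions. ASSUMPTION: partitions with
    // fewer eligible consumers are placed first (not taken from the thread).
    static Map<String, List<String>> leastLoadedFirst() {
        Map<String, List<String>> result = emptyAssignment();
        List<String> ordered = new ArrayList<>(PARTITIONS);
        ordered.sort(Comparator.comparingLong((String p) -> eligible(p)));
        for (String p : ordered) {
            String best = null;
            for (String c : CONSUMERS) {
                if (!subscribed(c, p)) continue;
                if (best == null || result.get(c).size() < result.get(best).size()) {
                    best = c;
                }
            }
            if (best != null) result.get(best).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("round robin:        " + roundRobin());
        System.out.println("least-loaded first: " + leastLoadedFirst());
    }
}
```

With this made-up input, the round robin sketch leaves the three consumers with 1, 1, and 4 partitions, while the least-loaded sketch gives 1, 2, and 3. In neither case is a consumer handed several partitions in a row unless it is the only subscriber or it holds fewer partitions than the others.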
One of the downsides is that it requires users to restructure their code to get any benefit from it. In particular, they need to move partition cleanup out of the onPartitionsRevoked() callback and into onPartitionsAssigned(). This is a little awkward and will probably make explaining the consumer more difficult. It's probably worth including a discussion of this point in the proposal with an example.

Even though consumer groups are usually stable, it might be the case that consumers do not initially join the group at the same time. The sticky strategy in that situation lets those who joined earlier stick to their partitions to some extent (assuming fairness takes precedence over stickiness). In terms of specific use cases, Andrew touched on examples of how Kafka can benefit from a sticky assignor. I could add those to the KIP if you also think they help build the case in favor of the sticky assignor.

I agree with you about the downside, and I'll make sure I add that to the KIP as you suggested.

Thanks,
Jason

On Tue, Jun 7, 2016 at 4:05 PM, Vahid S Hashemian <vahidhashem...@us.ibm.com> wrote:

> Hi Jason,
>
> I updated the KIP and added some details about the user data, the
> assignment algorithm, and the alternative strategies to consider.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-54+-+Sticky+Partition+Assignment+Strategy
>
> Please let me know if I missed anything. Thank you.
>
> Regards,
> --Vahid
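On the callback restructuring discussed above (moving partition cleanup out of onPartitionsRevoked() and into onPartitionsAssigned()), here is a rough sketch of what the user code might look like. To keep it free of dependencies, the class below is a stand-in that uses plain strings for partitions. Real code would implement org.apache.kafka.clients.consumer.ConsumerRebalanceListener, whose two callbacks take a Collection<TopicPartition>. The class name and the cleanedUp / initialized bookkeeping lists are hypothetical and exist only to show which partitions would be touched.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Stand-in for a ConsumerRebalanceListener implementation (assumption:
// partitions are represented as strings instead of TopicPartition).
class DeferredCleanupListener {
    // Partitions for which this consumer currently holds local state.
    private final Set<String> owned = new TreeSet<>();
    // Bookkeeping for illustration: what would be cleaned up / initialized.
    final List<String> cleanedUp = new ArrayList<>();
    final List<String> initialized = new ArrayList<>();

    void onPartitionsRevoked(Collection<String> revoked) {
        // Intentionally keep local state here: with a sticky assignor most of
        // these partitions are expected to come back after the rebalance.
    }

    void onPartitionsAssigned(Collection<String> assigned) {
        // Clean up only the partitions we held before and did not get back.
        Set<String> lost = new TreeSet<>(owned);
        lost.removeAll(assigned);
        cleanedUp.addAll(lost);

        // Set up state only for partitions that are new to this consumer.
        Set<String> gained = new TreeSet<>(assigned);
        gained.removeAll(owned);
        initialized.addAll(gained);

        owned.clear();
        owned.addAll(assigned);
    }
}
```

For example, if a consumer holding t0-0 and t0-1 is assigned t0-0 and t0-2 after a rebalance, only t0-1 is cleaned up and only t0-2 is initialized; the state for t0-0 is left alone.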