[ 
https://issues.apache.org/jira/browse/KAFKA-9987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108726#comment-17108726
 ] 

Sophie Blee-Goldman commented on KAFKA-9987:
--------------------------------------------

[~hai_lin] We can definitely backport this to 2.4 since it's all internal 
changes. Not sure if or when the next 2.4.x release will be, but I'm aiming to 
get the PR cleaned up and ready for review sometime next week. I definitely 
plan to run some scale benchmarks.

If you're interested in the current algorithm, the original 
([KIP-54|https://cwiki.apache.org/confluence/display/KAFKA/KIP-54+-+Sticky+Partition+Assignment+Strategy])
 document gives a high-level view of the algorithm. But basically it just 
starts by reassigning all partitions to their previous owner, then keeps making 
a pass over the partitions trying to move them between consumers until the 
assignment is balanced. It has to keep track of a lot of different state in 
various data structures, which might be where some of the inefficiency comes 
from.

The main goal of this algorithm is to leverage the constraint that all 
consumers are subscribed to the same set of topics to assert that every 
consumer should have exactly C_f or C_c partitions in the balanced assignment. 
So, like you said, we just aim to fill everyone up to their known capacity in a 
few passes and with minimal additional state being tracked.
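
To make the "fill everyone up to their known capacity" idea concrete, here is a minimal sketch. It is not the code from the PR. It assumes C_f and C_c mean the floor and ceiling of (number of partitions / number of consumers), and the class and method names (QuotaFillSketch, assign) are hypothetical. Partitions are plain integers for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Kafka's AbstractStickyAssignor.
final class QuotaFillSketch {

    // previous: consumer id -> partitions it owned before the rebalance.
    // Assumes every consumer is subscribed to the same topics.
    static Map<String, List<Integer>> assign(Map<String, List<Integer>> previous,
                                             int numPartitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        if (previous.isEmpty()) {
            return result;
        }
        int numConsumers = previous.size();
        int minQuota = numPartitions / numConsumers;   // assumed meaning of C_f
        // Number of consumers that must hold minQuota + 1 (assumed C_c).
        int extraSlots = numPartitions % numConsumers;
        boolean[] taken = new boolean[numPartitions];

        // Pass 1: each consumer keeps previously owned partitions up to its quota.
        for (Map.Entry<String, List<Integer>> e : previous.entrySet()) {
            List<Integer> kept = new ArrayList<>();
            for (int p : e.getValue()) {
                if (p < 0 || p >= numPartitions || taken[p]) {
                    continue; // stale or doubly claimed partition
                }
                boolean belowMin = kept.size() < minQuota;
                boolean canTakeExtra = kept.size() == minQuota && extraSlots > 0;
                if (!belowMin && !canTakeExtra) {
                    break; // consumer is at capacity
                }
                if (canTakeExtra) {
                    extraSlots--;
                }
                kept.add(p);
                taken[p] = true;
            }
            result.put(e.getKey(), kept);
        }

        // Collect everything that nobody kept.
        Deque<Integer> unassigned = new ArrayDeque<>();
        for (int p = 0; p < numPartitions; p++) {
            if (!taken[p]) {
                unassigned.add(p);
            }
        }

        // Pass 2: bring every consumer up to minQuota.
        for (List<Integer> owned : result.values()) {
            while (owned.size() < minQuota) {
                owned.add(unassigned.poll());
            }
        }

        // Pass 3: hand out any leftovers, one each, to consumers still at minQuota.
        for (List<Integer> owned : result.values()) {
            if (unassigned.isEmpty()) {
                break;
            }
            if (owned.size() == minQuota) {
                owned.add(unassigned.poll());
            }
        }
        return result;
    }
}
```

The only state tracked here is one boolean per partition, a queue of unassigned partitions, and a counter of remaining "extra" slots, which is the kind of reduction in bookkeeping described above.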

 

> Improve sticky partition assignor algorithm
> -------------------------------------------
>
>                 Key: KAFKA-9987
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9987
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients
>            Reporter: Sophie Blee-Goldman
>            Assignee: Sophie Blee-Goldman
>            Priority: Major
>
> In 
> [KIP-429|https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol]
>  we added the new CooperativeStickyAssignor which leverages the underlying 
> sticky assignment algorithm of the existing StickyAssignor (moved to 
> AbstractStickyAssignor). The algorithm is fairly complex as it tries to 
> optimize stickiness while satisfying perfect balance _in the case individual 
> consumers may be subscribed to different subsets of the topics._ While it 
> does a pretty good job at what it promises to do, it doesn't scale well with 
> large numbers of consumers and partitions.
> To give a concrete example, users have reported that it takes 2.5 minutes for 
> the assignment to complete with just 2100 consumers reading from 2100 
> partitions. Since partitions revoked during the first of two cooperative 
> rebalances will remain unassigned until the end of the second rebalance, it's 
> important for the rebalance to be as fast as possible. And since one of the 
> primary improvements of the cooperative rebalancing protocol is better 
> scaling experience, the only OOTB cooperative assignor should not itself 
> scale poorly.
> If we can constrain the problem a bit, we can simplify the algorithm greatly. 
> In many cases the individual consumers won't be subscribed to some random 
> subset of the total subscription; they will all be subscribed to the same set 
> of topics and rely on the assignor to balance the partition workload.
> We can detect this case by checking the group's individual subscriptions and 
> call on a more efficient assignment algorithm. 
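
The detection step described in the issue could look roughly like the sketch below. This is an illustration under assumptions, not the assignor's actual code: the class and method names are hypothetical, and subscriptions are modeled as a plain map from consumer id to topic names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only.
final class SubscriptionCheckSketch {

    // Returns true when every consumer subscribes to the same set of topics,
    // ignoring order and duplicates within each subscription list.
    static boolean allSubscriptionsEqual(Map<String, List<String>> topicsByConsumer) {
        Set<String> first = null;
        for (List<String> topics : topicsByConsumer.values()) {
            Set<String> current = new HashSet<>(topics);
            if (first == null) {
                first = current;
            } else if (!first.equals(current)) {
                return false; // fall back to the general algorithm
            }
        }
        return true; // safe to use the constrained, faster algorithm
    }
}
```

The check is a single pass over the group's subscriptions, so it adds little cost in front of either algorithm.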



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
