[ 
https://issues.apache.org/jira/browse/KAFKA-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361672#comment-14361672
 ] 

Jiangjie Qin commented on KAFKA-2019:
-------------------------------------

After talking with Joel, my second thought on this is that it might be better
to keep the current code. The reasons are:
1. The current code only shows imbalance when the number of partitions is small
and each consumer has a large number of consumer threads. This does not seem to
be a very common or reasonable situation, i.e. if we don't have many
partitions, why should we have so many consumer threads? Taking a step back,
even if the consumers are not balanced in this case, they only consume from a
small number of partitions, so performance is unlikely to be an issue.
2. Although the current code is unbalanced in situation 1), the extent of the
imbalance is bounded, i.e. a consumer can have at most
number-of-consumer-threads more partitions than any other consumer. If we
switch to the approach proposed in this ticket, the severity of the imbalance
is no longer deterministic. Especially when we only have a small number of
partitions, it could be even worse than with the current code.
Given the above, and since Joel also has a very good point about backward
compatibility, it is probably better to keep the current code.
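
To make the bound in point 2 concrete, here is a minimal sketch of the
behaviour under discussion. It is a simplified model, not the actual
RoundRobinAssignor code: it assumes a single topic, identical subscriptions,
and the same thread count on every consumer. ThreadId, RoundRobinSketch and
partitionsPerConsumer are hypothetical names; only the "%s-%d" toString format
comes from the ticket.

```scala
// Hypothetical stand-in for Kafka's ConsumerThreadId; only the toString
// format is taken from the ticket description.
case class ThreadId(consumer: String, threadId: Int) {
  override def toString: String = "%s-%d".format(consumer, threadId)
}

object RoundRobinSketch {
  // Deals partitions 0 until numPartitions out cyclically over the thread ids
  // sorted by toString, and returns how many partitions each consumer owns.
  def partitionsPerConsumer(consumers: Seq[String],
                            threadsPerConsumer: Int,
                            numPartitions: Int): Map[String, Int] = {
    val threads = (for {
      c <- consumers
      t <- 0 until threadsPerConsumer
    } yield ThreadId(c, t)).sortBy(_.toString)
    require(threads.nonEmpty, "need at least one consumer thread")

    // Sorting by toString puts all threads of one consumer next to each
    // other, so consecutive partitions land on the same consumer.
    val owners = (0 until numPartitions).map(p => threads(p % threads.size).consumer)
    consumers.map(c => c -> owners.count(_ == c)).toMap
  }
}
```

In this model, three consumers (c1, c2, c3) with four threads each and five
partitions end up with four, one and zero partitions respectively. The gap
between the most and least loaded consumer equals the number of threads per
consumer, which is the bound described in point 2.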

> RoundRobinAssignor clusters by consumer
> ---------------------------------------
>
>                 Key: KAFKA-2019
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2019
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>            Reporter: Joseph Holsten
>            Assignee: Neha Narkhede
>            Priority: Minor
>         Attachments: 0001-sort-consumer-thread-ids-by-hashcode.patch, 
> KAFKA-2019.patch
>
>
> When rolling out a change today, I noticed that some of my consumers are 
> "greedy", taking far more partitions than others.
> The cause is that the RoundRobinAssignor is using a list of ConsumerThreadIds 
> sorted by toString, which is {{ "%s-%d".format(consumer, threadId)}}. This 
> causes each consumer's threads to be adjacent to each other.
> One possible fix would be to define ConsumerThreadId.hashCode, and sort by 
> that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
