[ https://issues.apache.org/jira/browse/KAFKA-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Gustafson updated KAFKA-4553:
-----------------------------------
Resolution: Fixed
Fix Version/s: 0.10.2.0
Status: Resolved (was: Patch Available)
Issue resolved by pull request 2272
[https://github.com/apache/kafka/pull/2272]
> Connect's round robin assignment produces undesirable distribution of connectors/tasks
> --------------------------------------------------------------------------------------
>
> Key: KAFKA-4553
> URL: https://issues.apache.org/jira/browse/KAFKA-4553
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect
> Affects Versions: 0.10.1.0
> Reporter: Ewen Cheslack-Postava
> Assignee: Ewen Cheslack-Postava
> Fix For: 0.10.2.0
>
>
> Currently the round robin assignment in Connect looks something like this:
>
>     foreach connector {
>         assign connector to next worker
>         foreach task in connector {
>             assign task to next worker
>         }
>     }
> For the most part we assume that connectors and tasks are effectively
> equivalent units of work, but this is rarely the case. Connectors are
> usually much lighter-weight, since they only monitor the source/sink system
> for changes, while tasks do the heavy lifting. As a result, the current
> round robin assignment produces uneven distributions of work in some fairly
> common cases.
> In particular, it gets bad when there is an even number of workers and the
> connectors each generate only a single task: the even-numbered workers
> always get assigned connectors and the odd-numbered workers always get
> assigned tasks. An extreme case of this is when users start a distributed-mode
> cluster with just a couple of workers to get started and deploy multiple
> single-task connectors (CDC connectors like Debezium would be a common
> example). All the connectors end up on one worker, all the tasks end up on
> the other, and the second worker becomes overloaded.
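A minimal Python sketch of the current scheme makes the even-worker case concrete. It assumes connectors arrive as an ordered mapping of connector name to task count, that one round-robin cursor is shared by connectors and tasks, and that task IDs are rendered as "<connector>-<index>". The helper name interleaved_round_robin is hypothetical and is not Kafka's actual coordinator code.

```python
from collections import defaultdict
from itertools import cycle


def interleaved_round_robin(workers, connectors):
    """Current scheme (sketch): place each connector, then immediately its tasks.

    Hypothetical helper. `connectors` maps connector name -> task count.
    A single cursor over `workers` advances once for every connector and task.
    """
    assignment = defaultdict(list)
    cursor = cycle(workers)
    for name, num_tasks in connectors.items():
        assignment[next(cursor)].append(name)  # the connector instance itself
        for i in range(num_tasks):
            assignment[next(cursor)].append(f"{name}-{i}")  # each of its tasks
    return dict(assignment)


# Two workers and four single-task connectors (e.g. CDC connectors).
result = interleaved_round_robin(
    ["worker-0", "worker-1"],
    {"cdc-a": 1, "cdc-b": 1, "cdc-c": 1, "cdc-d": 1},
)
print(result)
# result == {'worker-0': ['cdc-a', 'cdc-b', 'cdc-c', 'cdc-d'],
#            'worker-1': ['cdc-a-0', 'cdc-b-0', 'cdc-c-0', 'cdc-d-0']}
```

Each worker still receives four units, so by count the assignment looks balanced; the problem is that one worker gets only the lightweight connectors and the other gets every task. Because each single-task connector advances the cursor by exactly two, with an even number of workers the connectors always land on the even-indexed workers.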
> Although the ideal solution to this problem is to have a better idea of how
> much load each connector/task will generate, I don't think we want to get
> into the business of full-on cluster resource management. An alternative
> which I think avoids this common pitfall without the risk of hitting another
> common bad case is to change the algorithm to assign all the connectors
> first, then all the tasks, i.e.
>
>     foreach connector {
>         assign connector to next worker
>     }
>     foreach connector {
>         foreach task in connector {
>             assign task to next worker
>         }
>     }
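The proposed two-pass ordering can be sketched the same way. This is again a hypothetical Python helper (connectors_then_tasks is not a Kafka API) under the same assumptions: an ordered mapping of connector name to task count and "<connector>-<index>" task IDs. It also assumes, as "assign task to next worker" suggests, that the task pass continues from where the connector pass left off rather than restarting at the first worker.

```python
from collections import defaultdict
from itertools import cycle


def connectors_then_tasks(workers, connectors):
    """Proposed scheme (sketch): all connectors first, then all tasks.

    Hypothetical helper. `connectors` maps connector name -> task count.
    Both passes share one cursor, so the task pass resumes where the
    connector pass stopped.
    """
    assignment = defaultdict(list)
    cursor = cycle(workers)
    # Pass 1: spread the lightweight connector instances.
    for name in connectors:
        assignment[next(cursor)].append(name)
    # Pass 2: spread the tasks, continuing with the same cursor.
    for name, num_tasks in connectors.items():
        for i in range(num_tasks):
            assignment[next(cursor)].append(f"{name}-{i}")
    return dict(assignment)


result = connectors_then_tasks(
    ["worker-0", "worker-1"],
    {"cdc-a": 1, "cdc-b": 1, "cdc-c": 1, "cdc-d": 1},
)
print(result)
# result == {'worker-0': ['cdc-a', 'cdc-c', 'cdc-a-0', 'cdc-c-0'],
#            'worker-1': ['cdc-b', 'cdc-d', 'cdc-b-0', 'cdc-d-0']}
```

Since both passes draw from a single cursor over the whole sequence of units, per-worker totals still differ by at most one, just as before; what changes is that connectors and tasks are each spread across all workers. In this example each worker ends up with two connectors and two tasks.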
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)