[ https://issues.apache.org/jira/browse/KAFKA-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757854#comment-15757854 ]
ASF GitHub Bot commented on KAFKA-4553:
---------------------------------------

GitHub user ewencp opened a pull request:

    https://github.com/apache/kafka/pull/2272

    KAFKA-4553: Improve round robin assignment in Connect to avoid uneven distributions of connectors and tasks

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ewencp/kafka kafka-4553-better-connect-round-robin

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2272.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2272

----
commit a33bbec13aac54bf2e09869125d6efb89165f602
Author: Ewen Cheslack-Postava <m...@ewencp.org>
Date:   2016-12-17T23:53:29Z

    KAFKA-4553: Improve round robin assignment in Connect to avoid uneven distributions of connectors and tasks

----


> Connect's round robin assignment produces undesirable distribution of
> connectors/tasks
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4553
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4553
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 0.10.1.0
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>
> Currently the round robin assignment in Connect looks something like this:
>
>     foreach connector {
>       assign connector to next worker
>       for each task in connector {
>         assign task to next worker
>       }
>     }
>
> For the most part we assume that connectors and tasks are effectively
> equivalent units of work, but this is actually rarely the case. Connectors
> are usually much lighter weight, since they just monitor the source/sink
> system for changes, while tasks do the heavy lifting. As a result, the way
> we currently do round robin assignment causes uneven distributions of work
> in cases that are not uncommon.
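To make the imbalance concrete, here is a minimal Python sketch of the interleaved scheme quoted above (Kafka Connect itself is written in Java; this is not its code). The function name `interleaved_round_robin`, the dict-of-task-counts input, and the `name-task-i` ids are hypothetical. Because connectors and tasks share one round-robin cursor, each single-task connector advances the cursor by two, so with two workers every connector lands on the same worker.

```python
from itertools import cycle


def interleaved_round_robin(workers, connectors):
    """Sketch of the current scheme: each connector is assigned, immediately
    followed by its tasks, using one shared round-robin cursor.

    workers:    list of worker ids (hypothetical, e.g. "worker-0")
    connectors: dict of connector name -> number of tasks (hypothetical shape)
    Returns a dict of worker id -> list of assigned connector/task ids.
    """
    assignment = {w: [] for w in workers}
    next_worker = cycle(workers)  # one cursor shared by connectors and tasks
    for name, num_tasks in connectors.items():
        assignment[next(next_worker)].append(name)
        for i in range(num_tasks):
            assignment[next(next_worker)].append(f"{name}-task-{i}")
    return assignment


# Two workers, four single-task connectors (e.g. CDC connectors).
result = interleaved_round_robin(
    ["worker-0", "worker-1"],
    {"db1": 1, "db2": 1, "db3": 1, "db4": 1},
)
for worker, units in result.items():
    print(worker, units)
```

Here `worker-0` receives all four connectors and `worker-1` receives all four tasks, which is the lopsided placement the issue describes.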
> In particular, it gets bad if there is an even number of workers and the
> connectors each generate only a single task, since this results in the
> even-numbered workers always getting assigned connectors and the
> odd-numbered workers always getting assigned tasks. An extreme case of this
> is when users start a distributed mode cluster with just a couple of workers
> to get started and deploy multiple single-task connectors (e.g. CDC
> connectors like Debezium would be a common example). All the connectors end
> up on one worker, all the tasks end up on the other, and the second worker
> becomes overloaded.
>
> Although the ideal solution to this problem is to have a better idea of how
> much load each connector/task will generate, I don't think we want to get
> into the business of full-on cluster resource management. An alternative
> which I think avoids this common pitfall without the risk of hitting another
> common bad case is to change the algorithm to assign all the connectors
> first, then all the tasks, i.e.
>
>     foreach connector {
>       assign connector to next worker
>     }
>     foreach connector {
>       for each task in connector {
>         assign task to next worker
>       }
>     }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
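The proposed ordering can be sketched the same way. This is a hypothetical Python illustration, not the code in PR #2272; the names and input shape are assumptions. It also assumes the task loop continues from wherever the connector loop left the cursor (one reading of "next worker"), rather than restarting at the first worker.

```python
from itertools import cycle


def connectors_then_tasks(workers, connectors):
    """Sketch of the proposed scheme: assign every connector first, then every
    task, continuing the same round-robin cursor (assumed, not confirmed).

    workers:    list of worker ids (hypothetical, e.g. "worker-0")
    connectors: dict of connector name -> number of tasks (hypothetical shape)
    Returns a dict of worker id -> list of assigned connector/task ids.
    """
    assignment = {w: [] for w in workers}
    next_worker = cycle(workers)
    # Pass 1: spread the (usually lightweight) connectors across workers.
    for name in connectors:
        assignment[next(next_worker)].append(name)
    # Pass 2: spread the (usually heavier) tasks, picking up where pass 1 stopped.
    for name, num_tasks in connectors.items():
        for i in range(num_tasks):
            assignment[next(next_worker)].append(f"{name}-task-{i}")
    return assignment


# Same scenario: two workers, four single-task connectors.
result = connectors_then_tasks(
    ["worker-0", "worker-1"],
    {"db1": 1, "db2": 1, "db3": 1, "db4": 1},
)
for worker, units in result.items():
    print(worker, units)
```

With this ordering each worker ends up with two connectors and two tasks, so the heavy task load is split evenly instead of landing on a single worker.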