[ https://issues.apache.org/jira/browse/KAFKA-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757854#comment-15757854 ]

ASF GitHub Bot commented on KAFKA-4553:
---------------------------------------

GitHub user ewencp opened a pull request:

    https://github.com/apache/kafka/pull/2272

    KAFKA-4553: Improve round robin assignment in Connect to avoid uneven distributions of connectors and tasks

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ewencp/kafka kafka-4553-better-connect-round-robin

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2272.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2272
    
----
commit a33bbec13aac54bf2e09869125d6efb89165f602
Author: Ewen Cheslack-Postava <m...@ewencp.org>
Date:   2016-12-17T23:53:29Z

    KAFKA-4553: Improve round robin assignment in Connect to avoid uneven distributions of connectors and tasks

----


> Connect's round robin assignment produces undesirable distribution of connectors/tasks
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4553
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4553
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 0.10.1.0
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>
> Currently the round robin assignment in Connect looks something like this:
> foreach connector {
>   assign connector to next worker
>   for each task in connector {
>     assign task to next worker
>   }
> }
> For the most part we assume that connectors and tasks are effectively 
> equivalent units of work, but this is actually rarely the case. Connectors 
> are usually much more lightweight, as they are just monitoring for changes 
> in the source/sink system, while tasks do the heavy lifting. The way we 
> currently do round robin assignment therefore causes uneven distributions 
> of work in some cases that are not too uncommon.
> In particular, it gets bad if there is an even number of workers and the 
> connectors each generate only a single task, since the even-numbered 
> workers then always get assigned connectors and the odd-numbered workers 
> always get assigned tasks. An extreme case of this is when users start 
> distributed mode clusters with just a couple of workers to get started and 
> deploy multiple single-task connectors (e.g. CDC connectors like Debezium 
> would be a common example). All the connectors end up on one worker, all 
> the tasks end up on the other, and the second worker becomes overloaded.
> Although the ideal solution to this problem is to have a better idea of how 
> much load each connector/task will generate, I don't think we want to get 
> into the business of full-on cluster resource management. An alternative, 
> which I think avoids this common pitfall without the risk of hitting another 
> common bad case, is to change the algorithm to assign all the connectors 
> first, then all the tasks, i.e.
> foreach connector {
>   assign connector to next worker
> }
> foreach connector {
>   for each task in connector {
>     assign task to next worker
>   }
> }
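
The two loops quoted above can be compared with a small simulation. This is only a sketch in Python, not Connect's actual assignor code: the function names, the worker and connector names, and the dict-based data shapes are all hypothetical. It assumes a single round robin cursor over the workers that keeps advancing across connectors and tasks, which is how the pseudocode reads.

```python
from itertools import cycle


def _empty_assignment(workers):
    # One entry per worker, so idle workers still show up in the result.
    return {w: {"connectors": [], "tasks": []} for w in workers}


def interleaved_assignment(workers, connectors):
    """Original loop: each connector is followed immediately by its tasks.

    `connectors` maps a (hypothetical) connector name to its task count.
    """
    assignment = _empty_assignment(workers)
    next_worker = cycle(workers)  # shared round robin cursor
    for name, task_count in connectors.items():
        assignment[next(next_worker)]["connectors"].append(name)
        for task_id in range(task_count):
            assignment[next(next_worker)]["tasks"].append(f"{name}-{task_id}")
    return assignment


def connectors_first_assignment(workers, connectors):
    """Proposed loop: place every connector first, then every task."""
    assignment = _empty_assignment(workers)
    next_worker = cycle(workers)  # cursor continues from connectors into tasks
    for name in connectors:
        assignment[next(next_worker)]["connectors"].append(name)
    for name, task_count in connectors.items():
        for task_id in range(task_count):
            assignment[next(next_worker)]["tasks"].append(f"{name}-{task_id}")
    return assignment


if __name__ == "__main__":
    workers = ["worker-0", "worker-1"]
    single_task_connectors = {"c0": 1, "c1": 1, "c2": 1, "c3": 1}
    print(interleaved_assignment(workers, single_task_connectors))
    print(connectors_first_assignment(workers, single_task_connectors))
```

With two workers and four single-task connectors, the interleaved version puts all four connectors on worker-0 and all four tasks on worker-1, which is the extreme case described in the issue. The connectors-first version gives each worker two connectors and two tasks.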



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
