[jira] [Updated] (KAFKA-4553) Connect's round robin assignment produces undesirable distribution of connectors/tasks

Ewen Cheslack-Postava (JIRA) Sat, 17 Dec 2016 17:37:40 -0800

     [ 
https://issues.apache.org/jira/browse/KAFKA-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ewen Cheslack-Postava updated KAFKA-4553:
-----------------------------------------
    Status: Patch Available  (was: Open)

> Connect's round robin assignment produces undesirable distribution of 
> connectors/tasks
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4553
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4553
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 0.10.1.0
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>
> Currently the round robin assignment in Connect looks something like this:
> foreach connector {
>   assign connector to next worker
>   for each task in connector {
>     assign task to next worker
>   }
> }
> For the most part we assume that connectors and tasks are effectively 
> equivalent units of work, but this is rarely the case. Connectors 
> are usually much more lightweight, since they just monitor for changes in the 
> source/sink system, while tasks do the heavy lifting. The way we 
> currently do round robin assignment therefore causes uneven distributions of 
> work in some cases that are not uncommon.
> In particular, it gets bad when the number of workers is even and each 
> connector generates only a single task, since the even-numbered workers 
> then always get assigned connectors and the odd-numbered workers always get 
> assigned tasks. An extreme case of this is when users start distributed mode 
> clusters with just a couple of workers to get started and deploy multiple 
> single-task connectors (e.g. CDC connectors like Debezium would be a common 
> example). All the connectors end up on one worker, all the tasks end up on 
> the other, and the second worker becomes overloaded.
> Although the ideal solution to this problem is to have a better idea of how 
> much load each connector/task will generate, I don't think we want to get 
> into the business of full-on cluster resource management. An alternative 
> which I think avoids this common pitfall without the risk of hitting another 
> common bad case is to change the algorithm to assign all the connectors 
> first, then all the tasks, i.e.
> foreach connector {
>   assign connector to next worker
> }
> foreach connector {
>   for each task in connector {
>     assign task to next worker
>   }
> }
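
The two assignment strategies described above can be compared with a small simulation. This is only a sketch in Python, not Connect's actual Java implementation: the function names, the worker and connector names, and the dict-based representation of an assignment are all hypothetical and chosen for illustration. It also assumes that "next worker" is a single rotating cursor that carries over from the connector pass into the task pass.

```python
from itertools import cycle


def _empty_assignment(workers):
    # Hypothetical representation: worker name -> lists of assigned items.
    return {w: {"connectors": [], "tasks": []} for w in workers}


def interleaved_round_robin(workers, connectors):
    """Strategy described as current: a connector and then each of its
    tasks are handed to the next worker in turn."""
    next_worker = cycle(workers)
    assignment = _empty_assignment(workers)
    for name, task_count in connectors.items():
        assignment[next(next_worker)]["connectors"].append(name)
        for i in range(task_count):
            assignment[next(next_worker)]["tasks"].append(f"{name}-{i}")
    return assignment


def connectors_then_tasks(workers, connectors):
    """Proposed strategy: assign every connector first, then every task,
    continuing with the same rotating cursor (an assumption of this sketch)."""
    next_worker = cycle(workers)
    assignment = _empty_assignment(workers)
    for name in connectors:
        assignment[next(next_worker)]["connectors"].append(name)
    for name, task_count in connectors.items():
        for i in range(task_count):
            assignment[next(next_worker)]["tasks"].append(f"{name}-{i}")
    return assignment


def task_counts(assignment):
    """Number of tasks per worker, used here as a rough proxy for load."""
    return {w: len(a["tasks"]) for w, a in assignment.items()}


if __name__ == "__main__":
    # Two workers and four single-task connectors: the extreme case above.
    workers = ["worker-0", "worker-1"]
    connectors = {f"cdc-{i}": 1 for i in range(4)}  # connector name -> task count

    print(task_counts(interleaved_round_robin(workers, connectors)))
    # {'worker-0': 0, 'worker-1': 4}
    print(task_counts(connectors_then_tasks(workers, connectors)))
    # {'worker-0': 2, 'worker-1': 2}
```

With two workers and four single-task connectors, the interleaved strategy puts every connector on one worker and every task on the other, while the connectors-first strategy gives each worker two connectors and two tasks. Task count is only a stand-in for load here, in line with the point above that real per-task load is not known to the assignor.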



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
