Ewen Cheslack-Postava created KAFKA-4553:
--------------------------------------------

             Summary: Connect's round robin assignment produces undesirable 
distribution of connectors/tasks
                 Key: KAFKA-4553
                 URL: https://issues.apache.org/jira/browse/KAFKA-4553
             Project: Kafka
          Issue Type: Bug
          Components: KafkaConnect
    Affects Versions: 0.10.1.0
            Reporter: Ewen Cheslack-Postava
            Assignee: Ewen Cheslack-Postava


Currently the round robin assignment in Connect looks something like this:

foreach connector {
  assign connector to next worker
  for each task in connector {
    assign task to next worker
  }
}

For the most part we assume that connectors and tasks are effectively 
equivalent units of work, but this is rarely the case. Connectors are 
usually much more lightweight, since they just monitor the source/sink 
system for changes, while tasks do the heavy lifting. As a result, the 
current round robin assignment distributes work unevenly in some cases 
that are not too uncommon.

In particular, it gets bad if there is an even number of workers and the 
connectors each generate only a single task. The assignment then alternates 
strictly between a connector and its one task, so the even-numbered workers 
always get assigned connectors and the odd-numbered workers always get 
assigned tasks. An extreme case of this is when users start a distributed mode 
cluster with just a couple of workers to get started and deploy multiple 
single-task connectors (e.g. CDC connectors like Debezium would be a common 
example). All the connectors end up on one worker, all the tasks end up on the 
other, and the second worker becomes overloaded.
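
To make the two-worker case concrete, here is a small Python simulation of the interleaved assignment described above. It is only a sketch of the pseudocode, not the actual Connect assignor code: the function name, the worker and connector names, and the dict-based data layout are all made up for illustration.

```python
from itertools import cycle


def interleaved_round_robin(workers, connectors):
    """Simulate the current scheme: each connector, then its tasks, in one pass.

    `connectors` maps a (hypothetical) connector name to its task count.
    """
    assignment = {w: {"connectors": [], "tasks": []} for w in workers}
    next_worker = cycle(workers)  # shared round robin cursor
    for name, num_tasks in connectors.items():
        assignment[next(next_worker)]["connectors"].append(name)
        for task_id in range(num_tasks):
            assignment[next(next_worker)]["tasks"].append(f"{name}-{task_id}")
    return assignment


# Two workers, three single-task connectors (illustrative names only).
workers = ["worker-0", "worker-1"]
connectors = {"cdc-a": 1, "cdc-b": 1, "cdc-c": 1}

result = interleaved_round_robin(workers, connectors)
for worker, load in result.items():
    print(worker, load)
# worker-0 {'connectors': ['cdc-a', 'cdc-b', 'cdc-c'], 'tasks': []}
# worker-1 {'connectors': [], 'tasks': ['cdc-a-0', 'cdc-b-0', 'cdc-c-0']}
```

Because every connector contributes exactly two assignments (itself plus one task), the cursor always returns to the same worker for the next connector, which is why worker-1 ends up with every task.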

Although the ideal solution to this problem is to have a better idea of how 
much load each connector/task will generate, I don't think we want to get into 
the business of full-on cluster resource management. An alternative which I 
think avoids this common pitfall without the risk of hitting another common bad 
case is to change the algorithm to assign all the connectors first, then all 
the tasks, i.e.

foreach connector {
  assign connector to next worker
}
foreach connector {
  for each task in connector {
    assign task to next worker
  }
}
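
The proposed two-pass scheme can be sketched the same way in Python. As before, this is an illustration of the pseudocode rather than the real implementation, and the names are hypothetical. One assumption to flag: the round robin cursor carries over from the connector pass into the task pass rather than resetting to the first worker, which is how I read "next worker" above.

```python
from itertools import cycle


def connectors_first_round_robin(workers, connectors):
    """Simulate the proposed scheme: all connectors first, then all tasks.

    `connectors` maps a (hypothetical) connector name to its task count.
    """
    assignment = {w: {"connectors": [], "tasks": []} for w in workers}
    next_worker = cycle(workers)  # assumption: cursor is shared by both passes
    # Pass 1: spread the connectors across the workers.
    for name in connectors:
        assignment[next(next_worker)]["connectors"].append(name)
    # Pass 2: spread the tasks across the workers.
    for name, num_tasks in connectors.items():
        for task_id in range(num_tasks):
            assignment[next(next_worker)]["tasks"].append(f"{name}-{task_id}")
    return assignment


# Same scenario: two workers, three single-task connectors.
workers = ["worker-0", "worker-1"]
connectors = {"cdc-a": 1, "cdc-b": 1, "cdc-c": 1}

result = connectors_first_round_robin(workers, connectors)
for worker, load in result.items():
    print(worker, load)
# worker-0 {'connectors': ['cdc-a', 'cdc-c'], 'tasks': ['cdc-b-0']}
# worker-1 {'connectors': ['cdc-b'], 'tasks': ['cdc-a-0', 'cdc-c-0']}
```

In this scenario the task counts per worker differ by at most one, instead of one worker carrying every task.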




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
