[ https://issues.apache.org/jira/browse/KAFKA-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ewen Cheslack-Postava updated KAFKA-4553:
-----------------------------------------
    Status: Patch Available  (was: Open)

> Connect's round robin assignment produces undesirable distribution of connectors/tasks
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4553
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4553
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 0.10.1.0
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>
> Currently the round robin assignment in Connect looks something like this:
>
> foreach connector {
>   assign connector to next worker
>   for each task in connector {
>     assign task to next worker
>   }
> }
>
> For the most part we assume that connectors and tasks are effectively
> equivalent units of work, but this is rarely the case. Connectors are
> usually much lighter weight, since they only monitor the source/sink system
> for changes, while tasks do the heavy lifting. As a result, the current
> round robin assignment produces uneven distributions of work in cases that
> are fairly common.
>
> In particular, it gets bad when there is an even number of workers and each
> connector generates only a single task. Each connector and its task then
> take two consecutive slots, so the cursor lands on an even-numbered worker
> for every connector: the even-numbered workers always get connectors and
> the odd-numbered workers always get tasks. An extreme case is when users
> start a distributed-mode cluster with just two workers and deploy multiple
> single-task connectors (CDC connectors such as Debezium are a common
> example). All the connectors end up on one worker, all the tasks end up on
> the other, and the second worker becomes overloaded.
>
> Although the ideal solution to this problem is to have a better idea of how
> much load each connector/task will generate, I don't think we want to get
> into the business of full-on cluster resource management.
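The failure case described above can be reproduced with a small Java sketch of the current loop. This is a hypothetical, minimal model and not Connect's actual assignor. The class `CurrentRoundRobin`, the `assign` method, and the worker and connector names are made up for illustration. Connectors are modeled as a name-to-task-count map. The sketch also assumes that connectors and tasks share a single round robin cursor, which is how the pseudocode's "next worker" reads.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the current round robin from KAFKA-4553; not Kafka
// Connect's real assignor. Connectors are a name -> task count map.
public class CurrentRoundRobin {

    // Returns worker -> assigned units, where a unit is either a connector
    // name or "<connector>-task-<n>".
    public static Map<String, List<String>> assign(List<String> workers,
                                                   Map<String, Integer> connectors) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String worker : workers) {
            result.put(worker, new ArrayList<>());
        }
        int next = 0; // one cursor shared by connectors and their tasks
        for (Map.Entry<String, Integer> c : connectors.entrySet()) {
            // Assign the connector to the next worker...
            result.get(workers.get(next++ % workers.size())).add(c.getKey());
            // ...then immediately assign its tasks, advancing the same cursor.
            for (int t = 0; t < c.getValue(); t++) {
                result.get(workers.get(next++ % workers.size()))
                      .add(c.getKey() + "-task-" + t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> workers = List.of("worker-0", "worker-1");
        Map<String, Integer> connectors = new LinkedHashMap<>();
        for (String name : List.of("cdc-a", "cdc-b", "cdc-c", "cdc-d")) {
            connectors.put(name, 1); // single-task connectors, e.g. CDC
        }
        assign(workers, connectors)
            .forEach((worker, units) -> System.out.println(worker + " " + units));
    }
}
```

With two workers and four single-task connectors, this prints `worker-0 [cdc-a, cdc-b, cdc-c, cdc-d]` and `worker-1 [cdc-a-task-0, cdc-b-task-0, cdc-c-task-0, cdc-d-task-0]`. Every connector lands on the first worker and every task on the second.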
>
> An alternative that I think avoids this common pitfall, without the risk of
> hitting another common bad case, is to change the algorithm to assign all
> the connectors first and then all the tasks:
>
> foreach connector {
>   assign connector to next worker
> }
> foreach connector {
>   for each task in connector {
>     assign task to next worker
>   }
> }

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
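The proposed two-pass algorithm can be sketched in Java under a hypothetical model. The class `ConnectorsThenTasks`, the `assign` method, and all names are illustrative and are not Connect's API. The sketch assumes the round robin cursor carries over from the connector pass into the task pass. Resetting the cursor between passes would be another valid reading of "next worker" in the pseudocode.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the proposed KAFKA-4553 assignment: all connectors
// first, then all tasks. Assumes one cursor carried across both passes.
// Not Kafka Connect's real assignor.
public class ConnectorsThenTasks {

    // Returns worker -> assigned units (connector names and "<connector>-task-<n>").
    public static Map<String, List<String>> assign(List<String> workers,
                                                   Map<String, Integer> connectors) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String worker : workers) {
            result.put(worker, new ArrayList<>());
        }
        int next = 0;
        // Pass 1: spread the lightweight connectors across the workers.
        for (String connector : connectors.keySet()) {
            result.get(workers.get(next++ % workers.size())).add(connector);
        }
        // Pass 2: spread the heavier tasks, continuing from the same cursor.
        for (Map.Entry<String, Integer> c : connectors.entrySet()) {
            for (int t = 0; t < c.getValue(); t++) {
                result.get(workers.get(next++ % workers.size()))
                      .add(c.getKey() + "-task-" + t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> workers = List.of("worker-0", "worker-1");
        Map<String, Integer> connectors = new LinkedHashMap<>();
        for (String name : List.of("cdc-a", "cdc-b", "cdc-c", "cdc-d")) {
            connectors.put(name, 1); // single-task connectors, e.g. CDC
        }
        assign(workers, connectors)
            .forEach((worker, units) -> System.out.println(worker + " " + units));
    }
}
```

With two workers and four single-task connectors, each worker ends up with two connectors and two tasks. The output is `worker-0 [cdc-a, cdc-c, cdc-a-task-0, cdc-c-task-0]` and `worker-1 [cdc-b, cdc-d, cdc-b-task-0, cdc-d-task-0]`. Because the tasks get their own round robin pass, they spread evenly regardless of how many connectors there are. Any imbalance is then limited to the cheap connectors.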