[ https://issues.apache.org/jira/browse/FLINK-7851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Till Rohrmann reassigned FLINK-7851:
------------------------------------

    Assignee: Till Rohrmann

> Improve scheduling balance in case of fewer sub tasks than input operator
> -------------------------------------------------------------------------
>
>                 Key: FLINK-7851
>                 URL: https://issues.apache.org/jira/browse/FLINK-7851
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination
>    Affects Versions: 1.4.0, 1.3.2
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>             Fix For: 1.4.0
>
>
> When a job has a mapper {{m1}} running with parallelism (dop) {{n}},
> followed by a key by and a mapper {{m2}} (all-to-all communication) running
> with dop {{m}}, and {{n > m}}, the sub tasks of {{m2}} are not spread
> uniformly across all currently used {{TaskManagers}}.
>
> For example: {{n = 4}}, {{m = 2}}, and we have 2 TaskManagers with 2 slots
> each. The deployment would look as follows:
>
> TM1:
> Slot 1: {{m1_1}} -> {{m2_1}}
> Slot 2: {{m1_3}} -> {{m2_2}}
>
> TM2:
> Slot 1: {{m1_2}}
> Slot 2: {{m1_4}}
>
> The reason for this behaviour is that when there are too many preferred
> locations (the threshold is currently 8) due to an all-to-all communication
> pattern, we simply poll the next slot from the MultiMap in
> {{SlotSharingGroupAssignment}}. The polling algorithm first drains all
> available slots of a single machine before it polls slots from another
> machine.
>
> I think it would be better to poll slots in a round-robin fashion with
> respect to the machines. That way we would get better resource utilisation
> by spreading the tasks more evenly.
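
The difference between the two polling strategies described in the issue can be illustrated with a small sketch. This is not Flink code and does not reflect the actual structure of {{SlotSharingGroupAssignment}}; the function names, the dict-of-lists representation of the available slots, and the slot labels are all assumptions made for illustration, using the 2 TaskManagers with 2 slots each from the example.

```python
from collections import deque


def poll_drain_first(slots_by_tm, count):
    """Hypothetical model of the reported behaviour: take every available
    slot of one machine before moving on to the next machine."""
    picked = []
    for tm, slots in slots_by_tm.items():
        for slot in slots:
            if len(picked) == count:
                return picked
            picked.append((tm, slot))
    return picked


def poll_round_robin(slots_by_tm, count):
    """Hypothetical model of the proposal: take one slot per machine in
    turn, cycling over the machines that still have free slots."""
    # One queue entry per machine that has at least one available slot.
    queues = deque(
        (tm, deque(slots)) for tm, slots in slots_by_tm.items() if slots
    )
    picked = []
    while queues and len(picked) < count:
        tm, slots = queues.popleft()
        picked.append((tm, slots.popleft()))
        if slots:
            # The machine still has free slots, so it rejoins the rotation.
            queues.append((tm, slots))
    return picked


# Assumed layout from the example: 2 TaskManagers with 2 slots each.
available = {"TM1": ["Slot 1", "Slot 2"], "TM2": ["Slot 1", "Slot 2"]}

# Placing the m = 2 sub tasks of m2:
print(poll_drain_first(available, 2))   # both sub tasks land on TM1
print(poll_round_robin(available, 2))   # one sub task per TaskManager
```

With the drain-first strategy both sub tasks of {{m2}} end up on TM1, matching the deployment shown in the issue, whereas the round-robin variant places one on each TaskManager.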