loyi created FLINK-23190:
----------------------------

             Summary: Make task-slot allocation much more even
                 Key: FLINK-23190
                 URL: https://issues.apache.org/jira/browse/FLINK-23190
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Task
    Affects Versions: 1.12.3
            Reporter: loyi
Description:

FLINK-12122 only guarantees spreading out tasks across the set of TMs that are registered at the time of scheduling. Our jobs all run in active YARN mode, where TMs are started on demand and are therefore not yet registered when tasks are scheduled, so jobs with a smaller source parallelism often cause load-balancing issues.

For this job:

{code:java}
// -ys 4 (4 slots per TM) means 10 TaskManagers for a maximum parallelism of 40
env.addSource(...).name("A").setParallelism(10)
    .map(...).name("B").setParallelism(30)
    .map(...).name("C").setParallelism(40)
    .addSink(...).name("D").setParallelism(20);
{code}

Allocation with release-1.12.3 (number of subtasks of each operator per TM; values in red deviate from an even spread):

||operator||tm1||tm2||tm3||tm4||tm5||tm6||tm7||tm8||tm9||tm10||
|A|1|{color:#de350b}2{color}|{color:#de350b}2{color}|1|1|{color:#de350b}3{color}|{color:#de350b}0{color}|{color:#de350b}0{color}|{color:#de350b}0{color}|{color:#de350b}0{color}|
|B|3|3|3|3|3|3|3|3|{color:#de350b}2{color}|{color:#de350b}4{color}|
|C|4|4|4|4|4|4|4|4|4|4|
|D|2|2|2|2|2|{color:#de350b}1{color}|{color:#de350b}1{color}|2|2|{color:#de350b}4{color}|

Suggestion: when a TM registers its slots with the SlotManager, we could group the pending requests by their "ExecutionVertexGroup" and then allocate the slots proportionally to each group.

I have implemented a proof-of-concept version based on release-1.12.3, and with it the job's tasks are allocated fully evenly. Are there other points that I have not considered?
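The proportional allocation suggested above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the proof-of-concept from this issue: the class `ProportionalSlotSpread`, its `spread` method and the group keys are hypothetical and do not use Flink's actual SlotManager API. It also assumes the 40 shared slot requests of the example job split into four groups of 10 (A+B+C+D, B+C+D, B+C, C). That split is consistent with the operator parallelisms but is not taken from Flink's real slot-sharing assignment. The pending requests are laid out group by group and dealt round-robin across TMs, so a group of size c lands on each of n TMs either floor(c/n) or ceil(c/n) times.

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: spread pending slot requests so every group gets a proportional share of each TM. */
public class ProportionalSlotSpread {

    /**
     * pendingPerGroup: hypothetical group key -> number of pending slot requests (insertion order kept).
     * slotsPerTm: slots each TM offers, e.g. the value passed as -ys.
     * Returns, for each TM index, how many of its slots go to each group.
     */
    static List<Map<String, Integer>> spread(Map<String, Integer> pendingPerGroup, int slotsPerTm) {
        int total = 0;
        for (int count : pendingPerGroup.values()) {
            total += count;
        }
        // Number of TMs needed to serve every pending request.
        int numTms = (total + slotsPerTm - 1) / slotsPerTm;

        List<Map<String, Integer>> perTm = new ArrayList<>();
        for (int tm = 0; tm < numTms; tm++) {
            perTm.add(new LinkedHashMap<>());
        }

        // Lay the requests out group by group and deal them round-robin:
        // request j goes to TM (j % numTms), so no TM exceeds slotsPerTm and
        // each group is split across TMs as evenly as integer counts allow.
        int j = 0;
        for (Map.Entry<String, Integer> e : pendingPerGroup.entrySet()) {
            for (int i = 0; i < e.getValue(); i++, j++) {
                perTm.get(j % numTms).merge(e.getKey(), 1, Integer::sum);
            }
        }
        return perTm;
    }

    public static void main(String[] args) {
        // Assumed (hypothetical) split of the example job's 40 shared slot requests.
        Map<String, Integer> pending = new LinkedHashMap<>();
        pending.put("A+B+C+D", 10);
        pending.put("B+C+D", 10);
        pending.put("B+C", 10);
        pending.put("C", 10);

        List<Map<String, Integer>> plan = spread(pending, 4);
        for (int tm = 0; tm < plan.size(); tm++) {
            System.out.println("tm" + (tm + 1) + " " + plan.get(tm));
        }
    }
}
{code}

Running main prints one line per TM. With the assumed groups, every TM receives one request from each group, which works out to 1 A, 3 B, 4 C and 2 D subtasks per TM. Round-robin dealing is only one simple way to get proportional shares. In a real SlotManager the TMs register one at a time, so the per-TM share would have to be computed on registration from the requests that are still pending.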