Re:[DISCUSS] FLIP-370 : Support Balanced Tasks Scheduling

Yuepeng Pan Thu, 19 Oct 2023 19:52:37 -0700

Hi, dev.

After reviewing the entire email discussion thread with Rui, I noticed that my 
previous ambiguous understanding led to a few incorrect conclusions, so I need 
to correct them here. Thanks to Rui for the help.

>For David: 

>The problem you're trying to solve only exists in complex graphs with
>different per-vertex parallelism. If the parallelism is set globally
>(assuming the pipeline has roughly even data skew), the algorithm could
>make things slightly worse by eliminating some local exchanges. Is that
>correct?

I re-checked: if all nodes have the same parallelism, the new strategy will 
not disrupt local exchanges. All subtasks connected by forward shuffle remain 
in the same slot.

As described in section 2.1.1 (core logic) of FLIP-370[1], if all nodes have 
the same parallelism, the new strategy traverses all SEVs of each JV and 
assigns SEVs[subtask_index] to ESSGs[subtask_index]. As a result:

a. SEVs with the same index are assigned to the same ESSG. 

b. For forward edges, all subtasks connected by forward shuffle stay in the 
same slot, so their data exchanges remain local.  
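
To make the equal-parallelism case concrete, here is a minimal sketch of the 
index-based assignment described above. It is not the FLIP's implementation 
or Flink code: the function names, the vertex names and the dict/tuple data 
model are all hypothetical, and it only covers the case where every vertex 
has the same parallelism.

```python
from collections import defaultdict


def assign_by_subtask_index(vertex_parallelism):
    """Put each (vertex, subtask_index) into the group with the same index.

    vertex_parallelism: hypothetical mapping of vertex name -> parallelism.
    Only the equal-parallelism case from this email is modelled.
    """
    if len(set(vertex_parallelism.values())) != 1:
        raise ValueError("this sketch only covers equal parallelism")

    groups = defaultdict(list)  # group index -> list of (vertex, subtask_index)
    for vertex, parallelism in vertex_parallelism.items():
        for subtask_index in range(parallelism):
            groups[subtask_index].append((vertex, subtask_index))
    return dict(groups)


def forward_edges_are_local(groups, forward_edges):
    """Check that both ends of every forward edge share a group (slot)."""
    group_of = {sev: idx for idx, sevs in groups.items() for sev in sevs}
    parallelism = len(groups)
    return all(
        group_of[(upstream, i)] == group_of[(downstream, i)]
        for upstream, downstream in forward_edges
        for i in range(parallelism)
    )


# Hypothetical three-vertex job, parallelism 3 everywhere.
groups = assign_by_subtask_index({"source": 3, "map": 3, "sink": 3})
all_local = forward_edges_are_local(
    groups, [("source", "map"), ("map", "sink")]
)
```

In this toy model every group holds exactly one subtask of each vertex, so 
subtask i of an upstream vertex and subtask i of its forward-connected 
downstream vertex always land together, which matches points a and b.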

--------------------------------------------------

If there are no additional comments about the FLIP, I plan to initiate a 
vote on it next Monday.

Best Regards,

Yuepeng

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling

At 2023-09-25 16:25:03, "Yuepeng Pan" <panyuep...@apache.org> wrote:
>Hi all,
>
>Fan Rui (CC'ed) and I created FLIP-370[1] to support balanced tasks 
>scheduling.
>
>Flink's current task deployment strategy sometimes places more tasks on some 
>TMs (TaskManagers) than on others. The TMs with more tasks then suffer 
>excessive resource utilization and become a bottleneck for the entire job. 
>A strategy that balances task load across TMs and reduces such bottlenecks 
>would therefore be valuable.
>
>The original design and discussions can be found in the Flink JIRA[2] and 
>Google doc[3]. We really appreciate Zhu Zhu (CC'ed) for providing valuable 
>help and suggestions in advance. 
>
>Please refer to the FLIP[1] document for more details about the proposed 
>design and implementation. We welcome any feedback and opinions on this 
>proposal.
>
>[1] 
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling
>
>[2] https://issues.apache.org/jira/browse/FLINK-31757
>
>[3] 
>https://docs.google.com/document/d/14WhrSNGBdcsRl3IK7CZO-RaZ5KXU2X1dWqxPEFr3iS8
>
>Best,
>
>Yuepeng Pan
