
Flink Jira Bot updated FLINK-22017:
      Labels: auto-deprioritized-critical  (was: stale-critical)
    Priority: Major  (was: Critical)

This issue was labeled "stale-critical" 7 days ago and has not received any updates,
so it is being deprioritized. If this ticket is actually Critical, please raise 
the priority and ask a committer to assign you the issue or revive the public 
discussion.

> Regions may never be scheduled when there are cross-region blocking edges
> -------------------------------------------------------------------------
>                 Key: FLINK-22017
>                 URL: https://issues.apache.org/jira/browse/FLINK-22017
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.11.3, 1.12.2, 1.13.0
>            Reporter: Zhilong Hong
>            Priority: Major
>              Labels: auto-deprioritized-critical
>         Attachments: Illustration.jpg
> In a topology with cross-region blocking edges, some regions may never be 
> scheduled. The case is illustrated in the figure below.
> !Illustration.jpg!
> Let's denote each vertex as v<layer>_<number>. The blocking edge that 
> connects v2_2 and v3_2 crosses region 1 and region 2. Since region 1 has no 
> incoming blocking edges from other regions, it is scheduled first. When 
> v2_2 finishes, PipelinedRegionSchedulingStrategy triggers 
> {{onExecutionStateChange}} for it.
> One would expect region 2 to be scheduled at this point, since all of its 
> consumed partitions should be consumable. In fact, region 2 is never 
> scheduled, because the result partition of v2_2 is not marked as consumable: 
> whether a partition is consumable is determined by its IntermediateDataSet.
> However, an IntermediateDataSet is consumable if and only if the producers 
> of all its IntermediateResultPartitions have finished. This 
> IntermediateDataSet can never become consumable, because v2_3 is never 
> scheduled. All in all, this forms a deadlock: a region can never be 
> scheduled because it has not been scheduled.
> As a solution, we should let BLOCKING result partitions become consumable 
> individually. Note that this makes scheduling execution-vertex-wise instead 
> of stage-wise, with the nice side effect of better resource utilization. 
> Along with this change, PipelinedRegionSchedulingStrategy can be simplified 
> by getting rid of correlatedResultPartitions.
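A minimal Python sketch comparing the current, dataset-wise consumability rule with the proposed per-partition rule. It assumes a hypothetical topology (the original figure is not reproduced): v2_2 and v2_3 produce partitions p2_2 and p2_3 of one IntermediateDataSet ds2, v2_2 is in region 1, and v2_3 and v3_2 are in region 2. The scheduler is a simplified toy that treats every scheduled region as running to completion; none of the names are Flink APIs.

```python
# Hypothetical topology (assumption): p2_2 and p2_3 belong to dataset "ds2".
REGIONS = {"region1": {"v2_2"}, "region2": {"v2_3", "v3_2"}}
PRODUCER = {"p2_2": "v2_2", "p2_3": "v2_3"}
DATASET = {"p2_2": "ds2", "p2_3": "ds2"}
CONSUMES = {"region1": set(), "region2": {"p2_2"}}


def consumable_dataset_wise(partition, finished):
    """Current rule: wait until all producers of the IntermediateDataSet finish."""
    ds = DATASET[partition]
    return all(PRODUCER[q] in finished for q, d in DATASET.items() if d == ds)


def consumable_individually(partition, finished):
    """Proposed rule: a BLOCKING partition is consumable once its own producer finishes."""
    return PRODUCER[partition] in finished


def schedule(is_consumable):
    """Repeatedly schedule regions whose consumed partitions are all consumable.

    Toy simplification: every scheduled region runs to completion immediately.
    Returns the regions in the order they were scheduled.
    """
    order, finished = [], set()
    progress = True
    while progress:
        progress = False
        for region, vertices in REGIONS.items():
            if region not in order and all(
                is_consumable(p, finished) for p in CONSUMES[region]
            ):
                order.append(region)
                finished |= vertices
                progress = True
    return order


print(schedule(consumable_dataset_wise))  # ['region1'] (region2 is stuck)
print(schedule(consumable_individually))  # ['region1', 'region2']
```

With per-partition consumability, region 2 becomes schedulable as soon as v2_2 finishes, without waiting for v2_3 inside its own region.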

This message was sent by Atlassian Jira
