[ https://issues.apache.org/jira/browse/FLINK-22017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-22017:
-----------------------------------
      Labels: auto-deprioritized-critical  (was: stale-critical)
    Priority: Major  (was: Critical)

This issue was labeled "stale-critical" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Critical, please raise the priority and ask a committer to assign you the issue or revive the public discussion.

> Regions may never be scheduled when there are cross-region blocking edges
> -------------------------------------------------------------------------
>
>                 Key: FLINK-22017
>                 URL: https://issues.apache.org/jira/browse/FLINK-22017
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.11.3, 1.12.2, 1.13.0
>            Reporter: Zhilong Hong
>            Priority: Major
>              Labels: auto-deprioritized-critical
>         Attachments: Illustration.jpg
>
>
> For a topology with cross-region blocking edges, some regions may never be
> scheduled. The case is illustrated in the figure below.
> !Illustration.jpg!
> Let's denote the vertices as v<layer>_<number>. The edge connecting v2_2 and
> v3_2 crosses the boundary between region 1 and region 2. Since region 1
> consumes no blocking edges from other regions, it is scheduled first. When
> v2_2 finishes, PipelinedRegionSchedulingStrategy triggers
> {{onExecutionStateChange}} for it.
> At this point one would expect region 2 to be scheduled, since all its
> consumed partitions should be consumable. In fact, region 2 is not scheduled,
> because the result partition of v2_2 is not tagged as consumable. Whether a
> partition is consumable is determined by its IntermediateDataSet.
> However, an IntermediateDataSet is consumable if and only if all the
> producers of its IntermediateResultPartitions are finished. This
> IntermediateDataSet will never be consumable, since v2_3 is never scheduled.
> All in all, this forms a deadlock: a region is never scheduled because it has
> not been scheduled.
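The circular dependency described in the quoted issue can be sketched with a small stand-alone model. This is only an illustration under stated assumptions: the class and method names are hypothetical and are not Flink's actual classes, and the sketch assumes (as the description implies) that v2_3 belongs to region 2 and produces into the same IntermediateDataSet as v2_2.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the quoted scenario; these are not Flink's classes.
public class DataSetLevelDeadlock {

    // Producers of the shared BLOCKING IntermediateDataSet.
    // Assumption: v2_2 is in region 1 and v2_3 is in region 2.
    static final List<String> PRODUCERS = List.of("v2_2", "v2_3");

    /** Data-set-level rule from the issue: consumable only once every producer has finished. */
    static boolean isDataSetConsumable(Set<String> finished) {
        return finished.containsAll(PRODUCERS);
    }

    /** Runs scheduling rounds until nothing changes; returns whether region 2 was ever scheduled. */
    static boolean region2EverScheduled() {
        Set<String> finished = new HashSet<>();
        finished.add("v2_2"); // region 1 has already run, so v2_2 is finished
        boolean region2Scheduled = false;
        boolean progress = true;
        while (progress) {
            progress = false;
            if (!region2Scheduled && isDataSetConsumable(finished)) {
                region2Scheduled = true;
                finished.add("v2_3"); // v2_3 can only finish after region 2 is scheduled
                progress = true;
            }
        }
        return region2Scheduled;
    }

    public static void main(String[] args) {
        // prints "region 2 scheduled: false"
        System.out.println("region 2 scheduled: " + region2EverScheduled());
    }
}
```

In this model the loop stops after one round without scheduling region 2: the data set waits for v2_3, and v2_3 waits for region 2.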
> As a solution, we should let BLOCKING result partitions be consumable
> individually. Note that this makes scheduling execution-vertex-wise instead
> of stage-wise, with the nice side effect of better resource utilization. The
> PipelinedRegionSchedulingStrategy can be simplified along with this change by
> getting rid of the correlatedResultPartitions.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
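The fix proposed in the quoted issue, letting each BLOCKING result partition become consumable on its own, might look roughly like the following. This is a minimal sketch rather than the actual patch: the class and method names are hypothetical, and it assumes the only cross-region partition that region 2 consumes is the one produced by v2_2.

```java
import java.util.Set;

// Hypothetical sketch of the proposed rule; names are illustrative, not Flink's API.
public class PerPartitionConsumability {

    /** Proposed rule: a BLOCKING partition is consumable as soon as its own producer has finished. */
    static boolean isPartitionConsumable(String producer, Set<String> finished) {
        return finished.contains(producer);
    }

    /** A region can be scheduled once every partition it consumes from other regions is consumable. */
    static boolean canScheduleRegion(Set<String> crossRegionProducers, Set<String> finished) {
        return crossRegionProducers.stream().allMatch(p -> isPartitionConsumable(p, finished));
    }

    public static void main(String[] args) {
        Set<String> finished = Set.of("v2_2"); // region 1 has run, so v2_2 is finished
        // Assumption: region 2 consumes only v2_2's partition from outside the region.
        System.out.println(canScheduleRegion(Set.of("v2_2"), finished)); // prints "true"
    }
}
```

Because the check no longer depends on v2_3, which has not run yet, region 2 becomes schedulable as soon as v2_2 finishes.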