[ https://issues.apache.org/jira/browse/FLINK-19994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226698#comment-17226698 ]
Andrey Zagrebin edited comment on FLINK-19994 at 11/5/20, 12:49 PM:
--------------------------------------------------------------------

This makes sense to me. We already have computeStronglyConnectedComponents to put the vertices of one iteration into one region.

I am running [CI|https://dev.azure.com/azagrebin/azagrebin/_build/results?buildId=340&view=results] to see whether any existing iteration tests fail. We probably also need a test with failover for iterations if there is none yet.

[~zhuzh] was buildOneRegionForAllVertices added because it was not clear whether an iteration can contain blocking edges? Or what else could have caused the head and tail not to be in the same pipelined failover region?


was (Author: azagrebin):
This makes sense to me. We already have computeStronglyConnectedComponents to put the vertices of one iteration into one region.

[~zhuzh] was buildOneRegionForAllVertices added because it was not clear whether an iteration can contain blocking edges? Or what else could have caused the head and tail not to be in the same pipelined failover region?

> All vertices in an DataSet iteration job will be eagerly scheduled
> ------------------------------------------------------------------
>
>                 Key: FLINK-19994
>                 URL: https://issues.apache.org/jira/browse/FLINK-19994
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.12.0
>            Reporter: Zhu Zhu
>            Priority: Blocker
>             Fix For: 1.12.0
>
>
> After switching to pipelined region scheduling, all vertices in a DataSet
> iteration job are eagerly scheduled. This means consumers of BLOCKING
> results can be deployed before the result is finished, which wastes
> resources.
> This is because all vertices are put into one pipelined region
> if the job contains a {{ColocationConstraint}}, see
> [PipelinedRegionComputeUtil|https://github.com/apache/flink/blob/c0f382f5f0072441ef8933f6993f1c34168004d6/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/flip1/PipelinedRegionComputeUtil.java#L52].
> IIUC, this {{makeAllOneRegion()}} behavior was introduced to ensure that a
> co-located iteration head and tail are restarted together in pipelined
> region failover. However, given that edges within an iteration will always be
> PIPELINED
> ([ref|https://github.com/apache/flink/blob/0523ef6451a93da450c6bdf5dd4757c3702f3962/flink-optimizer/src/main/java/org/apache/flink/optimizer/plantranslate/JobGraphGenerator.java#L1188]),
> a co-located iteration head and tail will always be in the same region. So I
> think we can drop the {{PipelinedRegionComputeUtil#makeAllOneRegion()}} code
> path and build regions in the same way whether or not there are co-location
> constraints.
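
To illustrate the argument in the description, here is a minimal sketch of building regions purely from PIPELINED connectivity. It is not Flink's PipelinedRegionComputeUtil: the class PipelinedRegionSketch, its Edge and ResultType types, and the four-vertex topology are all hypothetical, and the real code works on Flink's topology interfaces. The sketch assumes only what the description states, namely that edges inside an iteration are PIPELINED.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only; all names here are hypothetical, not Flink APIs. */
class PipelinedRegionSketch {

    enum ResultType { PIPELINED, BLOCKING }

    /** A directed edge between two vertices, identified by name. */
    static final class Edge {
        final String producer;
        final String consumer;
        final ResultType type;

        Edge(String producer, String consumer, ResultType type) {
            this.producer = producer;
            this.consumer = consumer;
            this.type = type;
        }
    }

    /** Merges the endpoints of every PIPELINED edge; BLOCKING edges split regions. */
    static List<Set<String>> computeRegions(List<String> vertices, List<Edge> edges) {
        Map<String, String> parent = new HashMap<>();
        for (String v : vertices) {
            parent.put(v, v);
        }
        for (Edge e : edges) {
            if (e.type == ResultType.PIPELINED) {
                parent.put(find(parent, e.producer), find(parent, e.consumer));
            }
        }
        // Group vertices by their representative, keeping first-seen order.
        Map<String, Set<String>> byRoot = new LinkedHashMap<>();
        for (String v : vertices) {
            byRoot.computeIfAbsent(find(parent, v), k -> new TreeSet<>()).add(v);
        }
        return new ArrayList<>(byRoot.values());
    }

    /** Union-find lookup with path compression. */
    static String find(Map<String, String> parent, String v) {
        String root = v;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        String cur = v;
        while (!cur.equals(root)) {
            String next = parent.get(cur);
            parent.put(cur, root);
            cur = next;
        }
        return root;
    }

    /** Hypothetical job: BLOCKING edges around an iteration whose inner edge is PIPELINED. */
    static List<Set<String>> demo() {
        List<String> vertices = Arrays.asList("source", "head", "tail", "sink");
        List<Edge> edges = Arrays.asList(
                new Edge("source", "head", ResultType.BLOCKING),
                new Edge("head", "tail", ResultType.PIPELINED),
                new Edge("tail", "sink", ResultType.BLOCKING));
        return computeRegions(vertices, edges);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[source], [head, tail], [sink]]
    }
}
```

In this sketch the co-located head and tail end up in one region without any special handling of co-location, while source and sink stay in their own regions and so would not need to be scheduled eagerly.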