Hi Prabhu,

If all the edges in a -> b -> c -> d -> e are point-wise, then in theory the job should form two regions.
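To illustrate the reasoning, here is a minimal sketch, not Flink's actual region-building code. It treats every pipelined edge as an undirected link between subtasks and counts the connected groups. The class and method names (RegionSketch, countRegions) are hypothetical. It assumes every operator has the same parallelism, that a point-wise edge connects subtask i only to subtask i, and that an all-to-all edge (as a keyBy or rebalance would typically create) connects every upstream subtask to every downstream subtask.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: this is NOT Flink's implementation.
// It models pipelined regions as connected components of subtasks.
final class RegionSketch {

    // Union-find "find" with path halving.
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Merge the groups containing subtasks a and b.
    static void union(int[] parent, int a, int b) {
        parent[find(parent, a)] = find(parent, b);
    }

    // Counts the groups of subtasks linked by pipelined edges in a linear
    // chain of `stages` operators, each running with `parallelism` subtasks.
    // pointWise = true  : subtask p feeds only subtask p of the next operator.
    // pointWise = false : subtask p feeds every subtask of the next operator.
    static int countRegions(int stages, int parallelism, boolean pointWise) {
        int[] parent = new int[stages * parallelism];
        for (int i = 0; i < parent.length; i++) {
            parent[i] = i;
        }

        for (int s = 0; s + 1 < stages; s++) {
            for (int p = 0; p < parallelism; p++) {
                int producer = s * parallelism + p;
                if (pointWise) {
                    union(parent, producer, (s + 1) * parallelism + p);
                } else {
                    for (int q = 0; q < parallelism; q++) {
                        union(parent, producer, (s + 1) * parallelism + q);
                    }
                }
            }
        }

        // Each distinct root represents one region.
        Set<Integer> roots = new HashSet<>();
        for (int i = 0; i < parent.length; i++) {
            roots.add(find(parent, i));
        }
        return roots.size();
    }

    public static void main(String[] args) {
        // 5 operators (A..E), parallelism 2, as in the job described below.
        System.out.println("point-wise: " + countRegions(5, 2, true));  // prints "point-wise: 2"
        System.out.println("all-to-all: " + countRegions(5, 2, false)); // prints "all-to-all: 1"
    }
}
```

Under these assumptions, a single all-to-all edge anywhere in the chain is enough to merge both parallel pipelines into one region, so it may be worth checking which partitioning each connection in the job actually uses.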
Best regards,
Weijie

Prabhu Joseph <prabhujose.ga...@gmail.com> wrote on Mon, May 15, 2023 at 09:58:

> Hi,
>
> I am testing the Flink Fine-Grained Recovery
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-1%3A+Fine+Grained+Recovery+from+Task+Failures>
> from Task Failures on Flink 1.17 and am facing some issues where I need
> some advice. I have the job graph below with 5 operators; all connections
> between operators are pipelined, and the job's parallelism.default is set
> to 2. I have configured RestartPipelinedRegionFailoverStrategy with the
> Exponential Delay Restart Strategy.
>
> A -> B -> C -> D -> E
>
> There are a total of 10 tasks running. The first pipeline (a1 to e1) runs
> on one TaskManager (say TM1), and the second pipeline (a2 to e2) runs on
> another TaskManager (say TM2).
>
> a1 -> b1 -> c1 -> d1 -> e1
> a2 -> b2 -> c2 -> d2 -> e2
>
> When TM1 failed, I expected that only 5 tasks (a1 to e1) would fail and
> that they alone would be restarted, but all 10 tasks are getting
> restarted. There is only one pipelined region, which consists of all 10
> execution vertices, and
> RestartPipelinedRegionFailoverStrategy#getTasksNeedingRestart returns all
> 10 tasks. Is this the right behaviour, or could there be an issue? Is it
> possible to restart only the pipeline of the failed task (a1 to e1)
> without restarting the other parallel pipelines?
>
> Thanks,
> Prabhu Joseph