[ https://issues.apache.org/jira/browse/FLINK-21030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias reassigned FLINK-21030:
--------------------------------

    Assignee: Matthias

> Broken job restart for job with disjoint graph
> ----------------------------------------------
>
>                 Key: FLINK-21030
>                 URL: https://issues.apache.org/jira/browse/FLINK-21030
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.11.2
>            Reporter: Theo Diefenthal
>            Assignee: Matthias
>            Priority: Blocker
>             Fix For: 1.13.0, 1.11.4, 1.12.2
>
>
> Building on top of bugs:
> https://issues.apache.org/jira/browse/FLINK-21028
> and https://issues.apache.org/jira/browse/FLINK-21029 :
> I tried to stop a Flink application on YARN via savepoint, which didn't
> succeed due to a possible bug/race condition in shutdown (FLINK-21028). For
> some reason, Flink attempted to restart the pipeline after the failure in
> shutdown (FLINK-21029). The bug here:
> As I mentioned, my job graph is disjoint and the pipelines are fully isolated.
> Let's say the original error occurred in a single task of pipeline1. Flink then
> restarted the entire pipeline1, but pipeline2 was shut down successfully and
> switched to the state FINISHED.
> My job was thus in a kind of invalid state after the attempt to stop it:
> one of the two pipelines was running, the other was FINISHED. I guess this is
> a bug in the restart behavior: only the connected component of the graph that
> contains the failed task is restarted, but the others aren't...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)