Till Rohrmann created FLINK-11537:
-------------------------------------
Summary: ExecutionGraph does not reach terminal state when
JobMaster lost leadership
Key: FLINK-11537
URL: https://issues.apache.org/jira/browse/FLINK-11537
Project: Flink
Issue Type: Bug
Components: Distributed Coordination
Affects Versions: 1.8.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
Fix For: 1.8.0
The {{ExecutionGraph}} sometimes does not reach a terminal state if the
{{JobMaster}} loses leadership. The reason is that we use the fenced main
thread executor to execute {{ExecutionGraph}} changes, and we don't wait for the
{{ExecutionGraph}} to reach the terminal state before we set the fencing token
to {{null}}. Changes submitted after that point no longer pass the fencing
check and are therefore not executed.
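
The race can be illustrated with a toy model. This is only a sketch and does
not use Flink's classes: {{Endpoint}}, {{runFenced}} and the action names are
hypothetical stand-ins, and the model assumes that an action is dropped
whenever the token it was sent under no longer matches the endpoint's token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Toy model of the race; these are NOT Flink's classes. */
public class FencedExecutorSketch {

    /** Hypothetical stand-in for an endpoint guarded by a fencing token. */
    static final class Endpoint {
        private UUID fencingToken;                 // null means "no leadership"
        final List<String> applied = new ArrayList<>();

        Endpoint(UUID token) {
            this.fencingToken = token;
        }

        void setFencingToken(UUID token) {
            this.fencingToken = token;
        }

        /** Applies the action only if the token it was sent under still matches. */
        boolean runFenced(UUID sentUnder, String action) {
            if (fencingToken == null || !fencingToken.equals(sentUnder)) {
                return false;                      // dropped by the fencing check
            }
            applied.add(action);
            return true;
        }
    }

    /** Simulates losing leadership and returns the actions that were applied. */
    static List<String> loseLeadership() {
        UUID leaderToken = UUID.randomUUID();
        Endpoint jobMaster = new Endpoint(leaderToken);

        jobMaster.runFenced(leaderToken, "cancel-requested");
        jobMaster.setFencingToken(null);           // token cleared without waiting
        // The follow-up state change still carries the old token and is dropped.
        jobMaster.runFenced(leaderToken, "reached-terminal-state");
        return jobMaster.applied;
    }

    public static void main(String[] args) {
        System.out.println(loseLeadership());      // prints [cancel-requested]
    }
}
```

In this model the terminal-state change is never applied, which mirrors the
symptom described above.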
One possible solution would be to wait for the {{ExecutionGraph}} to reach the
terminal state before clearing the fencing token. The downside, however, is
that the {{JobMaster}} remains reachable until the {{ExecutionGraph}} has been
properly terminated. Alternatively, we could use the unfenced main thread
executor to send out the cancel calls.
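
The two options can be compared in a toy model. Again, this is a sketch under
assumptions and not Flink code: {{Endpoint}}, {{runFenced}}, {{runUnfenced}}
and the action names are hypothetical, and the model assumes that in the second
option the follow-up state change also runs unfenced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Toy comparison of the two proposals; these are NOT Flink's classes. */
public class FencingRemediesSketch {

    /** Hypothetical endpoint offering a fenced and an unfenced execution path. */
    static final class Endpoint {
        private UUID fencingToken;
        final List<String> applied = new ArrayList<>();

        Endpoint(UUID token) {
            this.fencingToken = token;
        }

        void clearFencingToken() {
            fencingToken = null;
        }

        /** In this model the endpoint accepts fenced calls only while a token is set. */
        boolean isReachable() {
            return fencingToken != null;
        }

        boolean runFenced(UUID sentUnder, String action) {
            if (fencingToken == null || !fencingToken.equals(sentUnder)) {
                return false;                      // dropped by the fencing check
            }
            applied.add(action);
            return true;
        }

        /** Bypasses the fencing check entirely. */
        void runUnfenced(String action) {
            applied.add(action);
        }
    }

    /** Option 1: wait for the terminal state, then clear the token. */
    static List<String> waitThenClear() {
        UUID token = UUID.randomUUID();
        Endpoint jobMaster = new Endpoint(token);
        jobMaster.runFenced(token, "cancel-requested");
        // Downside: the endpoint is still reachable while we wait.
        jobMaster.applied.add("reachable-while-waiting=" + jobMaster.isReachable());
        jobMaster.runFenced(token, "reached-terminal-state");
        jobMaster.clearFencingToken();
        return jobMaster.applied;
    }

    /** Option 2: clear the token at once, send the cancel path unfenced. */
    static List<String> clearThenUnfenced() {
        Endpoint jobMaster = new Endpoint(UUID.randomUUID());
        jobMaster.clearFencingToken();
        jobMaster.applied.add("reachable-after-clear=" + jobMaster.isReachable());
        jobMaster.runUnfenced("cancel-requested");
        jobMaster.runUnfenced("reached-terminal-state");
        return jobMaster.applied;
    }

    public static void main(String[] args) {
        System.out.println(waitThenClear());
        System.out.println(clearThenUnfenced());
    }
}
```

Both variants reach the terminal state in this model. They differ in whether
the endpoint stays reachable during shutdown.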
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)