[ https://issues.apache.org/jira/browse/FLINK-11537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann closed FLINK-11537.
---------------------------------
    Resolution: Fixed

Fixed via c9e392b53b48c8ae0b189905a9b4cf878bf741e4

> ExecutionGraph does not reach terminal state when JobMaster lost leadership
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-11537
>                 URL: https://issues.apache.org/jira/browse/FLINK-11537
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.8.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Blocker
>             Fix For: 1.8.0
>
>
> The {{ExecutionGraph}} sometimes does not reach a terminal state if the
> {{JobMaster}} lost the leadership. The reason is that we use the fenced main
> thread executor to execute {{ExecutionGraph}} changes and we don't wait for
> the {{ExecutionGraph}} to reach the terminal state before we set the fencing
> token {{null}}.
>
> One possible solution would be to wait for the {{ExecutionGraph}} to reach
> the terminal state before clearing the fencing token. This has, however, the
> downside that the {{JobMaster}} is still reachable until the
> {{ExecutionGraph}} has been properly terminated. Alternatively, we could use
> the unfenced main thread executor to send the cancel calls out.
>
> A Travis run where the problem occurred is here:
> https://travis-ci.org/tillrohrmann/flink/jobs/489119926
>
> Update: The underlying problem is that {{ExecutionGraph#suspend}} does not
> transition the {{ExecutionGraph}} atomically into a terminal state. Changing
> this should solve the underlying problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
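
For readers unfamiliar with the issue, the update in the description attributes the bug to a suspend operation that does not move the graph into a terminal state atomically. Below is a minimal sketch of what an atomic transition could look like. It is not the code from the fix commit; the class AtomicSuspendSketch, its State enum, and its methods are hypothetical names chosen for illustration only. The point it tries to show is that the state becomes terminal in a single compare-and-set, so the outcome does not depend on later callbacks that a fenced executor might drop once the fencing token has been cleared.

```java
import java.util.concurrent.atomic.AtomicReference;

public class AtomicSuspendSketch {

    // Hypothetical, simplified job states; this is not Flink's JobStatus enum.
    public enum State {
        RUNNING(false),
        SUSPENDED(true),
        FINISHED(true);

        final boolean terminal;

        State(boolean terminal) {
            this.terminal = terminal;
        }
    }

    private final AtomicReference<State> state =
            new AtomicReference<>(State.RUNNING);

    /**
     * Moves the graph to SUSPENDED in one compare-and-set step.
     * Returns true if this call performed the transition, false if the
     * graph was already in a terminal state.
     */
    public boolean suspend() {
        while (true) {
            State current = state.get();
            if (current.terminal) {
                return false; // already terminal, nothing to do
            }
            if (state.compareAndSet(current, State.SUSPENDED)) {
                // Cleanup such as sending cancel calls would go here.
                // The state is already terminal, so it no longer depends
                // on those calls completing.
                return true;
            }
            // Another thread changed the state concurrently; retry.
        }
    }

    public State getState() {
        return state.get();
    }

    public static void main(String[] args) {
        AtomicSuspendSketch graph = new AtomicSuspendSketch();
        System.out.println(graph.suspend());  // prints "true"
        System.out.println(graph.getState()); // prints "SUSPENDED"
        System.out.println(graph.suspend());  // prints "false"
    }
}
```

In this sketch a second suspend call is a no-op, which is one way to keep the operation safe when leadership changes trigger it more than once.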