[ 
https://issues.apache.org/jira/browse/FLINK-19927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17225290#comment-17225290
 ] 

Andrey Zagrebin commented on FLINK-19927:
-----------------------------------------

True, the up-to-date state handling logic resides in the new SchedulerNG 
(currently DefaultScheduler). The execution state handling in the 
ExecutionGraph (EG) is partially inactive, as with the problematic 
notifyExecutionChange in this issue. We could reconsider how the execution 
tracking for reconciliation is integrated with scheduling. I think the 
tracking logic could be moved from Execution#deploy and 
EG#notifyExecutionChange to either SchedulerNG#updateTaskExecutionState or 
DefaultScheduler#deployTaskSafe. The latter currently looks more natural to 
me: ExecutionVertexOperations.deploy could return a submission future, used to 
complete the deployment in ExecutionDeploymentTracker, and 
Execution#getTerminalFuture could be used to stop the tracking. This would 
also be easier to unit test.
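
To make the future-based idea concrete, here is a minimal, self-contained 
sketch. It does not use Flink's real classes: SimpleTracker, deployAndTrack 
and the string execution IDs are hypothetical stand-ins, and the two futures 
only play the roles of the submission future and Execution#getTerminalFuture 
mentioned above.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch only; none of these types are Flink's real classes. */
public class DeploymentTrackingSketch {

    enum Status { PENDING, DEPLOYED }

    /** Hypothetical, simplified stand-in for a deployment tracker. */
    static final class SimpleTracker {
        private final Map<String, Status> tracked = new ConcurrentHashMap<>();

        void startTracking(String executionId) {
            tracked.put(executionId, Status.PENDING);
        }

        void completeDeployment(String executionId) {
            // No-op if tracking was already stopped for this execution.
            tracked.computeIfPresent(executionId, (id, status) -> Status.DEPLOYED);
        }

        void stopTracking(String executionId) {
            tracked.remove(executionId);
        }

        Status statusOf(String executionId) {
            return tracked.get(executionId); // null when not tracked
        }
    }

    /**
     * Hypothetical scheduler-side deploy step: tracking is driven purely by
     * the two futures, not by callbacks inside the execution graph.
     */
    static void deployAndTrack(
            SimpleTracker tracker,
            String executionId,
            CompletableFuture<Void> submissionFuture,
            CompletableFuture<Void> terminalFuture) {
        tracker.startTracking(executionId);
        // Successful submission marks the deployment as completed.
        submissionFuture.thenRun(() -> tracker.completeDeployment(executionId));
        // Reaching any terminal state (normally or exceptionally) stops tracking.
        terminalFuture.whenComplete((ignored, failure) -> tracker.stopTracking(executionId));
    }

    public static void main(String[] args) {
        SimpleTracker tracker = new SimpleTracker();
        CompletableFuture<Void> submission = new CompletableFuture<>();
        CompletableFuture<Void> terminal = new CompletableFuture<>();

        deployAndTrack(tracker, "execution-1", submission, terminal);
        System.out.println(tracker.statusOf("execution-1")); // PENDING
        submission.complete(null);
        System.out.println(tracker.statusOf("execution-1")); // DEPLOYED
        terminal.complete(null);
        System.out.println(tracker.statusOf("execution-1")); // null
    }
}
```

Because the tracker only sees the two futures, a unit test can complete them 
by hand without building an execution graph, which is the testability 
benefit referred to above.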

Nonetheless, this is not a quick fix. The fix that [~rmetzger] mentions in 
the issue description would be quick; I have already tried it:
 * Stop the tracking in EG#notifyExecutionChange without the legacy 
scheduling check
 * Test it in JobMasterExecutionDeploymentReconciliationTest by intercepting 
the tracking stop in DefaultExecutionDeploymentTracker
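
As a rough illustration of the quick fix, the sketch below moves the listener 
notification ahead of the legacy scheduling check. This is not the actual 
ExecutionGraph source: SimplifiedGraph, its boolean flag and the string IDs 
are hypothetical simplifications, and the real method has a different 
signature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative sketch only; not the actual ExecutionGraph code. */
public class NotifyOrderSketch {

    /** Hypothetical, heavily simplified stand-in for the execution graph. */
    static final class SimplifiedGraph {
        private final boolean legacyScheduling;
        private final Consumer<String> terminalStateListener;
        final List<String> legacyHandled = new ArrayList<>();

        SimplifiedGraph(boolean legacyScheduling, Consumer<String> terminalStateListener) {
            this.legacyScheduling = legacyScheduling;
            this.terminalStateListener = terminalStateListener;
        }

        void notifyExecutionChange(String executionId, boolean terminal) {
            // Notify the listener first, regardless of which scheduler is active.
            if (terminal) {
                terminalStateListener.accept(executionId);
            }
            // Only the remaining, legacy-specific handling is guarded by the check.
            if (!legacyScheduling) {
                return;
            }
            legacyHandled.add(executionId);
        }
    }

    public static void main(String[] args) {
        // Intercept the "tracking stop" by recording it in a list.
        List<String> stopped = new ArrayList<>();
        SimplifiedGraph graph = new SimplifiedGraph(false, stopped::add);

        graph.notifyExecutionChange("execution-1", false);
        graph.notifyExecutionChange("execution-1", true);
        System.out.println(stopped); // [execution-1]
    }
}
```

The recording listener mirrors the testing approach in the second bullet: the 
test only needs to observe that the stop was signalled when legacy scheduling 
is disabled.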

> ExecutionStateUpdateListener is only updated when legacy scheduling is enabled
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-19927
>                 URL: https://issues.apache.org/jira/browse/FLINK-19927
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.12.0
>            Reporter: Robert Metzger
>            Assignee: Andrey Zagrebin
>            Priority: Blocker
>             Fix For: 1.12.0
>
>
> This is a finding from FLINK-19805.
> The {{ExecutionDeploymentTracker}} is never notified about executions 
> reaching a terminal state when the default scheduler is used.
> This can potentially lead to invalid execution reconciliation behavior.
> Fixing this ticket probably involves switching the statements here: 
> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java#L1688-L1692
> As part of this ticket's resolution, I suggest also introducing a test 
> case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
