[ https://issues.apache.org/jira/browse/FLINK-17075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135672#comment-17135672 ]
Chesnay Schepler commented on FLINK-17075:
------------------------------------------

How should we handle tasks that are in a {{DEPLOYING}} state?

Tasks are in this state before anything has been sent to the TaskExecutor; if we receive a heartbeat between this state transition and the actual deployment, we would fail the task for no reason.

We can't just ignore tasks in this state, because we also have to handle cases where the update that the task is running is lost.

> Add task status reconciliation between TM and JM
> ------------------------------------------------
>
>                 Key: FLINK-17075
>                 URL: https://issues.apache.org/jira/browse/FLINK-17075
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0, 1.11.0
>            Reporter: Till Rohrmann
>            Priority: Critical
>             Fix For: 1.10.2, 1.12.0, 1.11.1
>
>
> In order to harden the TM and JM communication I suggest to let the
> {{TaskExecutor}} send the task statuses back to the {{JobMaster}} as part of
> the heartbeat payload (similar to FLINK-11059). This would allow to reconcile
> the states of both components in case that a status update message was lost
> as described by a user on the ML.
> https://lists.apache.org/thread.html/ra9ed70866381f0ef0f4779633346722ccab3dc0d6dbacce04080b74e%40%3Cuser.flink.apache.org%3E

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
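
To make the two cases raised in the comment concrete, here is a minimal sketch in Java. It is not Flink code: {{ReconciliationSketch}}, {{JmState}}, {{Action}} and {{reconcile}} are hypothetical names, and the heartbeat payload is assumed to be nothing more than the set of task IDs the TaskExecutor reports as running. It only illustrates the dilemma and does not propose a fix.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of heartbeat-based reconciliation; none of these names are Flink APIs. */
class ReconciliationSketch {

    /** Simplified JobMaster-side view of a task (assumption: only two states matter here). */
    enum JmState { DEPLOYING, RUNNING }

    /** What the JobMaster would do with a task after comparing against the heartbeat. */
    enum Action { NONE, FAIL, MARK_RUNNING }

    /**
     * Compares the JobMaster's view with the task IDs reported by the TaskExecutor.
     *
     * @param skipDeploying if true, tasks the JobMaster sees as DEPLOYING are left untouched
     */
    static Map<String, Action> reconcile(
            Map<String, JmState> jmView, Set<String> runningOnTm, boolean skipDeploying) {
        Map<String, Action> actions = new LinkedHashMap<>();
        for (Map.Entry<String, JmState> entry : jmView.entrySet()) {
            String taskId = entry.getKey();
            JmState state = entry.getValue();
            boolean reported = runningOnTm.contains(taskId);

            if (state == JmState.DEPLOYING && skipDeploying) {
                // Ignoring DEPLOYING tasks: never fails them, but never repairs them either.
                actions.put(taskId, Action.NONE);
            } else if (!reported) {
                // The JobMaster tracks a task the TaskExecutor does not know about.
                actions.put(taskId, Action.FAIL);
            } else if (state == JmState.DEPLOYING) {
                // The TaskExecutor runs the task, so the RUNNING update must have been lost.
                actions.put(taskId, Action.MARK_RUNNING);
            } else {
                actions.put(taskId, Action.NONE);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, JmState> jmView = new LinkedHashMap<>();
        // Case 1: switched to DEPLOYING, but nothing was sent to the TaskExecutor yet.
        jmView.put("not-yet-submitted", JmState.DEPLOYING);
        // Case 2: the task runs on the TaskExecutor, but the RUNNING update was lost.
        jmView.put("running-update-lost", JmState.DEPLOYING);

        Set<String> runningOnTm = Collections.singleton("running-update-lost");

        // Reconciling DEPLOYING tasks fails case 1 for no reason:
        // prints {not-yet-submitted=FAIL, running-update-lost=MARK_RUNNING}
        System.out.println(reconcile(jmView, runningOnTm, false));

        // Ignoring DEPLOYING tasks leaves case 2 unrepaired:
        // prints {not-yet-submitted=NONE, running-update-lost=NONE}
        System.out.println(reconcile(jmView, runningOnTm, true));
    }
}
```

With only the task ID in the report, the two cases look identical on the JobMaster side until the TaskExecutor reports the task, so in this sketch neither setting of {{skipDeploying}} handles both correctly.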