[jira] [Commented] (FLINK-17075) Add task status reconciliation between TM and JM

Chesnay Schepler (Jira) Mon, 15 Jun 2020 02:33:15 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-17075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135672#comment-17135672
 ]


Chesnay Schepler commented on FLINK-17075:
------------------------------------------

How should we handle tasks that are in the {{DEPLOYING}} state? A task enters 
this state before anything has been sent to the TaskExecutor; if we receive a 
heartbeat between this state transition and the actual deployment, we would 
fail the task for no reason.
We can't just ignore tasks in this state, because we also have to handle 
cases where the update that the task is running is lost.
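
To make the two cases concrete, here is a rough sketch of one possible 
decision rule. It is not taken from the Flink code base: the class, the enums 
and the {{submissionAcked}} flag are all hypothetical names, and it assumes the 
JobMaster can remember whether the TaskExecutor has acknowledged the task 
submission. A {{DEPLOYING}} task that is missing from the heartbeat is only 
failed once that acknowledgement has been seen, while a {{DEPLOYING}} task 
that the TaskExecutor reports as running is moved forward instead of ignored.

```java
public class ReconciliationSketch {

    // Hypothetical, simplified view of the task state as seen by the JobMaster.
    enum TaskState { DEPLOYING, RUNNING }

    // Hypothetical outcome of reconciling one task against one heartbeat.
    enum Action { NONE, MARK_RUNNING, FAIL }

    /**
     * Decides what to do with a single task.
     *
     * @param jmState         state the JobMaster currently holds for the task
     * @param submissionAcked assumption: true once the TaskExecutor has
     *                        acknowledged the submission of this task
     * @param reportedByTm    state from the heartbeat payload, or null if the
     *                        TaskExecutor did not report the task at all
     */
    static Action reconcile(TaskState jmState, boolean submissionAcked, TaskState reportedByTm) {
        if (reportedByTm == null) {
            // Nothing was sent yet (or no ack seen): the TaskExecutor cannot
            // know the task, so a missing report is not an error.
            if (jmState == TaskState.DEPLOYING && !submissionAcked) {
                return Action.NONE;
            }
            // The task should exist on the TaskExecutor but is not reported.
            return Action.FAIL;
        }
        // The "task is running" update was lost: catch up from the heartbeat.
        if (jmState == TaskState.DEPLOYING && reportedByTm == TaskState.RUNNING) {
            return Action.MARK_RUNNING;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        // Heartbeat arrives between the state transition and the deployment.
        System.out.println(reconcile(TaskState.DEPLOYING, false, null));
        // Submission was acknowledged, but the task has disappeared.
        System.out.println(reconcile(TaskState.DEPLOYING, true, null));
        // The running update was lost on the way to the JobMaster.
        System.out.println(reconcile(TaskState.DEPLOYING, true, TaskState.RUNNING));
    }
}
```

This still leaves a window where a heartbeat payload was built before the 
submission but is processed after the acknowledgement, so the sketch only 
narrows the problem; it does not settle it.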

> Add task status reconciliation between TM and JM
> ------------------------------------------------
>
>                 Key: FLINK-17075
>                 URL: https://issues.apache.org/jira/browse/FLINK-17075
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0, 1.11.0
>            Reporter: Till Rohrmann
>            Priority: Critical
>             Fix For: 1.10.2, 1.12.0, 1.11.1
>
>
> In order to harden the TM and JM communication I suggest to let the 
> {{TaskExecutor}} send the task statuses back to the {{JobMaster}} as part of 
> the heartbeat payload (similar to FLINK-11059). This would allow to reconcile 
> the states of both components in case that a status update message was lost 
> as described by a user on the ML.
> https://lists.apache.org/thread.html/ra9ed70866381f0ef0f4779633346722ccab3dc0d6dbacce04080b74e%40%3Cuser.flink.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
