[ 
https://issues.apache.org/jira/browse/FLINK-33483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784001#comment-17784001
 ] 

Xin Chen commented on FLINK-33483:
----------------------------------

This is related to FLINK-12302. Counting that issue, two scenarios have been
found in Flink 1.12.2 in which *UNDEFINED* is reported to the YARN
ResourceManager. I have reproduced that scenario as follows. Once the task has
completed on the TM and the job has reached a global terminal state (FINISHED
or FAILED), so that the JM log shows
"Job 65ccc2410d4554553225889dbea552d7 reached global terminal state {}",
kill the JM process (the AM) or disconnect the ZK connection. This causes a
new JM to be started. The new JM finds the job's status "DONE" in the
RunningJobsRegistry recorded in ZK, assigns the job the status "UNKNOWN", and
ultimately reports "UNDEFINED" to YARN.
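
The chain described above (registry status "DONE", then "UNKNOWN", then
"UNDEFINED") can be sketched roughly as follows. This is a simplified,
self-contained illustration and not the Flink source: the enum names mirror
the ones used in Flink and YARN, but the class FinalStatusSketch and the
methods statusSeenByNewJobManager and toYarnStatus are hypothetical names
chosen for this example.

```java
public class FinalStatusSketch {

    /** Per-job status kept in the HA registry (ZooKeeper in this scenario). */
    public enum JobSchedulingStatus { PENDING, RUNNING, DONE }

    /** Stand-in for Flink's application status. */
    public enum ApplicationStatus { SUCCEEDED, FAILED, CANCELED, UNKNOWN }

    /** Stand-in for YARN's final application status. */
    public enum FinalApplicationStatus { UNDEFINED, SUCCEEDED, FAILED, KILLED }

    /**
     * Hypothetical helper: the status a freshly started JM ends up with.
     * "DONE" only says that the job finished, not how it finished, so the
     * new JM cannot tell success from failure (assumption of this sketch).
     */
    public static ApplicationStatus statusSeenByNewJobManager(
            JobSchedulingStatus registryStatus, ApplicationStatus resultIfRerun) {
        if (registryStatus == JobSchedulingStatus.DONE) {
            return ApplicationStatus.UNKNOWN;
        }
        // Otherwise the job is (re)run and its real result is used.
        return resultIfRerun;
    }

    /** Hypothetical helper: anything without a clear outcome becomes UNDEFINED. */
    public static FinalApplicationStatus toYarnStatus(ApplicationStatus status) {
        switch (status) {
            case SUCCEEDED:
                return FinalApplicationStatus.SUCCEEDED;
            case FAILED:
                return FinalApplicationStatus.FAILED;
            case CANCELED:
                return FinalApplicationStatus.KILLED;
            default:
                return FinalApplicationStatus.UNDEFINED;
        }
    }

    public static void main(String[] args) {
        // The job finished under the old JM; the new JM only sees "DONE".
        ApplicationStatus recovered = statusSeenByNewJobManager(
                JobSchedulingStatus.DONE, ApplicationStatus.SUCCEEDED);
        System.out.println(recovered + " -> " + toYarnStatus(recovered));
    }
}
```

In this sketch the real outcome (SUCCEEDED) is lost as soon as the registry
entry is "DONE", which is why the final status shown on the YARN page carries
no information about success or failure.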

> Why is “UNDEFINED” defined in the Flink task status?
> ----------------------------------------------------
>
>                 Key: FLINK-33483
>                 URL: https://issues.apache.org/jira/browse/FLINK-33483
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / RPC, Runtime / Task
>    Affects Versions: 1.12.2
>            Reporter: Xin Chen
>            Priority: Major
>
> In Flink on YARN mode, if an UNKNOWN status appears in the Flink log, the
> JM (JobManager) reports the task status as UNDEFINED. The YARN page then
> displays the state as FINISHED, but the final status as *UNDEFINED*. From a
> business perspective, it is then unclear whether the task failed or
> succeeded, and whether it should be retried, so this has a real impact. Why
> was UNDEFINED designed in? This situation usually occurs because of a
> ZK (ZooKeeper) disconnection, a JM abnormality, or something similar. Since
> an abnormality is present, why not report FAILED?
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
