[ https://issues.apache.org/jira/browse/FLINK-33483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784007#comment-17784007 ]
Xin Chen commented on FLINK-33483:
----------------------------------

But UNDEFINED also appears in another production scenario. The JM log is attached as [^container_e15_1693914709123_8498_01_000001_8042], although I have not been able to fully reproduce this case. The key entries in the log are:
{code:java}
15:00:57.657 State change: SUSPENDED
Connection to ZooKeeper suspended, waiting for reconnection.
15:00:54.754 org.apache.flink.util.FlinkException: ResourceManager leader changed to new address null
15:00:54.759 Job DataDistribution$ (281592085ed7f391ab59b83a53c40db3) switched from state RUNNING to RESTARTING.
15:00:54.771 Job DataDistribution$ (281592085ed7f391ab59b83a53c40db3) switched from state RESTARTING to SUSPENDED.
org.apache.flink.util.FlinkException: JobManager is no longer the leader.
Unable to canonicalize address zookeeper:2181 because it's not resolvable.
15:00:55.694 closing socket connection and attempting reconnect
15:00:57.657 State change: RECONNECTED
15:00:57.739 Connection to ZooKeeper was reconnected. Leader retrieval can be restarted.
15:00:57.740 Connection to ZooKeeper was reconnected. Leader election can be restarted.
{code}

> Why is “UNDEFINED” defined in the Flink task status?
> ----------------------------------------------------
>
> Key: FLINK-33483
> URL: https://issues.apache.org/jira/browse/FLINK-33483
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / RPC, Runtime / Task
> Affects Versions: 1.12.2
> Reporter: Xin Chen
> Priority: Major
> Attachments: container_e15_1693914709123_8498_01_000001_8042
>
>
> In Flink on YARN mode, if an unknown status appears in the Flink log, the
> JobManager (JM) reports the task status as UNDEFINED. The YARN page then shows
> the state as FINISHED, but the final status as *UNDEFINED*. From a business
> perspective, it is then unknown whether the task failed or succeeded, and
> whether it should be retried, which has a real impact. Why was UNDEFINED
> introduced as a status at all?
>
> This usually happens after a ZooKeeper (zk) disconnection, a JobManager (JM)
> failure, or a similar abnormality. Since an abnormality has clearly occurred,
> why not report FAILED?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
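For readers wondering how a SUSPENDED job can end up as UNDEFINED on the YARN page, here is a minimal sketch of the status chain. It assumes (as far as I can tell from Flink's behavior, not verified against the 1.12.2 source) that only the globally terminal job states FINISHED, FAILED and CANCELED are mapped to a definite application outcome, and that everything else is reported as unknown. The enums are simplified, hypothetical stand-ins loosely modeled on Flink's JobStatus and ApplicationStatus and on YARN's FinalApplicationStatus; the class and the methods fromJobStatus and toYarnStatus are illustrative only and are not Flink's actual code.

```java
// Hypothetical sketch: simplified stand-ins for Flink's JobStatus and
// ApplicationStatus and YARN's FinalApplicationStatus. Not the real Flink source.
class UndefinedStatusSketch {

    /** Simplified subset of the Flink job states (including those in the log above). */
    enum JobStatus { RUNNING, RESTARTING, SUSPENDED, FINISHED, FAILED, CANCELED }

    /** Simplified application-level outcome reported by the JobManager. */
    enum ApplicationStatus { SUCCEEDED, FAILED, CANCELED, UNKNOWN }

    /** Simplified YARN final application status. */
    enum YarnFinalStatus { SUCCEEDED, FAILED, KILLED, UNDEFINED }

    // Assumption: only globally terminal job states have a definite outcome.
    // Anything else, e.g. SUSPENDED after losing leadership, becomes UNKNOWN.
    static ApplicationStatus fromJobStatus(JobStatus status) {
        switch (status) {
            case FINISHED: return ApplicationStatus.SUCCEEDED;
            case FAILED: return ApplicationStatus.FAILED;
            case CANCELED: return ApplicationStatus.CANCELED;
            default: return ApplicationStatus.UNKNOWN;
        }
    }

    // UNKNOWN (or a missing status) has no definite YARN equivalent,
    // so it is reported to YARN as UNDEFINED.
    static YarnFinalStatus toYarnStatus(ApplicationStatus status) {
        if (status == null) {
            return YarnFinalStatus.UNDEFINED;
        }
        switch (status) {
            case SUCCEEDED: return YarnFinalStatus.SUCCEEDED;
            case FAILED: return YarnFinalStatus.FAILED;
            case CANCELED: return YarnFinalStatus.KILLED;
            default: return YarnFinalStatus.UNDEFINED;
        }
    }

    public static void main(String[] args) {
        // Print the final YARN status this sketch would report for each job state.
        for (JobStatus s : JobStatus.values()) {
            System.out.println(s + " -> " + toYarnStatus(fromJobStatus(s)));
        }
    }
}
```

Running main prints one line per job state; SUSPENDED, the state the job reached in the log above, comes out as "SUSPENDED -> UNDEFINED", while CANCELED is translated to YARN's KILLED. One plausible reason for not mapping SUSPENDED to FAILED: Flink's JobStatus documentation describes SUSPENDED as stopped but not removed from a potential HA job store, so another JobManager may still recover the job.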