[ https://issues.apache.org/jira/browse/FLINK-12302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849071#comment-16849071 ]
lamber-ken edited comment on FLINK-12302 at 5/27/19 4:47 PM:
-------------------------------------------------------------

[~gjy], from another angle, we can analyze this issue from the code alone.

When some scenario occurs and +MiniDispatcher#jobNotFinished+ is called, it means the Flink job terminated unexpectedly, so the dispatcher notifies the RM to kill the YARN application with the +ApplicationStatus.UNKNOWN+ state. +YarnResourceManager#getYarnStatus+ then maps +UNKNOWN+ to +{{UNDEFINED}}+. But in Hadoop, +{{UNDEFINED}}+ means the application has not yet finished.

*MiniDispatcher#jobNotFinished*
{code:java}
@Override
protected void jobNotFinished(JobID jobId) {
    super.jobNotFinished(jobId);

    // shut down since we have done our job
    jobTerminationFuture.complete(ApplicationStatus.UNKNOWN);
}
{code}

*YarnResourceManager#getYarnStatus*
{code:java}
private FinalApplicationStatus getYarnStatus(ApplicationStatus status) {
    if (status == null) {
        return FinalApplicationStatus.UNDEFINED;
    } else {
        switch (status) {
            case SUCCEEDED:
                return FinalApplicationStatus.SUCCEEDED;
            case FAILED:
                return FinalApplicationStatus.FAILED;
            case CANCELED:
                return FinalApplicationStatus.KILLED;
            default:
                // UNKNOWN falls through to here
                return FinalApplicationStatus.UNDEFINED;
        }
    }
}
{code}

*Hadoop Application Status*

[FinalApplicationStatus|https://github.com/apache/hadoop-common/blob/42a61a4fbc88303913c4681f0d40ffcc737e70b5/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/FinalApplicationStatus.java#L32]
{code:java}
/**
 * Enumeration of various final states of an Application.
 */
@Public
@Stable
public enum FinalApplicationStatus {

    /** Undefined state when either the application has not yet finished */
    UNDEFINED,

    /** Application which finished successfully. */
    SUCCEEDED,

    /** Application which failed. */
    FAILED,

    /** Application which was terminated by a user or admin. */
    KILLED
}
{code}

*Long-running application's FinalStatus*

*!image-2019-05-28-00-46-49-740.png!*

> Fixed the wrong finalStatus of yarn application when application finished
> -------------------------------------------------------------------------
>
>                 Key: FLINK-12302
>                 URL: https://issues.apache.org/jira/browse/FLINK-12302
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>    Affects Versions: 1.8.0
>            Reporter: lamber-ken
>            Assignee: lamber-ken
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.9.0
>
>         Attachments: fix-bad-finalStatus.patch, flink-conf.yaml,
> image-2019-04-23-19-56-49-933.png, image-2019-05-28-00-46-49-740.png,
> jobmanager-05-27.log, jobmanager-1.log, jobmanager-2.log, screenshot-1.png,
> screenshot-2.png, spslave4.bigdata.ly_23951, spslave5.bigdata.ly_20271,
> test.jar
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> A Flink job (flink-1.6.3) failed in per-job YARN cluster mode, and the
> YARN ResourceManager reran the job.
> When the job failed again, the application finished, but its finalStatus
> was +UNDEFINED+. It would be better to show the state +FAILED+.
> !image-2019-04-23-19-56-49-933.png!
>

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
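
To make the status mapping discussed in the comment concrete, here is a minimal, self-contained sketch. The two enums are hypothetical stand-ins for Flink's ApplicationStatus and YARN's FinalApplicationStatus so that it compiles without either library. {{currentMapping}} mirrors the quoted {{getYarnStatus}}; {{proposedMapping}} and the class name are illustrative only and assume that reporting +UNKNOWN+ as +FAILED+ is the desired behavior, which is not necessarily the change that was committed.

```java
// Hypothetical stand-in for Flink's ApplicationStatus.
enum ApplicationStatus { SUCCEEDED, FAILED, CANCELED, UNKNOWN }

// Hypothetical stand-in for YARN's FinalApplicationStatus.
enum FinalApplicationStatus { UNDEFINED, SUCCEEDED, FAILED, KILLED }

final class YarnStatusSketch {

    /** Mirrors the quoted getYarnStatus: UNKNOWN falls into the default branch. */
    static FinalApplicationStatus currentMapping(ApplicationStatus status) {
        if (status == null) {
            return FinalApplicationStatus.UNDEFINED;
        }
        switch (status) {
            case SUCCEEDED:
                return FinalApplicationStatus.SUCCEEDED;
            case FAILED:
                return FinalApplicationStatus.FAILED;
            case CANCELED:
                return FinalApplicationStatus.KILLED;
            default:
                return FinalApplicationStatus.UNDEFINED;
        }
    }

    /** Assumed alternative: an unexpected termination (UNKNOWN) is reported as FAILED. */
    static FinalApplicationStatus proposedMapping(ApplicationStatus status) {
        if (status == ApplicationStatus.UNKNOWN) {
            return FinalApplicationStatus.FAILED;
        }
        // All other inputs keep the behavior of the quoted mapping.
        return currentMapping(status);
    }

    public static void main(String[] args) {
        // Print both mappings side by side for every input status.
        for (ApplicationStatus s : ApplicationStatus.values()) {
            System.out.println(s + ": current=" + currentMapping(s)
                    + ", proposed=" + proposedMapping(s));
        }
    }
}
```

Only the +UNKNOWN+ case differs between the two mappings; a null status still maps to +UNDEFINED+ in both.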