[jira] [Comment Edited] (FLINK-12302) Fixed the wrong finalStatus of yarn application when application finished

lamber-ken (JIRA) Mon, 27 May 2019 09:48:11 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-12302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849071#comment-16849071
 ]


lamber-ken edited comment on FLINK-12302 at 5/27/19 4:47 PM:
-------------------------------------------------------------

[~gjy], from another angle, we can analyze this issue purely from the code.

When +MiniDispatcher#jobNotFinished+ is called, it means the Flink job 
terminated unexpectedly, so it notifies the RM to kill the YARN application 
with the +ApplicationStatus.UNKNOWN+ state. The +UNKNOWN+ state is then 
converted to +{{UNDEFINED}}+ by +YarnResourceManager#getYarnStatus+.

 

But in Hadoop, +{{UNDEFINED}}+ means the application has not yet finished.

 

*MiniDispatcher#jobNotFinished*
{code:java}
@Override
protected void jobNotFinished(JobID jobId) {
   super.jobNotFinished(jobId);

   // shut down since we have done our job
   jobTerminationFuture.complete(ApplicationStatus.UNKNOWN);
}
{code}
*YarnResourceManager#getYarnStatus*
{code:java}
private FinalApplicationStatus getYarnStatus(ApplicationStatus status) {
   if (status == null) {
      return FinalApplicationStatus.UNDEFINED;
   }
   else {
      switch (status) {
         case SUCCEEDED:
            return FinalApplicationStatus.SUCCEEDED;
         case FAILED:
            return FinalApplicationStatus.FAILED;
         case CANCELED:
            return FinalApplicationStatus.KILLED;
         default:
            return FinalApplicationStatus.UNDEFINED;
      }
   }
}
{code}
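
To make the effect of the {{default}} branch concrete, here is a minimal, self-contained sketch that prints the result of the mapping for every status. The two enums are simplified stand-ins (assumptions for illustration) for Flink's {{ApplicationStatus}} and Hadoop's {{FinalApplicationStatus}}; only the method body mirrors the code quoted above.

```java
public class YarnStatusMappingSketch {

    /** Simplified stand-in for Flink's ApplicationStatus (illustration only). */
    enum ApplicationStatus { SUCCEEDED, FAILED, CANCELED, UNKNOWN }

    /** Simplified stand-in for Hadoop's FinalApplicationStatus (illustration only). */
    enum FinalApplicationStatus { UNDEFINED, SUCCEEDED, FAILED, KILLED }

    /** Same branching as the getYarnStatus method quoted above. */
    static FinalApplicationStatus getYarnStatus(ApplicationStatus status) {
        if (status == null) {
            return FinalApplicationStatus.UNDEFINED;
        }
        switch (status) {
            case SUCCEEDED:
                return FinalApplicationStatus.SUCCEEDED;
            case FAILED:
                return FinalApplicationStatus.FAILED;
            case CANCELED:
                return FinalApplicationStatus.KILLED;
            default:
                // UNKNOWN has no case of its own, so it ends up here.
                return FinalApplicationStatus.UNDEFINED;
        }
    }

    public static void main(String[] args) {
        // Print the YARN status each stand-in Flink status maps to.
        for (ApplicationStatus s : ApplicationStatus.values()) {
            System.out.println(s + " -> " + getYarnStatus(s));
        }
    }
}
```

With these stand-in enums, {{UNKNOWN}} is the only non-null status that is reported to YARN as {{UNDEFINED}}.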
 

*Hadoop Application Status* 
[FinalApplicationStatus|https://github.com/apache/hadoop-common/blob/42a61a4fbc88303913c4681f0d40ffcc737e70b5/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/FinalApplicationStatus.java#L32]
{code:java}
/**
 * Enumeration of various final states of an Application.
 */
@Public
@Stable
public enum FinalApplicationStatus {

 /** Undefined state when either the application has not yet finished */
  UNDEFINED,

  /** Application which finished successfully. */
  SUCCEEDED,

  /** Application which failed. */
  FAILED,

  /** Application which was terminated by a user or admin. */
  KILLED
}
{code}
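
Since YARN documents {{UNDEFINED}} as "not yet finished", one possible direction, in line with the issue description's suggestion to show {{FAILED}}, would be to map {{UNKNOWN}} to {{FAILED}}. The sketch below is hypothetical: it is not the attached patch, the enums are simplified stand-ins, and whether {{FAILED}} is the right target for every {{UNKNOWN}} case is an assumption.

```java
public class UnknownAsFailedSketch {

    /** Simplified stand-in for Flink's ApplicationStatus (illustration only). */
    enum ApplicationStatus { SUCCEEDED, FAILED, CANCELED, UNKNOWN }

    /** Simplified stand-in for Hadoop's FinalApplicationStatus (illustration only). */
    enum FinalApplicationStatus { UNDEFINED, SUCCEEDED, FAILED, KILLED }

    /** Hypothetical variant: an application that ended as UNKNOWN is reported as FAILED. */
    static FinalApplicationStatus getYarnStatus(ApplicationStatus status) {
        if (status == null) {
            // No status at all: keep the original behaviour.
            return FinalApplicationStatus.UNDEFINED;
        }
        switch (status) {
            case SUCCEEDED:
                return FinalApplicationStatus.SUCCEEDED;
            case CANCELED:
                return FinalApplicationStatus.KILLED;
            case FAILED:
            case UNKNOWN:
            default:
                // Assumption: a finished application with an unknown outcome is shown as FAILED.
                return FinalApplicationStatus.FAILED;
        }
    }

    public static void main(String[] args) {
        System.out.println(ApplicationStatus.UNKNOWN + " -> " + getYarnStatus(ApplicationStatus.UNKNOWN));
    }
}
```

The {{null}} branch is left unchanged here so that only the handling of {{UNKNOWN}} differs from the quoted method.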
 

*Long-running Application's FinalStatus*

*!image-2019-05-28-00-46-49-740.png!*

 

 



> Fixed the wrong finalStatus of yarn application when application finished
> -------------------------------------------------------------------------
>
>                 Key: FLINK-12302
>                 URL: https://issues.apache.org/jira/browse/FLINK-12302
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>    Affects Versions: 1.8.0
>            Reporter: lamber-ken
>            Assignee: lamber-ken
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.9.0
>
>         Attachments: fix-bad-finalStatus.patch, flink-conf.yaml, 
> image-2019-04-23-19-56-49-933.png, image-2019-05-28-00-46-49-740.png, 
> jobmanager-05-27.log, jobmanager-1.log, jobmanager-2.log, screenshot-1.png, 
> screenshot-2.png, spslave4.bigdata.ly_23951, spslave5.bigdata.ly_20271, 
> test.jar
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> A Flink job (flink-1.6.3) failed in per-job YARN cluster mode, and the YARN 
> ResourceManager reran the job.
> When the job fails again, the application finishes, but the finalStatus 
> is +UNDEFINED+. It would be better to show the state +FAILED+.
> !image-2019-04-23-19-56-49-933.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
