[ https://issues.apache.org/jira/browse/FLINK-33483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785718#comment-17785718 ]
Xin Chen commented on FLINK-33483: ---------------------------------- Hi [~mapohl] yes, this is not a question in recent version. But I also see some code about "ApplicationStatus.UNKNOWN" and "UNDEFINED", so it is confused whether there still be "UNDEFINED" scenarios in Flink? Some other code like: {code:java} // ApplicationDispatcherBootstrap.java#getJobResult() private CompletableFuture<JobResult> getJobResult( final DispatcherGateway dispatcherGateway, final JobID jobId, final ScheduledExecutor scheduledExecutor, final boolean tolerateMissingResult) { final Time timeout = Time.milliseconds(configuration.get(ClientOptions.CLIENT_TIMEOUT).toMillis()); final Time retryPeriod = Time.milliseconds(configuration.get(ClientOptions.CLIENT_RETRY_PERIOD).toMillis()); final CompletableFuture<JobResult> jobResultFuture = JobStatusPollingUtils.getJobResult( dispatcherGateway, jobId, scheduledExecutor, timeout, retryPeriod); if (tolerateMissingResult) { // Return "unknown" job result if dispatcher no longer knows the actual result. return FutureUtils.handleException( jobResultFuture, FlinkJobNotFoundException.class, exception -> new JobResult.Builder() .jobId(jobId) .applicationStatus(ApplicationStatus.UNKNOWN) .netRuntime(Long.MAX_VALUE) .build()); } return jobResultFuture; } // RestClusterClient.java#requestJobResultInternal private CompletableFuture<JobResult> requestJobResultInternal(@Nonnull JobID jobId) { return pollResourceAsync( () -> { final JobMessageParameters messageParameters = new JobMessageParameters(); messageParameters.jobPathParameter.resolve(jobId); return sendRequest( JobExecutionResultHeaders.getInstance(), messageParameters); }) .thenApply( jobResult -> { if (jobResult.getApplicationStatus() == ApplicationStatus.UNKNOWN) { throw new JobStateUnknownException( String.format("Result for Job %s is UNKNOWN", jobId)); } return jobResult; }); } {code} I want to know which other scenarios the final state of the Flink task may be UNDEFINED. > Why is “UNDEFINED” defined in the Flink task status? > ---------------------------------------------------- > > Key: FLINK-33483 > URL: https://issues.apache.org/jira/browse/FLINK-33483 > Project: Flink > Issue Type: Improvement > Components: Deployment / YARN > Affects Versions: 1.12.2 > Reporter: Xin Chen > Priority: Major > Attachments: container_e15_1693914709123_8498_01_000001_8042, > reproduce.log > > > In the Flink on Yarn mode, if an unknown status appears in the Flink log, > jm(jobmanager) will report the task status as undefined. The Yarn page will > display the state as FINISHED, but the final status is *UNDEFINED*. In terms > of business, it is unknown whether the task has failed or succeeded, and > whether to retry. It has a certain impact. Why should we design UNDEFINED? > Usually, this situation occurs due to zk(zookeeper) disconnection or jm > abnormality, etc. Since the abnormality is present, why not use FAILED? > -- This message was sent by Atlassian Jira (v8.20.10#820010)