[ 
https://issues.apache.org/jira/browse/FLINK-33121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767214#comment-17767214
 ] 

Chesnay Schepler commented on FLINK-33121:
------------------------------------------

Had an offline chat; our suspicion is that something is calling the failure 
handling logic of the scheduler without running this call in the main thread.

We'll try to confirm this theory.
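
To illustrate the suspicion, here is a minimal sketch of a component that confines its failure handling to a single "main thread" and rejects direct calls from any other thread. All names here (MainThreadSketch, reportFailure, failureCount) are hypothetical stand-ins, not Flink's scheduler classes; only the handleGlobalFailure name is borrowed from the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a scheduler; not Flink's actual implementation. */
public class MainThreadSketch {
    private final ExecutorService mainThreadExecutor;
    private volatile Thread mainThread; // set when the executor creates its thread
    private final List<Throwable> failures = new ArrayList<>(); // main thread only

    public MainThreadSketch() {
        mainThreadExecutor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "main-thread-sketch");
            mainThread = thread;
            return thread;
        });
    }

    /** Must only run in the main thread; a call from elsewhere is rejected. */
    public void handleGlobalFailure(Throwable cause) {
        if (Thread.currentThread() != mainThread) {
            throw new IllegalStateException(
                    "handleGlobalFailure called from " + Thread.currentThread().getName());
        }
        failures.add(cause);
    }

    /** Entry point for other threads: hands the call over to the main thread. */
    public Future<?> reportFailure(Throwable cause) {
        return mainThreadExecutor.submit(() -> handleGlobalFailure(cause));
    }

    /** Reads the failure count from within the main thread. */
    public int failureCount() throws Exception {
        Callable<Integer> sizeTask = failures::size;
        return mainThreadExecutor.submit(sizeTask).get();
    }

    public void shutdown() {
        mainThreadExecutor.shutdown();
    }

    public static void main(String[] args) throws Exception {
        MainThreadSketch scheduler = new MainThreadSketch();
        // Correct: the failure is handed over to the main thread.
        scheduler.reportFailure(new RuntimeException("failure A")).get();
        try {
            // Incorrect: direct call from the caller's thread.
            scheduler.handleGlobalFailure(new RuntimeException("failure B"));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("recorded failures: " + scheduler.failureCount());
        scheduler.shutdown();
    }
}
```

Without such a check, two threads could append failures concurrently, which would match the multiple global failures described in the issue.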

> Failed precondition in JobExceptionsHandler due to concurrent global failures
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-33121
>                 URL: https://issues.apache.org/jira/browse/FLINK-33121
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Panagiotis Garefalakis
>            Priority: Major
>
> {{JobExceptionsHandler#createRootExceptionInfo}} *only* allows concurrent 
> exceptions that are local failures; otherwise it fails a precondition check 
> in {{assertLocalExceptionInfo}}.
> However, in rare cases multiple concurrent global failures are triggered 
> and added to the failureCollection before the job state transitions to 
> Failed, e.g., through {{StateWithExecutionGraph#handleGlobalFailure}} of the 
> AdaptiveScheduler.
> In this case the last one added becomes the root, and the next one triggers 
> the assertion.
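
The failing check described above can be sketched roughly as follows. This is a simplified illustration under assumed names: the Failure class, the convention that a null task name marks a global failure, and the describe method are all hypothetical and do not mirror the real JobExceptionsHandler code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Simplified illustration of the precondition; all names are hypothetical. */
public class ConcurrentFailureSketch {

    /** Hypothetical failure entry; a null taskName marks a global failure here. */
    static final class Failure {
        final String message;
        final String taskName;

        Failure(String message, String taskName) {
            this.message = message;
            this.taskName = taskName;
        }

        boolean isGlobal() {
            return taskName == null;
        }
    }

    /** Builds a summary of the root failure; every concurrent failure must be local. */
    static String describe(Failure root, List<Failure> concurrent) {
        for (Failure failure : concurrent) {
            if (failure.isGlobal()) {
                throw new IllegalArgumentException(
                        "Concurrent failure is not local: " + failure.message);
            }
        }
        return root.message + " (+" + concurrent.size() + " concurrent)";
    }

    public static void main(String[] args) {
        Failure root = new Failure("global failure 2", null);
        Failure local = new Failure("task failure", "map-task-1");
        Failure otherGlobal = new Failure("global failure 1", null);

        // A global root with local concurrent failures passes the check.
        System.out.println(describe(root, Collections.singletonList(local)));

        // A second global failure among the concurrent ones trips the check.
        try {
            describe(root, Arrays.asList(local, otherGlobal));
        } catch (IllegalArgumentException e) {
            System.out.println("precondition failed: " + e.getMessage());
        }
    }
}
```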



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
