[ https://issues.apache.org/jira/browse/FLINK-33121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767214#comment-17767214 ]
Chesnay Schepler commented on FLINK-33121:
------------------------------------------

Had an offline chat; our suspicion is that something is calling the failure handling logic of the scheduler without running this call in the main thread. We'll try to confirm this theory.

> Failed precondition in JobExceptionsHandler due to concurrent global failures
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-33121
>                 URL: https://issues.apache.org/jira/browse/FLINK-33121
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Panagiotis Garefalakis
>            Priority: Major
>
> {{JobExceptionsHandler#createRootExceptionInfo}} *only* allows concurrent
> exceptions that are local failures *--* otherwise it throws an assertion as part
> of {{assertLocalExceptionInfo}}.
> However, there are rare cases where multiple concurrent global failures are
> triggered and added to the failureCollection before transitioning the job
> state to Failed, e.g., through {{StateWithExecutionGraph#handleGlobalFailure}}
> of the AdaptiveScheduler.
> In this case the last one added will be the root, and the next one will trigger
> the assertion.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)