[ https://issues.apache.org/jira/browse/FLINK-20261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrey Zagrebin updated FLINK-20261: ------------------------------------ Description: While trying to extend {{FileSourceTextLinesITCase::testContinuousTextFileSourceWithTaskManagerFailover}} with recovery test after TM failure ({{TestingMiniCluster::terminateTaskExecutor}}, [branch|https://github.com/azagrebin/flink/tree/FLINK-20118-it]) in FLINK-20118, I encountered the following case: * {{SourceCoordinatorContext::assignSplits}} schedules async assignment (all reader tasks alive) * call {{TestingMiniCluster::terminateTaskExecutor}} while doing writeFile in a loop of testContinuousTextFileSource * causes graceful {{TaskExecutor::onStop}} shutdown * causes TM/RM disconnect and failing slot allocations in JM by RM * eventually causes {{SourceCoordinatorContext::unregisterSourceReader}} * actual assignment starts ({{SourceCoordinatorContext::assignSplits: callInCoordinatorThread}}) * {{registeredReaders.containsKey(subtaskId)}} check fails (due to failed task) with {{IllegalArgumentException}} which is uncaught in single thread executor * forces ThreadPool to recreate the single thread * calls {{CoordinatorExecutorThreadFactory::newThread}} * fails expected condition of single thread creation with {{IllegalStateException}} which is uncaught * calls {{FatalExitExceptionHandler}} and exits JVM abruptly {code:java} [SourceCoordinator-Source: file-source] ERROR org.apache.flink.runtime.util.FatalExitExceptionHandler - FATAL: Thread 'SourceCoordinator-Source: file-source' produced an uncaught exception. Stopping the process... java.lang.IllegalStateException: Should never happen. This factory should only be used by a SingleThreadExecutor. at org.apache.flink.runtime.source.coordinator.SourceCoordinatorProvider$CoordinatorExecutorThreadFactory.newThread(SourceCoordinatorProvider.java:94) ~[classes/:?] at java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:619) ~[?:1.8.0_172] at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:932) ~[?:1.8.0_172] at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1025) ~[?:1.8.0_172] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167) ~[?:1.8.0_172] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_172] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172] Process finished with exit code 239 {code} was: While trying to extend {{FileSourceTextLinesITCase::testContinuousTextFileSource}} with recovery test after TM failure ({{TestingMiniCluster::terminateTaskExecutor}}, [branch|https://github.com/azagrebin/flink/tree/FLINK-20118-it]) in FLINK-20118, I encountered the following case: * {{SourceCoordinatorContext::assignSplits}} schedules async assignment (all reader tasks alive) * call {{TestingMiniCluster::terminateTaskExecutor}} while doing writeFile in a loop of testContinuousTextFileSource * causes graceful {{TaskExecutor::onStop}} shutdown * causes TM/RM disconnect and failing slot allocations in JM by RM * eventually causes {{SourceCoordinatorContext::unregisterSourceReader}} * actual assignment starts ({{SourceCoordinatorContext::assignSplits: callInCoordinatorThread}}) * {{registeredReaders.containsKey(subtaskId)}} check fails (due to failed task) with {{IllegalArgumentException}} which is uncaught in single thread executor * forces ThreadPool to recreate the single thread * calls {{CoordinatorExecutorThreadFactory::newThread}} * fails expected condition of single thread creation with {{IllegalStateException}} which is uncaught * calls {{FatalExitExceptionHandler}} and exits JVM abruptly {code:java} [SourceCoordinator-Source: file-source] ERROR org.apache.flink.runtime.util.FatalExitExceptionHandler - FATAL: Thread 'SourceCoordinator-Source: file-source' produced an uncaught exception. Stopping the process... java.lang.IllegalStateException: Should never happen. This factory should only be used by a SingleThreadExecutor. at org.apache.flink.runtime.source.coordinator.SourceCoordinatorProvider$CoordinatorExecutorThreadFactory.newThread(SourceCoordinatorProvider.java:94) ~[classes/:?] at java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:619) ~[?:1.8.0_172] at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:932) ~[?:1.8.0_172] at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1025) ~[?:1.8.0_172] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167) ~[?:1.8.0_172] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_172] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172] Process finished with exit code 239 {code} > Uncaught exception in ExecutorNotifier due to split assignment broken by > failed task > ------------------------------------------------------------------------------------ > > Key: FLINK-20261 > URL: https://issues.apache.org/jira/browse/FLINK-20261 > Project: Flink > Issue Type: Bug > Components: Connectors / Common, Connectors / FileSystem > Affects Versions: 1.12.0 > Reporter: Andrey Zagrebin > Priority: Blocker > Fix For: 1.12.0 > > > While trying to extend > {{FileSourceTextLinesITCase::testContinuousTextFileSourceWithTaskManagerFailover}} > with recovery test after TM failure > ({{TestingMiniCluster::terminateTaskExecutor}}, > [branch|https://github.com/azagrebin/flink/tree/FLINK-20118-it]) in > FLINK-20118, I encountered the following case: > * {{SourceCoordinatorContext::assignSplits}} schedules async assignment (all > reader tasks alive) > * call {{TestingMiniCluster::terminateTaskExecutor}} while doing writeFile in > a loop of testContinuousTextFileSource > * causes graceful {{TaskExecutor::onStop}} shutdown > * causes TM/RM disconnect and failing slot allocations in JM by RM > * eventually causes {{SourceCoordinatorContext::unregisterSourceReader}} > * actual assignment starts ({{SourceCoordinatorContext::assignSplits: > callInCoordinatorThread}}) > * {{registeredReaders.containsKey(subtaskId)}} check fails (due to failed > task) with {{IllegalArgumentException}} which is uncaught in single thread > executor > * forces ThreadPool to recreate the single thread > * calls {{CoordinatorExecutorThreadFactory::newThread}} > * fails expected condition of single thread creation with > {{IllegalStateException}} which is uncaught > * calls {{FatalExitExceptionHandler}} and exits JVM abruptly > {code:java} > [SourceCoordinator-Source: file-source] ERROR > org.apache.flink.runtime.util.FatalExitExceptionHandler - FATAL: Thread > 'SourceCoordinator-Source: file-source' produced an uncaught exception. > Stopping the process... > java.lang.IllegalStateException: Should never happen. This factory should > only be used by a SingleThreadExecutor. > at > org.apache.flink.runtime.source.coordinator.SourceCoordinatorProvider$CoordinatorExecutorThreadFactory.newThread(SourceCoordinatorProvider.java:94) > ~[classes/:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:619) > ~[?:1.8.0_172] > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:932) > ~[?:1.8.0_172] > at > java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1025) > ~[?:1.8.0_172] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167) > ~[?:1.8.0_172] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > ~[?:1.8.0_172] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172] > Process finished with exit code 239 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)