[ 
https://issues.apache.org/jira/browse/FLINK-20261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Zagrebin updated FLINK-20261:
------------------------------------
    Description: 
While trying to extend 
{{FileSourceTextLinesITCase::testContinuousTextFileSourceWithTaskManagerFailover}}
 with recovery test after TM failure 
({{TestingMiniCluster::terminateTaskExecutor}}, 
[branch|https://github.com/azagrebin/flink/tree/FLINK-20118-it]) in 
FLINK-20118, I encountered the following case:
* {{SourceCoordinatorContext::assignSplits}} schedules async assignment (all 
reader tasks alive)
* call {{TestingMiniCluster::terminateTaskExecutor}} while doing writeFile in a 
loop of testContinuousTextFileSource
* causes graceful {{TaskExecutor::onStop}} shutdown
* causes TM/RM disconnect and failing slot allocations in JM by RM
* eventually causes {{SourceCoordinatorContext::unregisterSourceReader}}
* actual assignment starts ({{SourceCoordinatorContext::assignSplits: 
callInCoordinatorThread}})
* {{registeredReaders.containsKey(subtaskId)}} check fails (due to failed task) 
with {{IllegalArgumentException}} which is uncaught in single thread executor
* forces ThreadPool to recreate the single thread
* calls {{CoordinatorExecutorThreadFactory::newThread}}
* fails expected condition of single thread creation with 
{{IllegalStateException}} which is uncaught
* calls {{FatalExitExceptionHandler}} and exits JVM abruptly


{code:java}
[SourceCoordinator-Source: file-source] ERROR 
org.apache.flink.runtime.util.FatalExitExceptionHandler - FATAL: Thread 
'SourceCoordinator-Source: file-source' produced an uncaught exception. 
Stopping the process...
java.lang.IllegalStateException: Should never happen. This factory should only 
be used by a SingleThreadExecutor.
        at 
org.apache.flink.runtime.source.coordinator.SourceCoordinatorProvider$CoordinatorExecutorThreadFactory.newThread(SourceCoordinatorProvider.java:94)
 ~[classes/:?]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:619)
 ~[?:1.8.0_172]
        at 
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:932) 
~[?:1.8.0_172]
        at 
java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1025)
 ~[?:1.8.0_172]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167) 
~[?:1.8.0_172]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_172]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]

Process finished with exit code 239
{code}


  was:
While trying to extend 
{{FileSourceTextLinesITCase::testContinuousTextFileSource}} with recovery test 
after TM failure ({{TestingMiniCluster::terminateTaskExecutor}}, 
[branch|https://github.com/azagrebin/flink/tree/FLINK-20118-it]) in 
FLINK-20118, I encountered the following case:
* {{SourceCoordinatorContext::assignSplits}} schedules async assignment (all 
reader tasks alive)
* call {{TestingMiniCluster::terminateTaskExecutor}} while doing writeFile in a 
loop of testContinuousTextFileSource
* causes graceful {{TaskExecutor::onStop}} shutdown
* causes TM/RM disconnect and failing slot allocations in JM by RM
* eventually causes {{SourceCoordinatorContext::unregisterSourceReader}}
* actual assignment starts ({{SourceCoordinatorContext::assignSplits: 
callInCoordinatorThread}})
* {{registeredReaders.containsKey(subtaskId)}} check fails (due to failed task) 
with {{IllegalArgumentException}} which is uncaught in single thread executor
* forces ThreadPool to recreate the single thread
* calls {{CoordinatorExecutorThreadFactory::newThread}}
* fails expected condition of single thread creation with 
{{IllegalStateException}} which is uncaught
* calls {{FatalExitExceptionHandler}} and exits JVM abruptly


{code:java}
[SourceCoordinator-Source: file-source] ERROR 
org.apache.flink.runtime.util.FatalExitExceptionHandler - FATAL: Thread 
'SourceCoordinator-Source: file-source' produced an uncaught exception. 
Stopping the process...
java.lang.IllegalStateException: Should never happen. This factory should only 
be used by a SingleThreadExecutor.
        at 
org.apache.flink.runtime.source.coordinator.SourceCoordinatorProvider$CoordinatorExecutorThreadFactory.newThread(SourceCoordinatorProvider.java:94)
 ~[classes/:?]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:619)
 ~[?:1.8.0_172]
        at 
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:932) 
~[?:1.8.0_172]
        at 
java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1025)
 ~[?:1.8.0_172]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167) 
~[?:1.8.0_172]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_172]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]

Process finished with exit code 239
{code}



> Uncaught exception in ExecutorNotifier due to split assignment broken by 
> failed task
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-20261
>                 URL: https://issues.apache.org/jira/browse/FLINK-20261
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Common, Connectors / FileSystem
>    Affects Versions: 1.12.0
>            Reporter: Andrey Zagrebin
>            Priority: Blocker
>             Fix For: 1.12.0
>
>
> While trying to extend 
> {{FileSourceTextLinesITCase::testContinuousTextFileSourceWithTaskManagerFailover}}
>  with recovery test after TM failure 
> ({{TestingMiniCluster::terminateTaskExecutor}}, 
> [branch|https://github.com/azagrebin/flink/tree/FLINK-20118-it]) in 
> FLINK-20118, I encountered the following case:
> * {{SourceCoordinatorContext::assignSplits}} schedules async assignment (all 
> reader tasks alive)
> * call {{TestingMiniCluster::terminateTaskExecutor}} while doing writeFile in 
> a loop of testContinuousTextFileSource
> * causes graceful {{TaskExecutor::onStop}} shutdown
> * causes TM/RM disconnect and failing slot allocations in JM by RM
> * eventually causes {{SourceCoordinatorContext::unregisterSourceReader}}
> * actual assignment starts ({{SourceCoordinatorContext::assignSplits: 
> callInCoordinatorThread}})
> * {{registeredReaders.containsKey(subtaskId)}} check fails (due to failed 
> task) with {{IllegalArgumentException}} which is uncaught in single thread 
> executor
> * forces ThreadPool to recreate the single thread
> * calls {{CoordinatorExecutorThreadFactory::newThread}}
> * fails expected condition of single thread creation with 
> {{IllegalStateException}} which is uncaught
> * calls {{FatalExitExceptionHandler}} and exits JVM abruptly
> {code:java}
> [SourceCoordinator-Source: file-source] ERROR 
> org.apache.flink.runtime.util.FatalExitExceptionHandler - FATAL: Thread 
> 'SourceCoordinator-Source: file-source' produced an uncaught exception. 
> Stopping the process...
> java.lang.IllegalStateException: Should never happen. This factory should 
> only be used by a SingleThreadExecutor.
>       at 
> org.apache.flink.runtime.source.coordinator.SourceCoordinatorProvider$CoordinatorExecutorThreadFactory.newThread(SourceCoordinatorProvider.java:94)
>  ~[classes/:?]
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:619)
>  ~[?:1.8.0_172]
>       at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:932)
>  ~[?:1.8.0_172]
>       at 
> java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1025)
>  ~[?:1.8.0_172]
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
>  ~[?:1.8.0_172]
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_172]
>       at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
> Process finished with exit code 239
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to