[ https://issues.apache.org/jira/browse/FLINK-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861266#comment-15861266 ]

ASF GitHub Bot commented on FLINK-5759:
---------------------------------------

Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3290#discussion_r100534934
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/filecache/FileCache.java ---
    @@ -99,7 +99,8 @@ public FileCache(String[] tempDirectories) throws IOException {
                this.shutdownHook = createShutdownHook(this, LOG);
     
                this.entries = new HashMap<JobID, Map<String, Tuple4<Integer, File, Path, Future<Path>>>>();
    -           this.executorService = Executors.newScheduledThreadPool(10, ExecutorThreadFactory.INSTANCE);
    +           this.executorService = Executors.newScheduledThreadPool(10, 
    --- End diff --
    
    This PR deliberately changes nothing beyond its stated goal.
    Agreed, though, that the `10` is a magic number. Sizing the pool relative
    to the number of cores would intuitively make more sense.
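
To illustrate the suggestion above, here is a minimal sketch of sizing a scheduled pool from the core count instead of a hard-coded `10`. The class name `PoolSizing`, the helper `poolSizeFor`, and the bounds 2 and 32 are hypothetical choices for illustration, not taken from the Flink code base or from this PR.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Hypothetical helper, not part of Flink.
class PoolSizing {

    /** Returns the core count clamped to the range [min, max]. */
    static int poolSizeFor(int availableCores, int min, int max) {
        return Math.max(min, Math.min(max, availableCores));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // The bounds 2 and 32 are arbitrary assumptions for this sketch.
        int size = poolSizeFor(cores, 2, 32);

        ScheduledExecutorService pool = Executors.newScheduledThreadPool(size);
        try {
            System.out.println("cores=" + cores + ", poolSize=" + size);
        } finally {
            pool.shutdown();
        }
    }
}
```

Clamping keeps the pool usable on single-core machines and avoids very large pools on hosts with many cores. Whether the file cache is better served by a fixed size or a core-relative one depends on how I/O-bound its tasks are.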


> Set an UncaughtExceptionHandler for all Thread Pools in JobManager
> ------------------------------------------------------------------
>
>                 Key: FLINK-5759
>                 URL: https://issues.apache.org/jira/browse/FLINK-5759
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.2.0
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>             Fix For: 1.3.0
>
>
> Currently, the thread pools of the {{JobManager}} do not have any 
> {{UncaughtExceptionHandler}}.
> While uncaught exceptions are rare (Flink handles exceptions aggressively in 
> most places), when exceptions slip through in these threads (which execute 
> future responses and delayed actions), the JobManager may be in an 
> inconsistent state and not function properly any more.
> We should add a handler that results in a process kill in the case of 
> uncaught exceptions. Letting the JobManager be restarted by the respective 
> cluster framework is the only guaranteed way to be safe.
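
The approach described in the issue can be sketched as a thread factory that installs an `UncaughtExceptionHandler` on every thread it creates. This is only an illustration under stated assumptions: the class name `FatalExitThreadFactory`, the thread name prefix, and the exit code `-1` are hypothetical and are not the implementation from the pull request.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the Flink implementation.
class FatalExitThreadFactory implements ThreadFactory {

    private final AtomicInteger counter = new AtomicInteger();
    private final Thread.UncaughtExceptionHandler handler;

    FatalExitThreadFactory(Thread.UncaughtExceptionHandler handler) {
        this.handler = handler;
    }

    /** Handler that logs the error and then terminates the JVM. */
    static Thread.UncaughtExceptionHandler killProcess() {
        return (thread, error) -> {
            System.err.println("FATAL: uncaught exception in thread " + thread.getName());
            error.printStackTrace();
            // halt() skips shutdown hooks; the exit code -1 is an arbitrary assumption.
            Runtime.getRuntime().halt(-1);
        };
    }

    @Override
    public Thread newThread(Runnable task) {
        Thread t = new Thread(task, "pool-thread-" + counter.incrementAndGet());
        t.setDaemon(true);
        t.setUncaughtExceptionHandler(handler);
        return t;
    }
}
```

A pool could then be created with, for example, `Executors.newFixedThreadPool(4, new FatalExitThreadFactory(FatalExitThreadFactory.killProcess()))`. Note that the handler only sees exceptions that actually escape a thread's `run()` method: a task passed to `ExecutorService.submit(...)` has its exception captured in the returned `Future`, so the handler is not invoked for it.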



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
