[ https://issues.apache.org/jira/browse/FLINK-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861271#comment-15861271 ]

ASF GitHub Bot commented on FLINK-5759:
---------------------------------------

Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3290#discussion_r100535249
  
    --- Diff: flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/MesosApplicationMasterRunner.java ---
    @@ -216,11 +220,11 @@ protected int runPrivileged(Configuration config, Configuration dynamicPropertie
     
                        futureExecutor = Executors.newScheduledThreadPool(
                                numberProcessors,
    -                           new NamedThreadFactory("mesos-jobmanager-future-", "-thread-"));
    +                           new ExecutorThreadFactory("mesos-jobmanager-future"));
    --- End diff --
    
    The pool is not really tied to Akka. Akka has its own threads for the
    actors. The JobManager actor uses the "future" pool for futures produced by
    the actors. The ExecutionGraph also uses that pool for some callbacks.


> Set an UncaughtExceptionHandler for all Thread Pools in JobManager
> ------------------------------------------------------------------
>
>                 Key: FLINK-5759
>                 URL: https://issues.apache.org/jira/browse/FLINK-5759
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.2.0
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>             Fix For: 1.3.0
>
>
> Currently, the thread pools of the {{JobManager}} do not have any 
> {{UncaughtExceptionHandler}}.
> While uncaught exceptions are rare (Flink handles exceptions aggressively in 
> most places), when exceptions slip through in these threads (which execute 
> future responses and delayed actions), the JobManager may be in an 
> inconsistent state and not function properly any more.
> We should add a handler that results in a process kill in the case of 
> uncaught exceptions. Letting the JobManager be restarted by the respective 
> cluster framework is the only guaranteed way to be safe.
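
A minimal sketch of the idea described above, assuming a plain
java.util.concurrent.ThreadFactory is enough to attach the handler. The class
name FatalExitThreadFactory, the KILL_PROCESS constant, and the exit code are
hypothetical and chosen for illustration; this is not the ExecutorThreadFactory
from the pull request. The handler is injectable so the kill behaviour can be
swapped out in tests.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: names pool threads and installs an UncaughtExceptionHandler on each. */
final class FatalExitThreadFactory implements ThreadFactory {

    /** Logs the error, then terminates the JVM so the cluster framework can restart it. */
    static final Thread.UncaughtExceptionHandler KILL_PROCESS = (thread, error) -> {
        try {
            System.err.println("FATAL: thread '" + thread.getName()
                    + "' terminated with an uncaught exception");
            error.printStackTrace();
        } finally {
            // halt() skips shutdown hooks, so a blocked hook cannot keep the process alive.
            // The exit code is arbitrary (an assumption for this sketch).
            Runtime.getRuntime().halt(-1);
        }
    };

    private final String namePrefix;
    private final Thread.UncaughtExceptionHandler handler;
    private final AtomicInteger counter = new AtomicInteger(1);

    FatalExitThreadFactory(String poolName) {
        this(poolName, KILL_PROCESS);
    }

    FatalExitThreadFactory(String poolName, Thread.UncaughtExceptionHandler handler) {
        this.namePrefix = poolName + "-thread-";
        this.handler = handler;
    }

    @Override
    public Thread newThread(Runnable task) {
        Thread t = new Thread(task, namePrefix + counter.getAndIncrement());
        t.setDaemon(true);
        // Every thread created by this factory gets the handler.
        t.setUncaughtExceptionHandler(handler);
        return t;
    }
}
```

Such a factory could be passed to Executors.newScheduledThreadPool(n, factory).
One caveat: tasks handed to submit() or schedule() have their exceptions
captured in the returned Future, so the handler only sees errors that escape
outside that wrapping.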



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
