Robert Metzger created FLINK-2079:
-------------------------------------

             Summary: Add watcher to YARN TM containers to detect stopped actor system
                 Key: FLINK-2079
                 URL: https://issues.apache.org/jira/browse/FLINK-2079
             Project: Flink
          Issue Type: Improvement
          Components: TaskManager, YARN Client
    Affects Versions: 0.9
            Reporter: Robert Metzger
            Assignee: Robert Metzger


I experienced an OutOfMemoryError (caused by user code) while running Flink 
on YARN.
The TaskManager appears to detect the fatal error correctly, but the JVM 
does not shut down, so YARN won't bring up new containers.

Therefore, I want to start a thread in the YarnTaskManagerRunner which 
periodically (every 30 seconds) checks whether the actor system is still 
running. If it is not, the thread calls System.exit(1).
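
A minimal sketch of such a watcher thread, not the actual Flink patch. The class name ActorSystemWatchdog and its start() method are hypothetical. The liveness check and the exit action are passed in as parameters so that the sketch does not depend on Akka and can be exercised without killing the JVM.

```java
import java.util.function.BooleanSupplier;
import java.util.function.IntConsumer;

/** Hypothetical helper; the name and signature are assumptions for illustration. */
public class ActorSystemWatchdog {

    /**
     * Starts a daemon thread that polls isRunning every intervalMillis and
     * calls onStopped with exit code 1 as soon as the check reports false.
     */
    public static Thread start(BooleanSupplier isRunning,
                               long intervalMillis,
                               IntConsumer onStopped) {
        Thread watcher = new Thread(() -> {
            try {
                // Keep polling for as long as the actor system reports itself alive.
                while (isRunning.getAsBoolean()) {
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                // The watcher was cancelled; restore the flag and do not exit.
                Thread.currentThread().interrupt();
                return;
            }
            // The actor system has stopped: trigger the exit action.
            onStopped.accept(1);
        }, "actor-system-watchdog");
        // A daemon thread never keeps the JVM alive on its own.
        watcher.setDaemon(true);
        watcher.start();
        return watcher;
    }
}
```

In the runner, this could be wired up with a 30-second interval (30_000L), System::exit as the exit action, and a supplier that wraps whatever liveness check the actor system offers. Which check to use is an assumption left open here.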



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
