[ https://issues.apache.org/jira/browse/FLINK-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092932#comment-15092932 ]
Fabian Hueske commented on FLINK-1581:
--------------------------------------

[~till.rohrmann], is this issue still valid?

> Configure DeathWatch parameters properly
> ----------------------------------------
>
>                 Key: FLINK-1581
>                 URL: https://issues.apache.org/jira/browse/FLINK-1581
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Till Rohrmann
>
> We are using Akka's DeathWatch mechanism to detect failed components. However,
> the interval until an {{Instance}} is marked dead is currently very long.
> In particular, in conjunction with the job restart mechanism, we should either
> devise a mechanism that quickly detects dead {{Instance}}s or set the
> interval, pause, and threshold values such that detection does not take
> longer than the Akka ask timeout. Otherwise, all retries might be
> consumed before an {{Instance}} is recognized as dead.
> Further investigation of the correct failure behavior is necessary.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
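
The constraint in the description (detection must not take longer than the ask timeout) can be sketched as a quick sanity check. This is a rough sketch, not Flink or Akka code: the names `WatchSettings`, `worst_case_detection_s` and `detects_before_ask_timeout` are hypothetical, the numbers are purely illustrative, and approximating the detection time as heartbeat interval plus acceptable pause is an assumption. Akka's failure detector also depends on the threshold and on the observed heartbeat history, which this ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WatchSettings:
    """Hypothetical container for the values named in the issue (seconds)."""
    heartbeat_interval_s: float
    heartbeat_pause_s: float
    ask_timeout_s: float


def worst_case_detection_s(settings: WatchSettings) -> float:
    # Assumption: a dead peer is noticed at the latest one heartbeat interval
    # plus the acceptable pause after its last heartbeat. The threshold value
    # is deliberately not modelled here.
    return settings.heartbeat_interval_s + settings.heartbeat_pause_s


def detects_before_ask_timeout(settings: WatchSettings) -> bool:
    # True if the estimated detection time fits within the ask timeout.
    return worst_case_detection_s(settings) <= settings.ask_timeout_s


if __name__ == "__main__":
    # Illustrative values only, not actual Flink defaults.
    slow = WatchSettings(heartbeat_interval_s=10, heartbeat_pause_s=100, ask_timeout_s=100)
    fast = WatchSettings(heartbeat_interval_s=5, heartbeat_pause_s=20, ask_timeout_s=100)
    print(detects_before_ask_timeout(slow))
    print(detects_before_ask_timeout(fast))
```

With settings like `slow`, every retry that waits for the ask timeout could expire before the dead {{Instance}} is noticed, which is the failure mode the issue describes.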