[ https://issues.apache.org/jira/browse/FLINK-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944732#comment-14944732 ]
ASF GitHub Bot commented on FLINK-2066:
---------------------------------------

Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/1223#discussion_r41239329

--- Diff: docs/apis/programming_guide.md ---
@@ -1992,6 +1992,8 @@ With the closure cleaner disabled, it might happen that an anonymous user functi

- `getNumberOfExecutionRetries()` / `setNumberOfExecutionRetries(int numberOfExecutionRetries)` Sets the number of times that failed tasks are re-executed. A value of zero effectively disables fault tolerance. A value of `-1` indicates that the system default value (as defined in the configuration) should be used.

+- `getExecutionRetryDelay()` / `setExecutionRetryDelay(long executionRetryDelay)` Sets the delay that failed tasks are re-executed. A value of `-1` indicates that the default value should be used.
--- End diff --

I think this is a critical parameter, so I would like to extend the description a bit. How about this:

```
Sets the delay that the system waits after a job has failed, before re-executing it. The delay starts after all tasks have been successfully stopped on the TaskManagers, and once the delay is past, the tasks are re-started. This parameter is useful to delay re-execution in order to let certain time-out related failures surface fully (like broken connections that have not fully timed out), before attempting a re-execution and immediately failing again due to the same problem. This parameter only has an effect if the number of execution re-tries is one or more.
```

> Make delay between execution retries configurable
> -------------------------------------------------
>
>                 Key: FLINK-2066
>                 URL: https://issues.apache.org/jira/browse/FLINK-2066
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.9, 0.10
>            Reporter: Stephan Ewen
>            Assignee: Nuno Miguel Marques dos Santos
>            Priority: Blocker
>              Labels: starter
>             Fix For: 0.10
>
>
> Flink allows specifying a delay between execution retries. This helps to let
> some external failure causes fully manifest themselves before the restart is
> attempted.
> The delay is currently defined only system-wide.
> We should add it to the {{ExecutionConfig}} of a job to allow per-job
> specification.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
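The retry semantics discussed in this thread (a per-job delay, `-1` meaning "use the system default", zero retries disabling fault tolerance, and the delay only mattering when retries are one or more) can be modeled in a small, self-contained Java sketch. This is not Flink's implementation and does not call Flink: the class `RetryDelaySketch`, its methods, the use of a `Supplier` to stand in for a job, and the millisecond unit are all assumptions for illustration. In Flink the per-job value would be set through the `setExecutionRetryDelay(long)` setter proposed in the diff, and the delay starts only after all tasks have stopped on the TaskManagers; here the failed attempt has simply returned before the wait begins.

```java
import java.util.function.Supplier;

/**
 * Hypothetical model of the per-job retry settings discussed in FLINK-2066.
 * Not Flink code: names, structure, and the millisecond unit are illustrative assumptions.
 */
final class RetryDelaySketch {

    /** Sentinel mirroring the "-1 means use the system default" convention. */
    static final long USE_SYSTEM_DEFAULT = -1L;

    private RetryDelaySketch() {}

    /** Resolves a per-job delay against the system-wide default; rejects other negative values. */
    static long resolveDelay(long perJobDelayMillis, long systemDefaultMillis) {
        if (perJobDelayMillis == USE_SYSTEM_DEFAULT) {
            return systemDefaultMillis;
        }
        if (perJobDelayMillis < 0) {
            throw new IllegalArgumentException("delay must be >= 0 or -1, got " + perJobDelayMillis);
        }
        return perJobDelayMillis;
    }

    /**
     * Runs the job and re-executes it at most maxRetries times after a failure,
     * waiting delayMillis before each re-execution. With maxRetries == 0 the first
     * failure is rethrown immediately, so the delay never takes effect.
     */
    static <T> T runWithRetries(Supplier<T> job, int maxRetries, long delayMillis) {
        int retriesUsed = 0;
        while (true) {
            try {
                return job.get();
            } catch (RuntimeException failure) {
                if (retriesUsed >= maxRetries) {
                    throw failure; // no retries left, or fault tolerance disabled
                }
                retriesUsed++;
                // Wait before re-executing so slow-surfacing failures (e.g. connections
                // that have not fully timed out) can manifest first.
                sleepBeforeRetry(delayMillis);
            }
        }
    }

    private static void sleepBeforeRetry(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

For example, `RetryDelaySketch.runWithRetries(job, 3, RetryDelaySketch.resolveDelay(-1L, 10_000L))` would re-execute a failing `job` up to three times, waiting an assumed 10-second system default before each attempt. Keeping `-1` as an explicit sentinel, rather than treating any negative value as "default", makes misconfigured delays fail loudly instead of silently falling back.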