GitHub user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1223#discussion_r41239329

    --- Diff: docs/apis/programming_guide.md ---
    @@ -1992,6 +1992,8 @@ With the closure cleaner disabled, it might happen that an anonymous user functi
     - `getNumberOfExecutionRetries()` / `setNumberOfExecutionRetries(int numberOfExecutionRetries)` Sets the number of times that failed tasks are re-executed. A value of zero effectively disables fault tolerance. A value of `-1` indicates that the system default value (as defined in the configuration) should be used.
    +- `getExecutionRetryDelay()` / `setExecutionRetryDelay(long executionRetryDelay)` Sets the delay that failed tasks are re-executed. A value of `-1` indicates that the default value should be used.
    --- End diff --

    I think this is a critical parameter, so I would like to extend the description a bit. How about this:

    ```
    Sets the delay that the system waits after a job has failed, before re-executing it. The delay starts after all tasks have been successfully stopped on the TaskManagers, and once the delay has passed, the tasks are restarted. This parameter is useful to delay re-execution in order to let certain timeout-related failures surface fully (like broken connections that have not fully timed out), before attempting a re-execution and immediately failing again due to the same problem. This parameter only has an effect if the number of execution retries is one or more.
    ```
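To make the interplay between the retry count and the retry delay concrete, here is a minimal, self-contained Java sketch of the behavior the suggested text describes. It does not use Flink: `RetryDelaySketch`, `resolve`, `runWithRetries` and the injected `sleeper` are hypothetical names chosen for illustration, and the "wait until all tasks have stopped" step is only marked by a comment. It assumes that `-1` resolves to a system default, that a job runs once plus up to `retries` re-executions, and that the delay is waited before every re-execution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

/** Hypothetical model of the retry/delay semantics; not Flink code. */
final class RetryDelaySketch {

    /** Sentinel meaning "use the system default" (assumed for both settings). */
    static final long USE_DEFAULT = -1L;

    /** Returns the system default if the configured value is -1, else the configured value. */
    static long resolve(long configured, long systemDefault) {
        return configured == USE_DEFAULT ? systemDefault : configured;
    }

    /**
     * Runs the job once; after each failure, waits delayMillis (via the injected
     * sleeper) and re-executes, at most `retries` times. Returns true on success.
     */
    static boolean runWithRetries(BooleanSupplier job, long retries, long delayMillis, LongConsumer sleeper) {
        if (job.getAsBoolean()) {
            return true;
        }
        for (long retry = 1; retry <= retries; retry++) {
            // A real system would first wait until all tasks of the failed attempt
            // have stopped; the delay is counted from that point on.
            sleeper.accept(delayMillis);
            if (job.getAsBoolean()) {
                return true;
            }
        }
        // With retries == 0 the loop never runs, so the delay is never waited.
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        List<Long> waits = new ArrayList<>();
        // This job fails on its first two attempts and succeeds on the third.
        BooleanSupplier flaky = () -> ++calls[0] >= 3;
        boolean ok = runWithRetries(flaky, resolve(3, 0), resolve(USE_DEFAULT, 5000), waits::add);
        System.out.println(ok + " after " + calls[0] + " attempts, waits=" + waits);
        // prints "true after 3 attempts, waits=[5000, 5000]"
    }
}
```

With the retry count set to `0`, the delay is never applied, which mirrors the last sentence of the suggested text. Injecting the sleeper instead of calling `Thread.sleep` directly keeps the sketch fast and testable.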