Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1223#discussion_r41239329
  
    --- Diff: docs/apis/programming_guide.md ---
    @@ -1992,6 +1992,8 @@ With the closure cleaner disabled, it might happen 
that an anonymous user functi
     
     - `getNumberOfExecutionRetries()` / `setNumberOfExecutionRetries(int 
numberOfExecutionRetries)` Sets the number of times that failed tasks are 
re-executed. A value of zero effectively disables fault tolerance. A value of 
`-1` indicates that the system default value (as defined in the configuration) 
should be used.
     
    +- `getExecutionRetryDelay()` / `setExecutionRetryDelay(long 
executionRetryDelay)` Sets the delay that failed tasks are re-executed. A value 
of `-1` indicates that the default value should be used.
    --- End diff --
    
    I think this is a critical parameter, so I would like to extend the 
description a bit. How about this:
    
    ```
    Sets the delay that the system waits after a job has failed, before 
re-executing it. The delay starts after all tasks have been successfully 
stopped on the TaskManagers, and once the delay has passed, the tasks are 
restarted. This parameter is useful to delay re-execution in order to let 
certain time-out related failures surface fully (like broken connections that 
have not fully timed out), before attempting a re-execution and immediately 
failing again due to the same problem.
    
    This parameter only has an effect if the number of execution retries is 
one or more.
    ```
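
To make the intended semantics concrete, here is a minimal sketch of the behavior the proposed text describes. It is not Flink code: `RetryDelaySketch` and `runWithRetries` are hypothetical names, the job is modeled as a plain `Callable`, and the `-1` "use the default" case is not modeled. It only illustrates that the delay is applied between a failure and the next attempt, and that it never applies when the number of retries is zero.

```java
import java.util.concurrent.Callable;

/** Hypothetical stand-in for the restart behavior described above; not Flink code. */
final class RetryDelaySketch {

    /**
     * Runs the job and re-executes it up to numberOfExecutionRetries times.
     * After each failure that still has a retry left, waits
     * executionRetryDelayMillis (assumed non-negative) before the next attempt.
     * Returns the total number of attempts that were made.
     */
    static int runWithRetries(Callable<Void> job,
                              int numberOfExecutionRetries,
                              long executionRetryDelayMillis) throws Exception {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                job.call();
                return attempts;
            } catch (Exception failure) {
                if (attempts > numberOfExecutionRetries) {
                    // No retries left, so the delay is never applied.
                    throw failure;
                }
                // Give time-out related failures a chance to surface fully.
                Thread.sleep(executionRetryDelayMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice, then succeeds, like a connection that needs time to time out.
        Callable<Void> flaky = () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("connection not yet timed out");
            }
            return null;
        };
        int attempts = runWithRetries(flaky, 3, 100L);
        System.out.println("succeeded after " + attempts + " attempts");
    }
}
```

With zero retries the first failure is rethrown immediately, which is why the doc text should state that the delay only matters when at least one retry is configured.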


