zhuzhurk commented on a change in pull request #9113: [FLINK-13222] [runtime] Add documentation for failover strategy option
URL: https://github.com/apache/flink/pull/9113#discussion_r305285346
 
 

 ##########
 File path: docs/dev/task_failure_recovery.md
 ##########
 @@ -264,4 +268,51 @@ The cluster defined restart strategy is used.
 This is helpful for streaming programs which enable checkpointing.
 By default, a fixed delay restart strategy is chosen if there is no other restart strategy defined.
 
+## Failover Strategies
+
+Flink supports different failover strategies which can be configured via the configuration parameter
+*jobmanager.execution.failover-strategy* in Flink's configuration file `flink-conf.yaml`.
+
+<table class="table table-bordered">
+  <thead>
+    <tr>
+      <th class="text-left" style="width: 50%">Failover Strategy</th>
+      <th class="text-left">Value for jobmanager.execution.failover-strategy</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+        <td>Restart all</td>
+        <td>full</td>
+    </tr>
+    <tr>
+        <td>Restart pipelined region</td>
+        <td>region</td>
+    </tr>
+  </tbody>
+</table>
+
+### Restart All Failover Strategy
+
+With this strategy, all tasks in the job will be restarted to recover from a task failure.
+
+### Restart Pipelined Region Failover Strategy
+
+With this strategy, the tasks to restart are exactly the tasks of the regions that need to be restarted.
+
+This strategy defines a region as a set of tasks that communicate via pipelined data exchanges.
+- All data exchanges in a DataStream job or Streaming Table job are pipelined.
+- All data exchanges in a Batch Table job are batched.
+- The types of data exchanges in a DataSet job are determined by the
+  [ExecutionMode]({{ site.javadocs_baseurl }}/api/java/org/apache/flink/api/common/ExecutionMode.html) set in
+  [ExecutionConfig]({{ site.baseurl }}/dev/execution_configuration.html).
+
+The regions to restart are determined as follows:
+1. The region containing the failed task should be restarted.
+2. If a result partition is not available while it is required by a region that will be restarted,
+   the region producing the result partition should be restarted as well.
+3. If a region is to be restarted, all of its consumer regions should also be restarted. This is to guarantee
+   data consistency, because nondeterministic processing or partitioning can result in that a partition
 
 Review comment:
   Ok.
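The restart rules in the quoted documentation (rules 1 to 3, with rule 3 truncated in this hunk) can be modelled as a worklist closure over regions. The following is a minimal, hypothetical Java sketch, not Flink's implementation: `RegionFailoverSketch`, `Region`, `regionsToRestart`, `demo` and the partition names `a-out`/`b-out` are illustrative only, and the example job (three regions A, B, C chained by blocking exchanges, with a task in B failing) is assumed. Only the configuration values `full` and `region` come from the documentation.

```java
import java.util.*;

/**
 * Hypothetical model of the two documented failover strategies.
 * NOT Flink's implementation; all names here are illustrative.
 */
public class RegionFailoverSketch {

    /** Maps the documented values of jobmanager.execution.failover-strategy. */
    enum FailoverStrategy {
        FULL("full"), REGION("region");

        final String configValue;

        FailoverStrategy(String configValue) { this.configValue = configValue; }

        static FailoverStrategy fromConfig(String value) {
            for (FailoverStrategy s : values()) {
                if (s.configValue.equals(value)) {
                    return s;
                }
            }
            throw new IllegalArgumentException("Unknown failover strategy: " + value);
        }
    }

    /** A pipelined region: tasks connected to each other only by pipelined exchanges. */
    static final class Region {
        final String name;
        final List<String> consumedPartitions = new ArrayList<>(); // inputs produced by other regions
        final List<Region> consumers = new ArrayList<>();          // regions reading this region's outputs

        Region(String name) { this.name = name; }
    }

    /** Applies rules 1-3 until no new region is added (transitive closure). */
    static Set<Region> regionsToRestart(Region failed,
                                        Map<String, Region> producerOf,
                                        Set<String> availablePartitions) {
        Set<Region> toRestart = new LinkedHashSet<>(); // keeps discovery order
        Deque<Region> queue = new ArrayDeque<>();
        queue.add(failed); // Rule 1: the region containing the failed task restarts.
        while (!queue.isEmpty()) {
            Region region = queue.poll();
            if (!toRestart.add(region)) {
                continue; // already scheduled for restart
            }
            // Rule 2: an unavailable input partition forces its producer region to restart.
            for (String partition : region.consumedPartitions) {
                Region producer = producerOf.get(partition);
                if (!availablePartitions.contains(partition) && producer != null) {
                    queue.add(producer);
                }
            }
            // Rule 3: every consumer of a restarted region restarts as well.
            queue.addAll(region.consumers);
        }
        return toRestart;
    }

    /** Assumed job A -> B -> C (blocking exchanges); a task in B fails. */
    static List<String> demo(String strategyValue, boolean upstreamOutputAvailable) {
        Region a = new Region("A"), b = new Region("B"), c = new Region("C");
        b.consumedPartitions.add("a-out");
        c.consumedPartitions.add("b-out");
        a.consumers.add(b);
        b.consumers.add(c);
        Map<String, Region> producerOf = new HashMap<>();
        producerOf.put("a-out", a);
        producerOf.put("b-out", b);
        Set<String> available = new HashSet<>();
        if (upstreamOutputAvailable) {
            available.add("a-out"); // A's output is still readable; B's output is not
        }

        List<String> names = new ArrayList<>();
        if (FailoverStrategy.fromConfig(strategyValue) == FailoverStrategy.FULL) {
            for (Region r : Arrays.asList(a, b, c)) {
                names.add(r.name); // "full": every region restarts
            }
        } else {
            for (Region r : regionsToRestart(b, producerOf, available)) {
                names.add(r.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo("full", true));    // [A, B, C]
        System.out.println(demo("region", true));  // [B, C]
        System.out.println(demo("region", false)); // [B, A, C]
    }
}
```

In this model, when A's output is still available only B and its consumer C restart; if A's output has been lost, rule 2 pulls A in as well, while `full` restarts every region regardless of which partitions survive.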

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
