Hello everyone,
Till, Zhu Zhu and I have prepared a Design Document
<https://docs.google.com/document/d/1YHOpMLdC-dtgjcM-EDn6v-oXgsEQKXSoMjqRcYVbJA8>
for introducing backtracking for failover regions. This is an
optimization of the failure handling logic for jobs with blocking result
partitions (which primarily exist in batch jobs), where only part of the
job has to be restarted.
This is a continuation of the FLIP-1
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures>
efforts to introduce fine-grained recovery from task failures.
The associated JIRA issue can be found here
<https://issues.apache.org/jira/browse/FLINK-12068>.
Any feedback is highly appreciated.
Regards,
Chesnay