[ https://issues.apache.org/jira/browse/FLINK-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407870#comment-15407870 ]
Stephan Ewen commented on FLINK-4256:
-------------------------------------

[~zjwang] Preventing downstream restarts would be a followup optimization. To avoid making this issue more complicated than it already is, I would solve this issue first and then approach that optimization as a separate followup.

> Fine-grained recovery
> ---------------------
>
>                 Key: FLINK-4256
>                 URL: https://issues.apache.org/jira/browse/FLINK-4256
>             Project: Flink
>          Issue Type: Improvement
>          Components: JobManager
>    Affects Versions: 1.1.0
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>             Fix For: 1.2.0
>
>
> When a task fails during execution, Flink currently resets the entire
> execution graph and triggers complete re-execution from the last completed
> checkpoint. This is more expensive than just re-executing the failed tasks.
> In many cases, more fine-grained recovery is possible.
> The full description and design are in the corresponding FLIP.
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures