Hi all,

I will start coding based on the design document. Any feedback is welcome throughout the process.

Best,
Vino

vino yang <yanghua1...@gmail.com> wrote on Wed, Jan 9, 2019 at 12:29 AM:

> Hi all,
>
> Currently, the checkpoint failure handling logic is somewhat confusing
> (not focused), which makes some functions in the existing code passive.
>
> So I have prepared a design document to improve the checkpoint failure
> handling logic.
>
> This design document primarily describes how to improve the checkpoint
> failure handling logic and make it clearer.
>
> Based on this, we introduce a CheckpointFailureManager, which makes
> checkpoint failure processing more flexible.
>
> This mainly comes from the following requests:
>
> - FLINK-4810 [1]: Checkpoint Coordinator should fail ExecutionGraph after
>   "n" unsuccessful checkpoints
> - FLINK-10074 [3]: Allowable number of checkpoint failure
> - FLINK-10724 [2]: Refactor failure handling in checkpoint coordinator
>
> https://docs.google.com/document/d/1ce7RtecuTxcVUJlnU44hzcO2Dwq9g4Oyd8_biy94hJc/edit?usp=sharing
>
> *Thanks to @Andrey Zagrebin for helping me review the documentation and
> suggesting a lot of improvements.*
>
> Feedback and comments are very welcome!
>
> Best,
> Vino
>
> [1]: https://issues.apache.org/jira/browse/FLINK-4810
> [2]: https://issues.apache.org/jira/browse/FLINK-10724
> [3]: https://issues.apache.org/jira/browse/FLINK-10074