Hi everyone,

Thanks for all the comments! Since there have been no further comments for a while, I would like to start a vote for this FLIP.
> On March 19, 2025, at 16:36, Yanfei Lei <fredia...@gmail.com> wrote:
>
> Hi Han,
>
> Thanks for the proposal.
> Faster Checkpoint & Recovery lays the groundwork for Disaggregated
> State to adapt to cloud-native deployment. Regarding the FLIP, I have
> three comments:
>
> 1. Are there any preliminary evaluation results available for this feature?
> 2. In terms of compatibility, can this feature be enabled using an
> existing original checkpoint or a native savepoint?
> 3. Does this feature introduce any additional overhead?
>
> On Fri, Feb 21, 2025 at 7:00 PM Han Yin <alexyin...@gmail.com> wrote:
>
>>
>> Hi Zakelly,
>> Thanks for your response!
>> 1. Sure. I’ve added a section called ‘End-to-end user case’ after the
>> section ‘Overview’.
>> 2. Yes, because reusing files somewhat goes against the semantics of a full
>> checkpoint. If a full checkpoint is enforced, the FileTransferStrategy will
>> force the files to be transferred by copying instead of by reusing.
>> 3. Yes. All the changes happen under the ForStStateBackend. I’ve updated
>> the section in the FLIP.
>> 4. In fact, we don't need much special file handling for checkpoint
>> failures, as they are managed by ForSt’s snapshot strategy. The proposed
>> FileTransferStrategy only checks whether the files are successfully
>> transferred. If the transfer is unsuccessful, it throws an exception,
>> ultimately failing the checkpoint. If the transfer succeeds but the
>> checkpoint is aborted, the file is already 'uploaded' to the
>> checkpoint directory, so it is no longer owned by the DB, and the snapshot
>> strategy will skip re-uploading it for subsequent checkpoints.
>>
>>> On February 17, 2025, at 11:44, Zakelly Lan <zakelly....@gmail.com> wrote:
>>>
>>> Hi Han,
>>>
>>> Thanks for driving this!
>>>
>>> The FLIP is in good shape. Here are my comments:
>>>
>>> 1. The FLIP introduces file reuse during snapshot and recovery. Could
>>> you please provide some common use cases from the user's perspective? e.g.
>>> Periodic checkpoint, native savepoint.
>>> 2. Does the current design depend on the incremental checkpoint? If we
>>> enforce the full checkpoint, what happens?
>>> 3. Will all the proposed changes be under the ForStStateBackend? It is
>>> better to emphasize this in 'Proposed Changes'.
>>> 4. Is there any special file handling for checkpoint failure?
>>>
>>>
>>> Best,
>>> Zakelly
>>>
>>>
>>> On Fri, Feb 14, 2025 at 6:35 PM Han Yin <alexyin...@gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I would like to open a discussion on implementing faster checkpoint &
>>>> recovery for disaggregated state[1].
>>>>
>>>> This is an improvement to the disaggregated state management in ForSt,
>>>> so you may want to read FLIP-423[2] and FLIP-428[3] for background.
>>>>
>>>> Currently, ForSt copies or fast-duplicates files between the working
>>>> directory and the checkpoint directory during checkpointing and
>>>> restoration. However, in a disaggregated environment, there is no need to
>>>> maintain multiple copies of files since they typically reside within the
>>>> same remote file system. Therefore, we propose an approach for reusing
>>>> files when ForSt generates snapshots or restores from checkpoints and for
>>>> managing the file ownership between Flink & ForSt. By eliminating the
>>>> overhead of file copying, checkpointing, restoration, and rescaling can
>>>> become significantly faster for disaggregated state.
>>>>
>>>> Looking forward to your comments or feedback.
>>>>
>>>> Best regards,
>>>> Han Yin
>>>>
>>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=293046898
>>>> [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=293046855
>>>> [3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=293046865
>>>>
>>
>
> --
> Best,
> Yanfei