Hi everyone,

Thanks for all the comments! Since there have been no further comments
for a while, I would like to start a vote for this FLIP.
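
For anyone joining the thread late, here is a rough sketch of the
copy-versus-reuse decision and the failure handling discussed in the quoted
replies below. It is illustrative only: the class, enum, interface, and method
names are made up for this email and are not the FLIP's or Flink's API, and
the same-file-system check is my assumption about when reuse would apply.

```java
import java.io.IOException;

/** Hypothetical sketch; none of these names come from the FLIP or from Flink. */
final class FileTransferSketch {

    /** How a file gets from the working directory into the checkpoint directory. */
    enum Mode { COPY, REUSE }

    /**
     * Assumption: reuse is only chosen when both directories sit on the same
     * remote file system and a full checkpoint is not enforced. In every other
     * case the sketch falls back to copying.
     */
    static Mode chooseMode(boolean sameFileSystem, boolean fullCheckpointEnforced) {
        if (fullCheckpointEnforced || !sameFileSystem) {
            return Mode.COPY;
        }
        return Mode.REUSE;
    }

    /** Stand-in for the real transfer; returns whether the file arrived. */
    interface Transfer {
        boolean run(String file, Mode mode);
    }

    /** Throws if the transfer did not succeed, which would fail the checkpoint. */
    static void transferOrFail(String file, Mode mode, Transfer transfer) throws IOException {
        if (!transfer.run(file, mode)) {
            throw new IOException("Transfer failed for " + file + " using " + mode);
        }
    }

    public static void main(String[] args) throws IOException {
        // Same file system, no enforced full checkpoint: the sketch picks REUSE.
        Mode mode = chooseMode(true, false);
        // The lambda pretends the transfer always succeeds; the file name is made up.
        transferOrFail("000012.sst", mode, (file, m) -> true);
        System.out.println(mode);
    }
}
```

The only point of the sketch is that the choice is made per transfer, and that
an unsuccessful transfer surfaces as an exception rather than a partial
checkpoint.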

> On Mar 19, 2025, at 16:36, Yanfei Lei <fredia...@gmail.com> wrote:
> 
> Hi Han,
> 
> Thanks for the proposal.
> Faster Checkpoint & Recovery lays the groundwork for Disaggregated
> State to adapt to cloud-native deployment. Regarding the FLIP, I have
> three comments:
> 
> 1. Are there any preliminary evaluation results available for this feature?
> 2. In terms of compatibility, can this feature be enabled using an
> existing original checkpoint or a native savepoint?
> 3. Does this feature introduce any additional overhead?
> 
> On Fri, Feb 21, 2025 at 19:00, Han Yin <alexyin...@gmail.com> wrote:
> 
>> 
>> Hi Zakelly,
>> Thanks for your response!
>> 1. Sure. I’ve added a section called ‘End-to-end user case’ after the
>> ‘Overview’ section.
>> 2. Yes, because reusing files somewhat goes against the semantics of a full
>> checkpoint. If a full checkpoint is enforced, the FileTransferStrategy will
>> transfer the files by copying instead of reusing them.
>> 3. Yes. All the changes are under the ForStStateBackend. I’ve updated
>> the section in the FLIP.
>> 4. In fact, we don't need much special file handling for checkpoint
>> failures, as they are managed by ForSt’s snapshot strategy. The proposed
>> FileTransferStrategy only checks whether the files are successfully
>> transferred. If the transfer is unsuccessful, it throws an exception,
>> ultimately failing the checkpoint. If the transfer succeeds but the
>> checkpoint is aborted, the file has already been 'uploaded' to the
>> checkpoint directory and is no longer owned by the DB, so the snapshot
>> strategy will skip re-uploading it for subsequent checkpoints.
>> 
>>> On Feb 17, 2025, at 11:44, Zakelly Lan <zakelly....@gmail.com> wrote:
>>> 
>>> Hi Han,
>>> 
>>> Thanks for driving this!
>>> 
>>> The FLIP is in good shape. Here are my comments:
>>> 
>>> 1. The FLIP introduces file reuse during snapshot and recovery. Could
>>> you please provide some common use cases from the user's perspective, e.g.
>>> periodic checkpoint, native savepoint?
>>> 2. Does the current design depend on incremental checkpoints? If we
>>> enforce a full checkpoint, what happens?
>>> 3. Will all the proposed changes be under the ForStStateBackend? It would
>>> be better to emphasize this in 'Proposed Changes'.
>>> 4. Is there any special file handling for checkpoint failure?
>>> 
>>> 
>>> Best,
>>> Zakelly
>>> 
>>> 
>>> On Fri, Feb 14, 2025 at 6:35 PM Han Yin <alexyin...@gmail.com> wrote:
>>> 
>>>> Hi everyone,
>>>> 
>>>> I would like to open a discussion on implementing faster checkpoint &
>>>> recovery for disaggregated state[1].
>>>> 
>>>> This is an improvement to the disaggregated state management in ForSt,
>>>> so you may want to read FLIP-423[2] and FLIP-428[3] for background.
>>>> 
>>>> Currently, ForSt copies or fast-duplicates files between the working
>>>> directory and the checkpoint directory during checkpointing and
>>>> restoration. However, in a disaggregated environment, there is no need to
>>>> maintain multiple copies of files since they typically reside within the
>>>> same remote file system. Therefore, we propose an approach for reusing
>>>> files when ForSt generates snapshots or restores from checkpoints and for
>>>> managing the file ownership between Flink and ForSt. By eliminating the
>>>> overhead of file copying, checkpointing, restoration, and rescaling can
>>>> become significantly faster for disaggregated state.
>>>> 
>>>> Looking forward to your comments or feedback.
>>>>
>>>> Best regards,
>>>> Han Yin
>>>> 
>>>> [1]
>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=293046898
>>>> [2]
>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=293046855
>>>> [3]
>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=293046865
>>>> 
>>>> 
>>>> 
>>>> 
>> 
> 
> 
> --
> Best,
> Yanfei
