Hi Zakelly,
thanks for your reply. See my inlined responses below:

On Wed, Jun 5, 2024 at 10:26 AM Zakelly Lan <zakelly....@gmail.com> wrote:

> Hi Matthias,
>
> Thanks for your proposal! I have a few questions:
>
> 1. Is it possible for a change event observed right after a completed
> checkpoint (or within a specific short time after a checkpoint) to
> trigger a rescale immediately? Sometimes the checkpoint interval is huge
> and it is better to rescale immediately.
>

That's something that could be considered as another optimization. I would
consider it a possible follow-up. My concern here is that we'd make the
rescaling configuration even more complicated by introducing yet another
parameter.

> 2. Should we introduce `CheckpointLifecycleListener` instead of reusing
> `CheckpointListener`? Is `CheckpointListener` enough for this scenario?
>

Good point, they serve similar purposes. But I'm hesitant to use
CheckpointListener (which is a public interface) for this internal, quite
narrowly scoped, runtime-specific use case of FLIP-461. It might be worth
renaming the internal interface to something that indicates its internal
usage to avoid confusion.

> Best,
> Zakelly
>
> On Wed, Jun 5, 2024 at 3:02 PM Matthias Pohl <map...@apache.org> wrote:
>
> > Hi ConradJam,
> > thanks for your response.
> >
> > The CheckpointStatsTracker gets notified about the checkpoint completion
> > after the checkpoint is finalized, i.e. all its data is persisted and the
> > metadata is written to the CompletedCheckpointStore. At this moment, the
> > checkpoint is considered for restoring a job and, therefore, becomes
> > available for restarts. This workflow also applies to unaligned
> > checkpoints. But I see how this context might be helpful for
> > understanding the change. I will add it to the FLIP. So far, I don't see
> > a reason to disable the feature for unaligned checkpoints. Do you see
> > other issues that might make it necessary to disable this feature for
> > this type of checkpoint?
> >
> > Can you elaborate a bit more on what you mean by "checkpoints that do not
> > check it"? I do not fully understand what you are referring to with "it"
> > here.
> >
> > Best,
> > Matthias
> >
> > On Wed, Jun 5, 2024 at 4:46 AM ConradJam <jam.gz...@gmail.com> wrote:
> >
> > > I have a few questions:
> > > Unaligned checkpoints: Do we need to enable this feature? Whether this
> > > feature should be disabled for checkpoints that do not check it
> > >
> > > On Tue, Jun 4, 2024 at 6:03 PM Matthias Pohl <map...@apache.org> wrote:
> > >
> > > > Hi everyone,
> > > > I'd like to discuss FLIP-461 [1]. The FLIP proposes the
> > > > synchronization of rescaling and the completion of checkpoints. The
> > > > idea is to reduce the amount of data that needs to be processed after
> > > > rescaling has happened. A more detailed motivation can be found in
> > > > FLIP-461.
> > > >
> > > > I'm looking forward to feedback and suggestions.
> > > >
> > > > Best,
> > > > Matthias
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing
> > > >
> > >
> > > --
> > > Best
> > >
> > > ConradJam
> > >
> >
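For illustration, the idea discussed in this thread (a detected resource change only leads to a rescale once the next checkpoint has completed and is available for restarts) could be sketched roughly as below. This is a minimal, hypothetical sketch: `CheckpointSynchronizedRescaler`, `CheckpointCompletionListener`, `onResourceChange` and the other names are made up for this example and are not Flink APIs; the actual FLIP-461 implementation in Flink's scheduler may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback, loosely modeled on the internal listener discussed in
// this thread; NOT a Flink interface.
interface CheckpointCompletionListener {
    void onCompletedCheckpoint(long checkpointId);
}

// Hypothetical sketch: resource changes are recorded but only acted upon once
// the next checkpoint has completed and can be used for restoring the job.
public class CheckpointSynchronizedRescaler implements CheckpointCompletionListener {

    private boolean rescalePending = false;
    // IDs of the checkpoints after which a rescale was triggered.
    private final List<Long> rescalesTriggeredAfter = new ArrayList<>();

    // Called when the available resources change; the rescale is deferred.
    public void onResourceChange() {
        rescalePending = true;
    }

    // Called once a checkpoint is finalized, i.e. usable for restarts.
    @Override
    public void onCompletedCheckpoint(long checkpointId) {
        if (rescalePending) {
            rescalePending = false;
            // A real scheduler would restart the job with the new parallelism
            // here, typically restoring from the checkpoint that just completed.
            rescalesTriggeredAfter.add(checkpointId);
        }
    }

    public boolean isRescalePending() {
        return rescalePending;
    }

    public List<Long> getRescalesTriggeredAfter() {
        return rescalesTriggeredAfter;
    }

    public static void main(String[] args) {
        CheckpointSynchronizedRescaler rescaler = new CheckpointSynchronizedRescaler();
        rescaler.onCompletedCheckpoint(1L); // no pending change: nothing happens
        rescaler.onResourceChange();        // change observed: rescale deferred
        rescaler.onResourceChange();        // second change coalesced into the same rescale
        rescaler.onCompletedCheckpoint(2L); // rescale triggered right after checkpoint 2
        System.out.println(rescaler.getRescalesTriggeredAfter()); // prints [2]
    }
}
```

One consequence of this naive form is the concern behind question 1 above: a change observed right after a checkpoint waits for almost a full checkpoint interval before the rescale happens.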