Hi Zakelly,

I am fine with either Option 2 or Option 3. I think the naming in
Option 2 makes it clear that it is a boolean configuration. However,
most of the currently available boolean configurations do not use
"enable" as a suffix. Therefore, Option 3 looks good to me as it
follows the current practice.

Best regards,
Xuannan

On Thu, Jan 11, 2024 at 9:50 AM Hangxiang Yu <master...@gmail.com> wrote:
>
> >
> > That's a very good point. I realize that the word 'recovery' means way too
> > many things. So I suggest picking a more specific word here, how about
> > 'execution.state-recovery.*' ? Checkpointing and state recovery are
> > corresponding terms and won't make ambiguity.
> >
>
> This makes the configuration clearer to me. We could focus on the
> `state-recovery` at first.
>
> I think we could create another FLIP for the deprecation of LEGACY mode.
> >
>
> LGTM, Let's create a new FLIP to do this.
>
> IIUC, there is no clear ownership of the local copy files from the previous
> > job and it's better to define one. This needs more discussion so we could
> > create another thread for this. WDYT?
> >
>
> Yeah, I have created a new ticket FLINK-34032 to track and discuss this.
>
> On Wed, Jan 10, 2024 at 6:31 PM Zakelly Lan <zakelly....@gmail.com> wrote:
>
> > Hi everyone,
> >
> > It seems we still don't have a consensus on the rules for boolean type
> > options. Let me recap the alternatives we have:
> >
> > Option 1: Use enumeration options instead if possible. But this may cause
> > some name collisions or confusion as we discussed and we should unify the
> > statement everywhere.
> > Option 2: Use boolean options and add 'enabled' as the suffix.
> > Option 3: Use boolean options and ONLY add 'enabled' when there are more
> > detailed configurations under the same prefix, to prevent one name from
> > serving as a prefix to another.
> >
> > I am inclined to Option 3, since it is more in line with current practice
> > and friendly for existing users. Also It reduces the length of
> > configuration names as much as possible.
> >
> > Looking forward to your opinions! Thanks!
> >
> >
> > Best,
> > Zakelly
> >
> > On Wed, Jan 10, 2024 at 3:30 PM Zakelly Lan <zakelly....@gmail.com> wrote:
> >
> > > Hi Hangxiang,
> > >
> > > Thanks for your suggestions!
> > >
> > > 1. Could execution.recovery also contain some other behaviors about
> > >> recovery ? e.g. restart-strategy.
> > >
> > >
> > > That's a very good point. I realize that the word 'recovery' means way
> > too
> > > many things. So I suggest picking a more specific word here, how about
> > > 'execution.state-recovery.*' ? Checkpointing and state recovery are
> > > corresponding terms and won't make ambiguity.
> > >
> > > 2. Could we also remove some legacy configuration value ? e.g. LEGACY
> > Mode
> > >> for execution.savepoint-restore-mode/execution.recovery.claim-mode.
> > >
> > >
> > > I think we could create another FLIP for the deprecation of LEGACY mode.
> > >
> > >
> > >> 3. Could the local checkpoint be cleaned
> > >> if execution.checkpointing.local-copy.enabled is true and
> > >> execution.recovery.from-local is false ? I found it's also an issue if
> > >> current local-recovery from enabled to disabled. Maybe another ticket is
> > >> needed.
> > >
> > >
> > > IIUC, there is no clear ownership of the local copy files from the
> > > previous job and it's better to define one. This needs more discussion so
> > > we could create another thread for this. WDYT?
> > >
> > >
> > > Best,
> > > Zakelly
> > >
> > > On Tue, Jan 9, 2024 at 11:23 AM Hangxiang Yu <master...@gmail.com>
> > wrote:
> > >
> > >> Hi, Zakelly.
> > >> Thanks for driving this. Overall LGTM as we discussed offline.
> > >>
> > >> Some comments/suggestions just came to mind:
> > >> 1. Could execution.recovery also contain some other behaviors about
> > >> recovery ? e.g. restart-strategy.
> > >> 2. Could we also remove some legacy configuration value ? e.g. LEGACY
> > Mode
> > >> for execution.savepoint-restore-mode/execution.recovery.claim-mode.
> > >> 3. Could the local checkpoint be cleaned
> > >> if execution.checkpointing.local-copy.enabled is true and
> > >> execution.recovery.from-local is false ? I found it's also an issue if
> > >> current local-recovery from enabled to disabled. Maybe another ticket is
> > >> needed.
> > >> 4. +1 for enabling execution.checkpointing.incremental by default which
> > is
> > >> basically default configuration in our production environment.
> > >>
> > >>
> > >> On Mon, Jan 8, 2024 at 6:06 PM Zakelly Lan <zakelly....@gmail.com>
> > wrote:
> > >>
> > >> > Hi Yun,
> > >> >
> > >> > Thanks for your comments!
> > >> >
> > >> >  1.  We shall not describe the configuration with its implementation
> > for
> > >> > > 'execution.checkpointing.local-copy.*' options, for hashmap
> > >> > state-backend,
> > >> > > it would write two streams and for Rocksdb state-backend, it would
> > use
> > >> > > hard-link for backup. Thus, I think
> > >> > > 'execution.checkpointing.local-backup.*' looks better.
> > >> >
> > >> > I agreed that we'd better name the option in user's perspective
> > instead
> > >> of
> > >> > the implementation, thus I name it as a copy of the checkpoint in the
> > >> > local disk, regardless of the way of generating it. The word 'backup'
> > is
> > >> > also suitable for this case, so I agree to change to
> > >> > 'execution.checkpointing.local-backup.*' if no one objects.
> > >> >
> > >> >  2.  What does the 'execution.checkpointing.data-inline-threshold'
> > >> mean? It
> > >> > > seems not so easy to understand.
> > >> >
> > >> > The 'execution.checkpointing.data-inline-threshold' (original one as
> > >> > 'state.storage.fs.memory-threshold') stands for the size threshold
> > below
> > >> > which state chunks will store inline with the metadata, thus I call it
> > >> > 'data-inline-threshold'.
> > >> >
> > >> >
> > >> > Best,
> > >> > Zakelly
> > >> >
> > >> > On Mon, Jan 8, 2024 at 10:09 AM Yun Tang <myas...@live.com> wrote:
> > >> >
> > >> > > Hi Zakelly,
> > >> > >
> > >> > > Thanks for driving this topic. I have two concerns here:
> > >> > >
> > >> > >   1.  We shall not describe the configuration with its
> > implementation
> > >> for
> > >> > > 'execution.checkpointing.local-copy.*' options, for hashmap
> > >> > state-backend,
> > >> > > it would write two streams and for Rocksdb state-backend, it would
> > use
> > >> > > hard-link for backup. Thus, I think
> > >> > > 'execution.checkpointing.local-backup.*' looks better.
> > >> > >   2.  What does the 'execution.checkpointing.data-inline-threshold'
> > >> mean?
> > >> > > It seems not so easy to understand.
> > >> > >
> > >> > > Best
> > >> > > Yun Tang
> > >> > > ________________________________
> > >> > > From: Piotr Nowojski <pnowoj...@apache.org>
> > >> > > Sent: Thursday, January 4, 2024 22:37
> > >> > > To: dev@flink.apache.org <dev@flink.apache.org>
> > >> > > Subject: Re: [DISCUSS] FLIP-406: Reorganize State & Checkpointing &
> > >> > > Recovery Configuration
> > >> > >
> > >> > > Hi,
> > >> > >
> > >> > > Thanks for trying to clean this up! I don't have strong opinions on
> > >> the
> > >> > > topics discussed here, so generally speaking +1 from my side!
> > >> > >
> > >> > > Best,
> > >> > > Piotrek
> > >> > >
> > >> > > śr., 3 sty 2024 o 04:16 Rui Fan <1996fan...@gmail.com> napisał(a):
> > >> > >
> > >> > > > Thanks for the feedback!
> > >> > > >
> > >> > > > Using the `execution.checkpointing.incremental.enabled`,
> > >> > > > and enabling it by default sounds good to me.
> > >> > > >
> > >> > > > Best,
> > >> > > > Rui
> > >> > > >
> > >> > > > On Wed, Jan 3, 2024 at 11:10 AM Zakelly Lan <
> > zakelly....@gmail.com>
> > >> > > wrote:
> > >> > > >
> > >> > > > > Hi Rui,
> > >> > > > >
> > >> > > > > Thanks for your comments!
> > >> > > > >
> > >> > > > > IMO, given that the state backend can be plugably loaded (as you
> > >> can
> > >> > > > > specify a state backend factory), I prefer not providing state
> > >> > backend
> > >> > > > > specified options in the framework.
> > >> > > > >
> > >> > > > > Secondly, the incremental checkpoint is actually a sharing file
> > >> > > strategy
> > >> > > > > across checkpoints, which means the state backend *could* reuse
> > >> files
> > >> > > > from
> > >> > > > > previous cp but not *must* do so. When the state backend could
> > not
> > >> > > reuse
> > >> > > > > the files, it is reasonable to fallback to a full checkpoint.
> > >> > > > >
> > >> > > > > Thus, I suggest we make it `execution.checkpointing.incremental`
> > >> and
> > >> > > > enable
> > >> > > > > it by default. For those state backends not supporting this,
> > they
> > >> > > perform
> > >> > > > > full checkpoints and print a warning to inform users. Users do
> > not
> > >> > need
> > >> > > > to
> > >> > > > > pay special attention to different options to control this
> > across
> > >> > > > different
> > >> > > > > state backends. This is more user-friendly in my opinion. WDYT?
> > >> > > > >
> > >> > > > > On Tue, Jan 2, 2024 at 10:49 AM Rui Fan <1996fan...@gmail.com>
> > >> > wrote:
> > >> > > > >
> > >> > > > > > Hi Zakelly,
> > >> > > > > >
> > >> > > > > > I'm not sure whether we could add the state backend type in
> > the
> > >> > > > > > new key name of state.backend.incremental. It means we use
> > >> > > > > > `execution.checkpointing.rocksdb-incremental` or
> > >> > > > > > `execution.checkpointing.rocksdb-incremental.enabled`.
> > >> > > > > >
> > >> > > > > > So far, state.backend.incremental only works for rocksdb state
> > >> > > backend.
> > >> > > > > > And this feature or optimization is very valuable and huge for
> > >> > large
> > >> > > > > > state flink jobs. I believe it's enabled for most production
> > >> flink
> > >> > > jobs
> > >> > > > > > with large rocksdb state.
> > >> > > > > >
> > >> > > > > > If this option isn't generic for all state backend types, I
> > >> guess
> > >> > we
> > >> > > > > > can enable
> > `execution.checkpointing.rocksdb-incremental.enabled`
> > >> > > > > > by default in Flink 2.0.
> > >> > > > > >
> > >> > > > > > But if it works for all state backends, it's hard to enable it
> > >> by
> > >> > > > > default.
> > >> > > > > > Enabling great and valuable features or improvements are
> > useful
> > >> > > > > > for users, especially a lot of new flink users. Out-of-the-box
> > >> > > options
> > >> > > > > > are good for users.
> > >> > > > > >
> > >> > > > > > WDYT?
> > >> > > > > >
> > >> > > > > > Best,
> > >> > > > > > Rui
> > >> > > > > >
> > >> > > > > > On Fri, Dec 29, 2023 at 1:45 PM Zakelly Lan <
> > >> zakelly....@gmail.com
> > >> > >
> > >> > > > > wrote:
> > >> > > > > >
> > >> > > > > > > Hi everyone,
> > >> > > > > > >
> > >> > > > > > > Thanks all for your comments!
> > >> > > > > > >
> > >> > > > > > > As many of you have questions about the names for boolean
> > >> > options,
> > >> > > I
> > >> > > > > > > suggest we make a naming rule for them. For now I could
> > think
> > >> of
> > >> > > > three
> > >> > > > > > > options:
> > >> > > > > > >
> > >> > > > > > > Option 1: Use enumeration options if possible. But this may
> > >> cause
> > >> > > > some
> > >> > > > > > name
> > >> > > > > > > collisions or confusion as we discussed and we should unify
> > >> the
> > >> > > > > statement
> > >> > > > > > > everywhere.
> > >> > > > > > > Option 2: Use boolean options and add 'enabled' as the
> > suffix.
> > >> > > > > > > Option 3: Use boolean options and ONLY add 'enabled' when
> > >> there
> > >> > are
> > >> > > > > more
> > >> > > > > > > detailed configurations under the same prefix, to prevent
> > one
> > >> > name
> > >> > > > from
> > >> > > > > > > serving as a prefix to another.
> > >> > > > > > >
> > >> > > > > > > I am slightly inclined to Option 3, since it is more in line
> > >> with
> > >> > > > > current
> > >> > > > > > > practice and friendly for existing users. Also It reduces
> > the
> > >> > > length
> > >> > > > of
> > >> > > > > > > configuration names as much as possible. I really want to
> > hear
> > >> > your
> > >> > > > > > > opinions.
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > @Xuannan
> > >> > > > > > >
> > >> > > > > > > I agree with your comments 1 and 3.
> > >> > > > > > >
> > >> > > > > > > For 2, If we decide to change the name, maybe
> > >> > > > > > > `execution.checkpointing.parallel-cleaner` is better? And as
> > >> for
> > >> > > > > whether
> > >> > > > > > to
> > >> > > > > > > add 'enabled' I suggest we discuss the rule above. WDYT?
> > >> > > > > > > Thanks!
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > Best,
> > >> > > > > > > Zakelly
> > >> > > > > > >
> > >> > > > > > > On Fri, Dec 29, 2023 at 12:02 PM Xuannan Su <
> > >> > suxuanna...@gmail.com
> > >> > > >
> > >> > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Hi Zakelly,
> > >> > > > > > > >
> > >> > > > > > > > Thanks for driving this! The organization of the
> > >> configuration
> > >> > > > option
> > >> > > > > > > > in the FLIP looks much cleaner and easier to understand.
> > +1
> > >> to
> > >> > > the
> > >> > > > > > > > FLIP.
> > >> > > > > > > >
> > >> > > > > > > > Just some questions from me.
> > >> > > > > > > >
> > >> > > > > > > > 1. I think the change to the ConfigOptions should be put
> > in
> > >> the
> > >> > > > > > > > `Public Interface` section, instead of `Proposed Changed`,
> > >> as
> > >> > > those
> > >> > > > > > > > configuration options are public interface.
> > >> > > > > > > >
> > >> > > > > > > > 2. The key `state.checkpoint.cleaner.parallel-mode` seems
> > >> > > > confusing.
> > >> > > > > > > > It feels like it is used to choose different modes. In
> > >> fact, it
> > >> > > is
> > >> > > > a
> > >> > > > > > > > boolean flag to indicate whether to enable parallel clean.
> > >> How
> > >> > > > about
> > >> > > > > > > > making it
> > `state.checkpoint.cleaner.parallel-mode.enabled`?
> > >> > > > > > > >
> > >> > > > > > > > 3. The `execution.checkpointing.write-buffer` may better
> > be
> > >> > > > > > > > `execution.checkpointing.write-buffer-size` so that we
> > know
> > >> it
> > >> > is
> > >> > > > > > > > configuring the size of the buffer.
> > >> > > > > > > >
> > >> > > > > > > > Best,
> > >> > > > > > > > Xuannan
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > On Wed, Dec 27, 2023 at 7:17 PM Yanfei Lei <
> > >> > fredia...@gmail.com>
> > >> > > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > Hi Zakelly,
> > >> > > > > > > > >
> > >> > > > > > > > > > Considering the name occupation, how about naming it
> > as
> > >> > > > > > > > `execution.checkpointing.type`?
> > >> > > > > > > > >
> > >> > > > > > > > > `Checkpoint Type`[1,2] is used to describe
> > >> aligned/unaligned
> > >> > > > > > > > > checkpoint, I am inclined to make a choice between
> > >> > > > > > > > > `execution.checkpointing.incremental` and
> > >> > > > > > > > > `execution.checkpointing.incremental.enabled`.
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > [1]
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/monitoring/checkpoint_monitoring/
> > >> > > > > > > > > [2]
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > https://github.com/apache/flink/blob/master/flink-runtime-web/web-dashboard/src/app/pages/job/checkpoints/detail/job-checkpoints-detail.component.html#L27
> > >> > > > > > > > >
> > >> > > > > > > > > --
> > >> > > > > > > > > Best,
> > >> > > > > > > > > Yanfei
> > >> > > > > > > > >
> > >> > > > > > > > > Zakelly Lan <zakelly....@gmail.com> 于2023年12月27日周三
> > >> 14:41写道:
> > >> > > > > > > > > >
> > >> > > > > > > > > > Hi Lijie,
> > >> > > > > > > > > >
> > >> > > > > > > > > > Thanks for the reminder! I missed this.
> > >> > > > > > > > > >
> > >> > > > > > > > > > Considering the name occupation, how about naming it
> > as
> > >> > > > > > > > > > `execution.checkpointing.type`?
> > >> > > > > > > > > >
> > >> > > > > > > > > > Actually I think the current
> > >> `execution.checkpointing.mode`
> > >> > > is
> > >> > > > > > > > confusing in
> > >> > > > > > > > > > some ways, maybe
> > >> `execution.checkpointing.data-consistency`
> > >> > > is
> > >> > > > > > > better.
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > Best,
> > >> > > > > > > > > > Zakelly
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Wed, Dec 27, 2023 at 12:59 PM Lijie Wang <
> > >> > > > > > > wangdachui9...@gmail.com>
> > >> > > > > > > > > > wrote:
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Hi Zakelly,
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > >> I'm wondering if
> > >> > `execution.checkpointing.savepoint-dir`
> > >> > > > > would
> > >> > > > > > > be
> > >> > > > > > > > > > > better.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > `execution.checkpointing.dir` and
> > >> > > > > > > > `execution.checkpointing.savepoint-dir`
> > >> > > > > > > > > > > are also fine for me.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > >> So I think an enumeration option
> > >> > > > > > `execution.checkpointing.mode`
> > >> > > > > > > > which
> > >> > > > > > > > > > > can be 'full' (default) or 'incremental' would be
> > >> better
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > I agree with using an enumeration option. But
> > >> currently
> > >> > > there
> > >> > > > > is
> > >> > > > > > > > already a
> > >> > > > > > > > > > > configuration option called
> > >> > `execution.checkpointing.mode`,
> > >> > > > > which
> > >> > > > > > > is
> > >> > > > > > > > used
> > >> > > > > > > > > > > to choose EXACTLY_ONCE or AT_LEAST_ONCE. Maybe we
> > >> need to
> > >> > > use
> > >> > > > > > > > another name
> > >> > > > > > > > > > > or merge these two options.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Best,
> > >> > > > > > > > > > > Lijie
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Zakelly Lan <zakelly....@gmail.com> 于2023年12月27日周三
> > >> > > 11:43写道:
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > > Hi everyone,
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > Thanks all for your comments!
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > @Yanfei
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > > 1. For some state backends that do not support
> > >> > > > incremental
> > >> > > > > > > > checkpoint,
> > >> > > > > > > > > > > > > how does the
> > >> > execution.checkpointing.incrementaloption
> > >> > > > take
> > >> > > > > > > > effect? Or
> > >> > > > > > > > > > > > > is it better to put incremental under
> > >> > > > > > > > state.backend.xxx.incremental?
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > > I'd rather not put the option for incremental
> > >> > checkpoint
> > >> > > > > under
> > >> > > > > > > the
> > >> > > > > > > > > > > > 'state.backend', since it is more about the
> > >> > checkpointing
> > >> > > > > > instead
> > >> > > > > > > > of
> > >> > > > > > > > > > > state
> > >> > > > > > > > > > > > accessing. Of course, the state backend may not
> > >> > > necessarily
> > >> > > > > do
> > >> > > > > > > > > > > incremental
> > >> > > > > > > > > > > > checkpoint as requested. If the state backend is
> > not
> > >> > > > capable
> > >> > > > > of
> > >> > > > > > > > taking
> > >> > > > > > > > > > > > incremental cp, it is better to fallback to the
> > full
> > >> > cp.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > 2. I'm a little worried that putting all
> > >> configurations
> > >> > > > into
> > >> > > > > > > > > > > > > `ExecutionCheckpointingOptions` will introduce
> > >> some
> > >> > > > > > dependency
> > >> > > > > > > > > > > > > problems. Some options would be used by
> > >> flink-runtime
> > >> > > > > module,
> > >> > > > > > > but
> > >> > > > > > > > > > > > > flink-runtime should not depend on
> > >> > > flink-streaming-java.
> > >> > > > > e.g.
> > >> > > > > > > > > > > > > FLINK-28286[1].
> > >> > > > > > > > > > > > > So, I prefer to move configurations to
> > >> > > > > > `CheckpointingOptions`,
> > >> > > > > > > > WDYT?
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > Yes, that's a very good point.  Moving to
> > >> > > > > > > > > > > > `CheckpointingOptions`(flink-core) makes sense.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > @Lijie
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > How about
> > >> > > > > > > > > > > > > state.savepoints.dir ->
> > >> > > > > execution.checkpointing.savepoint.dir
> > >> > > > > > > > > > > > > state.checkpoints.dir ->
> > >> > > > > > execution.checkpointing.checkpoint.dir
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > Actually, I think the `checkpointing.checkpoint`
> > may
> > >> > > cause
> > >> > > > > some
> > >> > > > > > > > > > > confusion.
> > >> > > > > > > > > > > > But I'm ok if others agree.
> > >> > > > > > > > > > > > I'm wondering if
> > >> > `execution.checkpointing.savepoint-dir`
> > >> > > > > would
> > >> > > > > > be
> > >> > > > > > > > better.
> > >> > > > > > > > > > > > WDYT?
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > 2. We changed the
> > >> execution.checkpointing.local-copy'
> > >> > to
> > >> > > > > > > > > > > > > 'execution.checkpointing.local-copy.enabled'.
> > >> Should
> > >> > we
> > >> > > > > also
> > >> > > > > > > add
> > >> > > > > > > > > > > > "enabled"
> > >> > > > > > > > > > > > > suffix for other boolean type configuration
> > >> options ?
> > >> > > For
> > >> > > > > > > > example,
> > >> > > > > > > > > > > > > execution.checkpointing.incremental ->
> > >> > > > > > > > > > > > > execution.checkpointing.incremental.enabled
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > Actually, the incremental cp is something like
> > >> > choosing a
> > >> > > > > mode
> > >> > > > > > > for
> > >> > > > > > > > doing
> > >> > > > > > > > > > > > checkpoint instead of enabling a function. So I
> > >> think
> > >> > an
> > >> > > > > > > > enumeration
> > >> > > > > > > > > > > option
> > >> > > > > > > > > > > > `execution.checkpointing.mode` which can be 'full'
> > >> > > > (default)
> > >> > > > > or
> > >> > > > > > > > > > > > 'incremental' would be better, WDYT?
> > >> > > > > > > > > > > > And @Rui Fan @Yanfei What do you think about this?
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > On Tue, Dec 26, 2023 at 5:15 PM Lijie Wang <
> > >> > > > > > > > wangdachui9...@gmail.com>
> > >> > > > > > > > > > > > wrote:
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > > Hi Zakelly,
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > > > Thanks for driving the discussion.
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > > > 1.
> > >> > > > > > > > > > > > > >> But I'm not so sure since there is only one
> > >> > > > > > > savepoint-related
> > >> > > > > > > > > > > option.
> > >> > > > > > > > > > > > > Maybe someone else could share some thoughts
> > here.
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > > > How about
> > >> > > > > > > > > > > > > state.savepoints.dir ->
> > >> > > > > execution.checkpointing.savepoint.dir
> > >> > > > > > > > > > > > > state.checkpoints.dir ->
> > >> > > > > > execution.checkpointing.checkpoint.dir
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > > > 2. We changed the
> > >> execution.checkpointing.local-copy'
> > >> > > to
> > >> > > > > > > > > > > > > 'execution.checkpointing.local-copy.enabled'.
> > >> Should
> > >> > we
> > >> > > > > also
> > >> > > > > > > add
> > >> > > > > > > > > > > > "enabled"
> > >> > > > > > > > > > > > > suffix for other boolean type configuration
> > >> options ?
> > >> > > For
> > >> > > > > > > > example,
> > >> > > > > > > > > > > > > execution.checkpointing.incremental ->
> > >> > > > > > > > > > > > > execution.checkpointing.incremental.enabled
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > > > In this way, the naming style of configuration
> > >> > options
> > >> > > is
> > >> > > > > > > > unified, and
> > >> > > > > > > > > > > it
> > >> > > > > > > > > > > > > can avoid potential similar problems (for
> > >> example, we
> > >> > > may
> > >> > > > > > need
> > >> > > > > > > > to add
> > >> > > > > > > > > > > > more
> > >> > > > > > > > > > > > > options for incremental checkpoint in the
> > future).
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > > > Best,
> > >> > > > > > > > > > > > > Lijie
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > > > Yanfei Lei <fredia...@gmail.com> 于2023年12月26日周二
> > >> > > 12:05写道:
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > Hi Zakelly,
> > >> > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > Thank you for creating the FLIP and starting
> > the
> > >> > > > > > discussion.
> > >> > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > The current arrangement of these options is
> > >> indeed
> > >> > > > > somewhat
> > >> > > > > > > > > > > haphazard,
> > >> > > > > > > > > > > > > > and the new arrangement looks much better. I
> > >> have
> > >> > > some
> > >> > > > > > > > questions
> > >> > > > > > > > > > > about
> > >> > > > > > > > > > > > > > the arrangement of some new configuration
> > >> options:
> > >> > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > 1. For some state backends that do not support
> > >> > > > > incremental
> > >> > > > > > > > > > > checkpoint,
> > >> > > > > > > > > > > > > > how does the
> > >> > > execution.checkpointing.incrementaloption
> > >> > > > > take
> > >> > > > > > > > effect?
> > >> > > > > > > > > > > Or
> > >> > > > > > > > > > > > > > is it better to put incremental under
> > >> > > > > > > > state.backend.xxx.incremental?
> > >> > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > 2. I'm a little worried that putting all
> > >> > > configurations
> > >> > > > > > into
> > >> > > > > > > > > > > > > > `ExecutionCheckpointingOptions` will introduce
> > >> some
> > >> > > > > > > dependency
> > >> > > > > > > > > > > > > > problems. Some options would be used by
> > >> > flink-runtime
> > >> > > > > > module,
> > >> > > > > > > > but
> > >> > > > > > > > > > > > > > flink-runtime should not depend on
> > >> > > > flink-streaming-java.
> > >> > > > > > e.g.
> > >> > > > > > > > > > > > > > FLINK-28286[1].
> > >> > > > > > > > > > > > > > So, I prefer to move configurations to
> > >> > > > > > > `CheckpointingOptions`,
> > >> > > > > > > > WDYT?
> > >> > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > [1]
> > >> > > https://issues.apache.org/jira/browse/FLINK-28286
> > >> > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > --
> > >> > > > > > > > > > > > > > Best,
> > >> > > > > > > > > > > > > > Yanfei
> > >> > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > Zakelly Lan <zakelly....@gmail.com>
> > >> 于2023年12月25日周一
> > >> > > > > > 21:14写道:
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > Hi Rui Fan and Junrui,
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > Thanks for the reminder! I agree to change
> > the
> > >> > > > > > > > > > > > > > > 'execution.checkpointing.local-copy' to
> > >> > > > > > > > > > > > > > >
> > 'execution.checkpointing.local-copy.enabled'.
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > And for other suggestions Rui proposed:
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > 1. How about
> > >> execution.checkpointing.storage.type
> > >> > > > > instead
> > >> > > > > > > > > > > > > > > > of execution.checkpointing.storage?
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > Ah, I missed something here. Actually I
> > >> suggest
> > >> > we
> > >> > > > > could
> > >> > > > > > > > merge the
> > >> > > > > > > > > > > > > > current
> > >> > > > > > > > > > > > > > > 'state.checkpoints.dir' and
> > >> > > > 'state.checkpoint-storage'
> > >> > > > > > into
> > >> > > > > > > > one URI
> > >> > > > > > > > > > > > > > > configuration named
> > >> > 'execution.checkpointing.dir'.
> > >> > > > > WDYT?
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > 3. execution.checkpointing.savepoint.dir is
> > a
> > >> > > little
> > >> > > > > > weird.
> > >> > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > Yes, I think it is better to make
> > 'savepoint'
> > >> and
> > >> > > > > > > > 'checkpoint' the
> > >> > > > > > > > > > > > same
> > >> > > > > > > > > > > > > > > level. But I'm not so sure since there is
> > only
> > >> > one
> > >> > > > > > > > > > > savepoint-related
> > >> > > > > > > > > > > > > > > option. Maybe someone else could share some
> > >> > > thoughts
> > >> > > > > > here.
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > 4. How about execution.recovery.claim-mode
> > >> > instead
> > >> > > of
> > >> > > > > > > > > > > > > > > > execution.recovery.mode?
> > >> > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > >  Agreed. That's more accurate.
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > Many thanks for your suggestions!
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > Best,
> > >> > > > > > > > > > > > > > > Zakelly
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > On Mon, Dec 25, 2023 at 8:18 PM Junrui Lee <
> > >> > > > > > > > jrlee....@gmail.com>
> > >> > > > > > > > > > > > > wrote:
> > >> > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > Hi Zakelly,
> > >> > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > Thanks for driving this. I agree that the
> > >> > > proposed
> > >> > > > > > > > restructuring
> > >> > > > > > > > > > > of
> > >> > > > > > > > > > > > > the
> > >> > > > > > > > > > > > > > > > configuration options is largely positive.
> > >> It
> > >> > > will
> > >> > > > > make
> > >> > > > > > > > > > > > understanding
> > >> > > > > > > > > > > > > > and
> > >> > > > > > > > > > > > > > > > working with Flink configurations more
> > >> > intuitive.
> > >> > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > Most of the proposed changes look great.
> > >> Just a
> > >> > > > > > heads-up,
> > >> > > > > > > > as Rui
> > >> > > > > > > > > > > > Fan
> > >> > > > > > > > > > > > > > > > mentioned, Flink currently requires that
> > no
> > >> > > > > > > configOption's
> > >> > > > > > > > key be
> > >> > > > > > > > > > > > the
> > >> > > > > > > > > > > > > > > > prefix of another to avoid issues when we
> > >> > > > eventually
> > >> > > > > > > adopt
> > >> > > > > > > > a
> > >> > > > > > > > > > > > standard
> > >> > > > > > > > > > > > > > YAML
> > >> > > > > > > > > > > > > > > > parser, as detailed in FLINK-29372 (
> > >> > > > > > > > > > > > > > > >
> > >> > > https://issues.apache.org/jira/browse/FLINK-29372
> > >> > > > ).
> > >> > > > > > > > Therefore,
> > >> > > > > > > > > > > > it's
> > >> > > > > > > > > > > > > > better
> > >> > > > > > > > > > > > > > > > to change the key
> > >> > > > > 'execution.checkpointing.local-copy'
> > >> > > > > > > > because it
> > >> > > > > > > > > > > > > > serves as
> > >> > > > > > > > > > > > > > > > a prefix to the key
> > >> > > > > > > > 'execution.checkpointing.local-copy.dir'.
> > >> > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > Best regards,
> > >> > > > > > > > > > > > > > > > Junrui
> > >> > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > Rui Fan <1996fan...@gmail.com>
> > >> 于2023年12月25日周一
> > >> > > > > 19:11写道:
> > >> > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > Hi Zakelly,
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > Thank you for driving this proposal!
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > Overall good for me. I have some
> > questions
> > >> > > about
> > >> > > > > > these
> > >> > > > > > > > names.
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > 1. How about
> > >> > > execution.checkpointing.storage.type
> > >> > > > > > > > instead of
> > >> > > > > > > > > > > > > > > > > execution.checkpointing.storage?
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > It's similar to state.backend.type.
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > 2. How about
> > >> > > > > > execution.checkpointing.local-copy.enabled
> > >> > > > > > > > instead
> > >> > > > > > > > > > > > of
> > >> > > > > > > > > > > > > > > > > execution.checkpointing.local-copy?
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > You added a new option:
> > >> > > > > > > > execution.checkpointing.local-copy.dir.
> > >> > > > > > > > > > > > > > > > > IIUC, one option name shouldn't be the
> > >> prefix
> > >> > > of
> > >> > > > > > other
> > >> > > > > > > > options.
> > >> > > > > > > > > > > > > > > > > If you add a new option
> > >> > > > > > > > execution.checkpointing.local-copy,
> > >> > > > > > > > > > > > > > > > > flink CI will fail directly.
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > 3. execution.checkpointing.savepoint.dir
> > >> is a
> > >> > > > > little
> > >> > > > > > > > weird.
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > For old options: state.savepoints.dir
> > and
> > >> > > > > > > > > > > state.checkpoints.dir,
> > >> > > > > > > > > > > > > > > > > the savepoint and checkpoint are the
> > same
> > >> > > level.
> > >> > > > It
> > >> > > > > > > means
> > >> > > > > > > > > > > > > > > > > it's a checkpoint or savepoint.
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > The new option
> > >> execution.checkpointing.dir is
> > >> > > > fine
> > >> > > > > > for
> > >> > > > > > > > me.
> > >> > > > > > > > > > > > > > > > > However,
> > >> > execution.checkpointing.savepoint.dir
> > >> > > > is a
> > >> > > > > > > > little
> > >> > > > > > > > > > > weird.
> > >> > > > > > > > > > > > > > > > > I don't know which name is better now.
> > >> Let us
> > >> > > > think
> > >> > > > > > > > about it
> > >> > > > > > > > > > > > more.
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > 4. How about
> > execution.recovery.claim-mode
> > >> > > > instead
> > >> > > > > of
> > >> > > > > > > > > > > > > > > > > execution.recovery.mode?
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > The meaning of mode is too broad. The
> > >> > > claim-mode
> > >> > > > > may
> > >> > > > > > > > > > > > > > > > > be more accurate for users.
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > WDYT?
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > Best,
> > >> > > > > > > > > > > > > > > > > Rui
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > On Mon, Dec 25, 2023 at 5:14 PM Zakelly
> > >> Lan <
> > >> > > > > > > > > > > > zakelly....@gmail.com
> > >> > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > wrote:
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > > Hi devs,
> > >> > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > > I'd like to start a discussion on
> > >> FLIP-406:
> > >> > > > > > > Reorganize
> > >> > > > > > > > State
> > >> > > > > > > > > > > &
> > >> > > > > > > > > > > > > > > > > > Checkpointing & Recovery
> > >> Configuration[1].
> > >> > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > > Currently, the configuration options
> > >> > > pertaining
> > >> > > > > to
> > >> > > > > > > > > > > > checkpointing,
> > >> > > > > > > > > > > > > > > > > recovery,
> > >> > > > > > > > > > > > > > > > > > and state management are primarily
> > >> grouped
> > >> > > > under
> > >> > > > > > the
> > >> > > > > > > > > > > following
> > >> > > > > > > > > > > > > > > > prefixes:
> > >> > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > >    - state.backend.* : configurations
> > >> > related
> > >> > > > to
> > >> > > > > > > state
> > >> > > > > > > > > > > > accessing
> > >> > > > > > > > > > > > > > and
> > >> > > > > > > > > > > > > > > > > >    checkpointing, as well as specific
> > >> > options
> > >> > > > for
> > >> > > > > > > > individual
> > >> > > > > > > > > > > > > state
> > >> > > > > > > > > > > > > > > > > backends
> > >> > > > > > > > > > > > > > > > > >    - execution.checkpointing.* :
> > >> > > configurations
> > >> > > > > > > > associated
> > >> > > > > > > > > > > with
> > >> > > > > > > > > > > > > > > > > checkpoint
> > >> > > > > > > > > > > > > > > > > >    execution and recovery
> > >> > > > > > > > > > > > > > > > > >    - execution.savepoint.*:
> > >> configurations
> > >> > > for
> > >> > > > > > > > recovery from
> > >> > > > > > > > > > > > > > savepoint
> > >> > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > > In addition, there are several
> > >> individual
> > >> > > > options
> > >> > > > > > > such
> > >> > > > > > > > as '
> > >> > > > > > > > > > > > > > > > > > *state.checkpoint-storage*' and
> > >> > > > > > > > '*state.checkpoints.dir*'
> > >> > > > > > > > > > > that
> > >> > > > > > > > > > > > > fall
> > >> > > > > > > > > > > > > > > > > outside
> > >> > > > > > > > > > > > > > > > > > of these prefixes. The current
> > >> arrangement
> > >> > of
> > >> > > > > these
> > >> > > > > > > > options,
> > >> > > > > > > > > > > > > which
> > >> > > > > > > > > > > > > > span
> > >> > > > > > > > > > > > > > > > > > multiple modules, is somewhat
> > haphazard
> > >> and
> > >> > > > > lacks a
> > >> > > > > > > > > > > systematic
> > >> > > > > > > > > > > > > > > > structure.
> > >> > > > > > > > > > > > > > > > > > For example, the options under the
> > >> > > > > > > > '*CheckpointingOptions*'
> > >> > > > > > > > > > > > and '
> > >> > > > > > > > > > > > > > > > > > *ExecutionCheckpointingOptions*' are
> > >> > related
> > >> > > > and
> > >> > > > > > have
> > >> > > > > > > > no
> > >> > > > > > > > > > > clear
> > >> > > > > > > > > > > > > > > > boundaries
> > >> > > > > > > > > > > > > > > > > > from the user's perspective, but there
> > >> is
> > >> > no
> > >> > > > > > unified
> > >> > > > > > > > prefix
> > >> > > > > > > > > > > for
> > >> > > > > > > > > > > > > > them.
> > >> > > > > > > > > > > > > > > > > With
> > >> > > > > > > > > > > > > > > > > > the upcoming release of Flink 2.0, we
> > >> have
> > >> > an
> > >> > > > > > > excellent
> > >> > > > > > > > > > > > > > opportunity to
> > >> > > > > > > > > > > > > > > > > > overhaul and restructure the
> > >> configurations
> > >> > > > > related
> > >> > > > > > > to
> > >> > > > > > > > > > > > > > checkpointing,
> > >> > > > > > > > > > > > > > > > > > recovery, and state management. This
> > >> FLIP
> > >> > > > > proposes
> > >> > > > > > to
> > >> > > > > > > > > > > > reorganize
> > >> > > > > > > > > > > > > > these
> > >> > > > > > > > > > > > > > > > > > settings, making it more coherent by
> > >> > module,
> > >> > > > > which
> > >> > > > > > > > would
> > >> > > > > > > > > > > > > > significantly
> > >> > > > > > > > > > > > > > > > > > lower the barriers for understanding
> > and
> > >> > > reduce
> > >> > > > > the
> > >> > > > > > > > > > > development
> > >> > > > > > > > > > > > > > costs
> > >> > > > > > > > > > > > > > > > > > moving forward.
> > >> > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > > Looking forward to hearing from you!
> > >> > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > > [1]
> > >> > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > >
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789560
> > >> > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > > > Best,
> > >> > > > > > > > > > > > > > > > > > Zakelly
> > >> > > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > > > >
> > >> > > > > > > > > > > > > >
> > >> > > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> > >> --
> > >> Best,
> > >> Hangxiang.
> > >>
> > >
> >
>
>
> --
> Best,
> Hangxiang.

Reply via email to