Hi Zakelly, I am fine with either Option 2 or Option 3. I think the naming in Option 2 makes it clear that it is a boolean configuration. However, most of the currently available boolean configurations do not use "enable" as a suffix. Therefore, Option 3 looks good to me as it follows the current practice.
Best regards, Xuannan On Thu, Jan 11, 2024 at 9:50 AM Hangxiang Yu <master...@gmail.com> wrote: > > > > > That's a very good point. I realize that the word 'recovery' means way too > > many things. So I suggest picking a more specific word here, how about > > 'execution.state-recovery.*' ? Checkpointing and state recovery are > > corresponding terms and won't make ambiguity. > > > > This makes the configuration clearer to me. We could focus on the > `state-recovery` at first. > > I think we could create another FLIP for the deprecation of LEGACY mode. > > > > LGTM, Let's create a new FLIP to do this. > > IIUC, there is no clear ownership of the local copy files from the previous > > job and it's better to define one. This needs more discussion so we could > > create another thread for this. WDYT? > > > > Yeah, I have created a new ticket FLINK-34032 to track and discuss this. > > On Wed, Jan 10, 2024 at 6:31 PM Zakelly Lan <zakelly....@gmail.com> wrote: > > > Hi everyone, > > > > It seems we still don't have a consensus on the rules for boolean type > > options. Let me recap the alternatives we have: > > > > Option 1: Use enumeration options instead if possible. But this may cause > > some name collisions or confusion as we discussed and we should unify the > > statement everywhere. > > Option 2: Use boolean options and add 'enabled' as the suffix. > > Option 3: Use boolean options and ONLY add 'enabled' when there are more > > detailed configurations under the same prefix, to prevent one name from > > serving as a prefix to another. > > > > I am inclined to Option 3, since it is more in line with current practice > > and friendly for existing users. Also It reduces the length of > > configuration names as much as possible. > > > > Looking forward to your opinions! Thanks! > > > > > > Best, > > Zakelly > > > > On Wed, Jan 10, 2024 at 3:30 PM Zakelly Lan <zakelly....@gmail.com> wrote: > > > > > Hi Hangxiang, > > > > > > Thanks for your suggestions! > > > > > > 1. Could execution.recovery also contain some other behaviors about > > >> recovery ? e.g. restart-strategy. > > > > > > > > > That's a very good point. I realize that the word 'recovery' means way > > too > > > many things. So I suggest picking a more specific word here, how about > > > 'execution.state-recovery.*' ? Checkpointing and state recovery are > > > corresponding terms and won't make ambiguity. > > > > > > 2. Could we also remove some legacy configuration value ? e.g. LEGACY > > Mode > > >> for execution.savepoint-restore-mode/execution.recovery.claim-mode. > > > > > > > > > I think we could create another FLIP for the deprecation of LEGACY mode. > > > > > > > > >> 3. Could the local checkpoint be cleaned > > >> if execution.checkpointing.local-copy.enabled is true and > > >> execution.recovery.from-local is false ? I found it's also an issue if > > >> current local-recovery from enabled to disabled. Maybe another ticket is > > >> needed. > > > > > > > > > IIUC, there is no clear ownership of the local copy files from the > > > previous job and it's better to define one. This needs more discussion so > > > we could create another thread for this. WDYT? > > > > > > > > > Best, > > > Zakelly > > > > > > On Tue, Jan 9, 2024 at 11:23 AM Hangxiang Yu <master...@gmail.com> > > wrote: > > > > > >> Hi, Zakelly. > > >> Thanks for driving this. Overall LGTM as we discussed offline. > > >> > > >> Some comments/suggestions just came to mind: > > >> 1. Could execution.recovery also contain some other behaviors about > > >> recovery ? e.g. restart-strategy. > > >> 2. Could we also remove some legacy configuration value ? e.g. LEGACY > > Mode > > >> for execution.savepoint-restore-mode/execution.recovery.claim-mode. > > >> 3. Could the local checkpoint be cleaned > > >> if execution.checkpointing.local-copy.enabled is true and > > >> execution.recovery.from-local is false ? I found it's also an issue if > > >> current local-recovery from enabled to disabled. Maybe another ticket is > > >> needed. > > >> 4. +1 for enabling execution.checkpointing.incremental by default which > > is > > >> basically default configuration in our production environment. > > >> > > >> > > >> On Mon, Jan 8, 2024 at 6:06 PM Zakelly Lan <zakelly....@gmail.com> > > wrote: > > >> > > >> > Hi Yun, > > >> > > > >> > Thanks for your comments! > > >> > > > >> > 1. We shall not describe the configuration with its implementation > > for > > >> > > 'execution.checkpointing.local-copy.*' options, for hashmap > > >> > state-backend, > > >> > > it would write two streams and for Rocksdb state-backend, it would > > use > > >> > > hard-link for backup. Thus, I think > > >> > > 'execution.checkpointing.local-backup.*' looks better. > > >> > > > >> > I agreed that we'd better name the option in user's perspective > > instead > > >> of > > >> > the implementation, thus I name it as a copy of the checkpoint in the > > >> > local disk, regardless of the way of generating it. The word 'backup' > > is > > >> > also suitable for this case, so I agree to change to > > >> > 'execution.checkpointing.local-backup.*' if no one objects. > > >> > > > >> > 2. What does the 'execution.checkpointing.data-inline-threshold' > > >> mean? It > > >> > > seems not so easy to understand. > > >> > > > >> > The 'execution.checkpointing.data-inline-threshold' (original one as > > >> > 'state.storage.fs.memory-threshold') stands for the size threshold > > below > > >> > which state chunks will store inline with the metadata, thus I call it > > >> > 'data-inline-threshold'. > > >> > > > >> > > > >> > Best, > > >> > Zakelly > > >> > > > >> > On Mon, Jan 8, 2024 at 10:09 AM Yun Tang <myas...@live.com> wrote: > > >> > > > >> > > Hi Zakelly, > > >> > > > > >> > > Thanks for driving this topic. I have two concerns here: > > >> > > > > >> > > 1. We shall not describe the configuration with its > > implementation > > >> for > > >> > > 'execution.checkpointing.local-copy.*' options, for hashmap > > >> > state-backend, > > >> > > it would write two streams and for Rocksdb state-backend, it would > > use > > >> > > hard-link for backup. Thus, I think > > >> > > 'execution.checkpointing.local-backup.*' looks better. > > >> > > 2. What does the 'execution.checkpointing.data-inline-threshold' > > >> mean? > > >> > > It seems not so easy to understand. > > >> > > > > >> > > Best > > >> > > Yun Tang > > >> > > ________________________________ > > >> > > From: Piotr Nowojski <pnowoj...@apache.org> > > >> > > Sent: Thursday, January 4, 2024 22:37 > > >> > > To: dev@flink.apache.org <dev@flink.apache.org> > > >> > > Subject: Re: [DISCUSS] FLIP-406: Reorganize State & Checkpointing & > > >> > > Recovery Configuration > > >> > > > > >> > > Hi, > > >> > > > > >> > > Thanks for trying to clean this up! I don't have strong opinions on > > >> the > > >> > > topics discussed here, so generally speaking +1 from my side! > > >> > > > > >> > > Best, > > >> > > Piotrek > > >> > > > > >> > > śr., 3 sty 2024 o 04:16 Rui Fan <1996fan...@gmail.com> napisał(a): > > >> > > > > >> > > > Thanks for the feedback! > > >> > > > > > >> > > > Using the `execution.checkpointing.incremental.enabled`, > > >> > > > and enabling it by default sounds good to me. > > >> > > > > > >> > > > Best, > > >> > > > Rui > > >> > > > > > >> > > > On Wed, Jan 3, 2024 at 11:10 AM Zakelly Lan < > > zakelly....@gmail.com> > > >> > > wrote: > > >> > > > > > >> > > > > Hi Rui, > > >> > > > > > > >> > > > > Thanks for your comments! > > >> > > > > > > >> > > > > IMO, given that the state backend can be plugably loaded (as you > > >> can > > >> > > > > specify a state backend factory), I prefer not providing state > > >> > backend > > >> > > > > specified options in the framework. > > >> > > > > > > >> > > > > Secondly, the incremental checkpoint is actually a sharing file > > >> > > strategy > > >> > > > > across checkpoints, which means the state backend *could* reuse > > >> files > > >> > > > from > > >> > > > > previous cp but not *must* do so. When the state backend could > > not > > >> > > reuse > > >> > > > > the files, it is reasonable to fallback to a full checkpoint. > > >> > > > > > > >> > > > > Thus, I suggest we make it `execution.checkpointing.incremental` > > >> and > > >> > > > enable > > >> > > > > it by default. For those state backends not supporting this, > > they > > >> > > perform > > >> > > > > full checkpoints and print a warning to inform users. Users do > > not > > >> > need > > >> > > > to > > >> > > > > pay special attention to different options to control this > > across > > >> > > > different > > >> > > > > state backends. This is more user-friendly in my opinion. WDYT? > > >> > > > > > > >> > > > > On Tue, Jan 2, 2024 at 10:49 AM Rui Fan <1996fan...@gmail.com> > > >> > wrote: > > >> > > > > > > >> > > > > > Hi Zakelly, > > >> > > > > > > > >> > > > > > I'm not sure whether we could add the state backend type in > > the > > >> > > > > > new key name of state.backend.incremental. It means we use > > >> > > > > > `execution.checkpointing.rocksdb-incremental` or > > >> > > > > > `execution.checkpointing.rocksdb-incremental.enabled`. > > >> > > > > > > > >> > > > > > So far, state.backend.incremental only works for rocksdb state > > >> > > backend. > > >> > > > > > And this feature or optimization is very valuable and huge for > > >> > large > > >> > > > > > state flink jobs. I believe it's enabled for most production > > >> flink > > >> > > jobs > > >> > > > > > with large rocksdb state. > > >> > > > > > > > >> > > > > > If this option isn't generic for all state backend types, I > > >> guess > > >> > we > > >> > > > > > can enable > > `execution.checkpointing.rocksdb-incremental.enabled` > > >> > > > > > by default in Flink 2.0. > > >> > > > > > > > >> > > > > > But if it works for all state backends, it's hard to enable it > > >> by > > >> > > > > default. > > >> > > > > > Enabling great and valuable features or improvements are > > useful > > >> > > > > > for users, especially a lot of new flink users. Out-of-the-box > > >> > > options > > >> > > > > > are good for users. > > >> > > > > > > > >> > > > > > WDYT? > > >> > > > > > > > >> > > > > > Best, > > >> > > > > > Rui > > >> > > > > > > > >> > > > > > On Fri, Dec 29, 2023 at 1:45 PM Zakelly Lan < > > >> zakelly....@gmail.com > > >> > > > > >> > > > > wrote: > > >> > > > > > > > >> > > > > > > Hi everyone, > > >> > > > > > > > > >> > > > > > > Thanks all for your comments! > > >> > > > > > > > > >> > > > > > > As many of you have questions about the names for boolean > > >> > options, > > >> > > I > > >> > > > > > > suggest we make a naming rule for them. For now I could > > think > > >> of > > >> > > > three > > >> > > > > > > options: > > >> > > > > > > > > >> > > > > > > Option 1: Use enumeration options if possible. But this may > > >> cause > > >> > > > some > > >> > > > > > name > > >> > > > > > > collisions or confusion as we discussed and we should unify > > >> the > > >> > > > > statement > > >> > > > > > > everywhere. > > >> > > > > > > Option 2: Use boolean options and add 'enabled' as the > > suffix. > > >> > > > > > > Option 3: Use boolean options and ONLY add 'enabled' when > > >> there > > >> > are > > >> > > > > more > > >> > > > > > > detailed configurations under the same prefix, to prevent > > one > > >> > name > > >> > > > from > > >> > > > > > > serving as a prefix to another. > > >> > > > > > > > > >> > > > > > > I am slightly inclined to Option 3, since it is more in line > > >> with > > >> > > > > current > > >> > > > > > > practice and friendly for existing users. Also It reduces > > the > > >> > > length > > >> > > > of > > >> > > > > > > configuration names as much as possible. I really want to > > hear > > >> > your > > >> > > > > > > opinions. > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > @Xuannan > > >> > > > > > > > > >> > > > > > > I agree with your comments 1 and 3. > > >> > > > > > > > > >> > > > > > > For 2, If we decide to change the name, maybe > > >> > > > > > > `execution.checkpointing.parallel-cleaner` is better? And as > > >> for > > >> > > > > whether > > >> > > > > > to > > >> > > > > > > add 'enabled' I suggest we discuss the rule above. WDYT? > > >> > > > > > > Thanks! > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > Best, > > >> > > > > > > Zakelly > > >> > > > > > > > > >> > > > > > > On Fri, Dec 29, 2023 at 12:02 PM Xuannan Su < > > >> > suxuanna...@gmail.com > > >> > > > > > >> > > > > > wrote: > > >> > > > > > > > > >> > > > > > > > Hi Zakelly, > > >> > > > > > > > > > >> > > > > > > > Thanks for driving this! The organization of the > > >> configuration > > >> > > > option > > >> > > > > > > > in the FLIP looks much cleaner and easier to understand. > > +1 > > >> to > > >> > > the > > >> > > > > > > > FLIP. > > >> > > > > > > > > > >> > > > > > > > Just some questions from me. > > >> > > > > > > > > > >> > > > > > > > 1. I think the change to the ConfigOptions should be put > > in > > >> the > > >> > > > > > > > `Public Interface` section, instead of `Proposed Changed`, > > >> as > > >> > > those > > >> > > > > > > > configuration options are public interface. > > >> > > > > > > > > > >> > > > > > > > 2. The key `state.checkpoint.cleaner.parallel-mode` seems > > >> > > > confusing. > > >> > > > > > > > It feels like it is used to choose different modes. In > > >> fact, it > > >> > > is > > >> > > > a > > >> > > > > > > > boolean flag to indicate whether to enable parallel clean. > > >> How > > >> > > > about > > >> > > > > > > > making it > > `state.checkpoint.cleaner.parallel-mode.enabled`? > > >> > > > > > > > > > >> > > > > > > > 3. The `execution.checkpointing.write-buffer` may better > > be > > >> > > > > > > > `execution.checkpointing.write-buffer-size` so that we > > know > > >> it > > >> > is > > >> > > > > > > > configuring the size of the buffer. > > >> > > > > > > > > > >> > > > > > > > Best, > > >> > > > > > > > Xuannan > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > On Wed, Dec 27, 2023 at 7:17 PM Yanfei Lei < > > >> > fredia...@gmail.com> > > >> > > > > > wrote: > > >> > > > > > > > > > > >> > > > > > > > > Hi Zakelly, > > >> > > > > > > > > > > >> > > > > > > > > > Considering the name occupation, how about naming it > > as > > >> > > > > > > > `execution.checkpointing.type`? > > >> > > > > > > > > > > >> > > > > > > > > `Checkpoint Type`[1,2] is used to describe > > >> aligned/unaligned > > >> > > > > > > > > checkpoint, I am inclined to make a choice between > > >> > > > > > > > > `execution.checkpointing.incremental` and > > >> > > > > > > > > `execution.checkpointing.incremental.enabled`. > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > [1] > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/monitoring/checkpoint_monitoring/ > > >> > > > > > > > > [2] > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > https://github.com/apache/flink/blob/master/flink-runtime-web/web-dashboard/src/app/pages/job/checkpoints/detail/job-checkpoints-detail.component.html#L27 > > >> > > > > > > > > > > >> > > > > > > > > -- > > >> > > > > > > > > Best, > > >> > > > > > > > > Yanfei > > >> > > > > > > > > > > >> > > > > > > > > Zakelly Lan <zakelly....@gmail.com> 于2023年12月27日周三 > > >> 14:41写道: > > >> > > > > > > > > > > > >> > > > > > > > > > Hi Lijie, > > >> > > > > > > > > > > > >> > > > > > > > > > Thanks for the reminder! I missed this. > > >> > > > > > > > > > > > >> > > > > > > > > > Considering the name occupation, how about naming it > > as > > >> > > > > > > > > > `execution.checkpointing.type`? > > >> > > > > > > > > > > > >> > > > > > > > > > Actually I think the current > > >> `execution.checkpointing.mode` > > >> > > is > > >> > > > > > > > confusing in > > >> > > > > > > > > > some ways, maybe > > >> `execution.checkpointing.data-consistency` > > >> > > is > > >> > > > > > > better. > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > Best, > > >> > > > > > > > > > Zakelly > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > On Wed, Dec 27, 2023 at 12:59 PM Lijie Wang < > > >> > > > > > > wangdachui9...@gmail.com> > > >> > > > > > > > > > wrote: > > >> > > > > > > > > > > > >> > > > > > > > > > > Hi Zakelly, > > >> > > > > > > > > > > > > >> > > > > > > > > > > >> I'm wondering if > > >> > `execution.checkpointing.savepoint-dir` > > >> > > > > would > > >> > > > > > > be > > >> > > > > > > > > > > better. > > >> > > > > > > > > > > > > >> > > > > > > > > > > `execution.checkpointing.dir` and > > >> > > > > > > > `execution.checkpointing.savepoint-dir` > > >> > > > > > > > > > > are also fine for me. > > >> > > > > > > > > > > > > >> > > > > > > > > > > >> So I think an enumeration option > > >> > > > > > `execution.checkpointing.mode` > > >> > > > > > > > which > > >> > > > > > > > > > > can be 'full' (default) or 'incremental' would be > > >> better > > >> > > > > > > > > > > > > >> > > > > > > > > > > I agree with using an enumeration option. But > > >> currently > > >> > > there > > >> > > > > is > > >> > > > > > > > already a > > >> > > > > > > > > > > configuration option called > > >> > `execution.checkpointing.mode`, > > >> > > > > which > > >> > > > > > > is > > >> > > > > > > > used > > >> > > > > > > > > > > to choose EXACTLY_ONCE or AT_LEAST_ONCE. Maybe we > > >> need to > > >> > > use > > >> > > > > > > > another name > > >> > > > > > > > > > > or merge these two options. > > >> > > > > > > > > > > > > >> > > > > > > > > > > Best, > > >> > > > > > > > > > > Lijie > > >> > > > > > > > > > > > > >> > > > > > > > > > > Zakelly Lan <zakelly....@gmail.com> 于2023年12月27日周三 > > >> > > 11:43写道: > > >> > > > > > > > > > > > > >> > > > > > > > > > > > Hi everyone, > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > Thanks all for your comments! > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > @Yanfei > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > 1. For some state backends that do not support > > >> > > > incremental > > >> > > > > > > > checkpoint, > > >> > > > > > > > > > > > > how does the > > >> > execution.checkpointing.incrementaloption > > >> > > > take > > >> > > > > > > > effect? Or > > >> > > > > > > > > > > > > is it better to put incremental under > > >> > > > > > > > state.backend.xxx.incremental? > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > I'd rather not put the option for incremental > > >> > checkpoint > > >> > > > > under > > >> > > > > > > the > > >> > > > > > > > > > > > 'state.backend', since it is more about the > > >> > checkpointing > > >> > > > > > instead > > >> > > > > > > > of > > >> > > > > > > > > > > state > > >> > > > > > > > > > > > accessing. Of course, the state backend may not > > >> > > necessarily > > >> > > > > do > > >> > > > > > > > > > > incremental > > >> > > > > > > > > > > > checkpoint as requested. If the state backend is > > not > > >> > > > capable > > >> > > > > of > > >> > > > > > > > taking > > >> > > > > > > > > > > > incremental cp, it is better to fallback to the > > full > > >> > cp. > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > 2. I'm a little worried that putting all > > >> configurations > > >> > > > into > > >> > > > > > > > > > > > > `ExecutionCheckpointingOptions` will introduce > > >> some > > >> > > > > > dependency > > >> > > > > > > > > > > > > problems. Some options would be used by > > >> flink-runtime > > >> > > > > module, > > >> > > > > > > but > > >> > > > > > > > > > > > > flink-runtime should not depend on > > >> > > flink-streaming-java. > > >> > > > > e.g. > > >> > > > > > > > > > > > > FLINK-28286[1]. > > >> > > > > > > > > > > > > So, I prefer to move configurations to > > >> > > > > > `CheckpointingOptions`, > > >> > > > > > > > WDYT? > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > Yes, that's a very good point. Moving to > > >> > > > > > > > > > > > `CheckpointingOptions`(flink-core) makes sense. > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > @Lijie > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > How about > > >> > > > > > > > > > > > > state.savepoints.dir -> > > >> > > > > execution.checkpointing.savepoint.dir > > >> > > > > > > > > > > > > state.checkpoints.dir -> > > >> > > > > > execution.checkpointing.checkpoint.dir > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > Actually, I think the `checkpointing.checkpoint` > > may > > >> > > cause > > >> > > > > some > > >> > > > > > > > > > > confusion. > > >> > > > > > > > > > > > But I'm ok if others agree. > > >> > > > > > > > > > > > I'm wondering if > > >> > `execution.checkpointing.savepoint-dir` > > >> > > > > would > > >> > > > > > be > > >> > > > > > > > better. > > >> > > > > > > > > > > > WDYT? > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > 2. We changed the > > >> execution.checkpointing.local-copy' > > >> > to > > >> > > > > > > > > > > > > 'execution.checkpointing.local-copy.enabled'. > > >> Should > > >> > we > > >> > > > > also > > >> > > > > > > add > > >> > > > > > > > > > > > "enabled" > > >> > > > > > > > > > > > > suffix for other boolean type configuration > > >> options ? > > >> > > For > > >> > > > > > > > example, > > >> > > > > > > > > > > > > execution.checkpointing.incremental -> > > >> > > > > > > > > > > > > execution.checkpointing.incremental.enabled > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > Actually, the incremental cp is something like > > >> > choosing a > > >> > > > > mode > > >> > > > > > > for > > >> > > > > > > > doing > > >> > > > > > > > > > > > checkpoint instead of enabling a function. So I > > >> think > > >> > an > > >> > > > > > > > enumeration > > >> > > > > > > > > > > option > > >> > > > > > > > > > > > `execution.checkpointing.mode` which can be 'full' > > >> > > > (default) > > >> > > > > or > > >> > > > > > > > > > > > 'incremental' would be better, WDYT? > > >> > > > > > > > > > > > And @Rui Fan @Yanfei What do you think about this? > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > On Tue, Dec 26, 2023 at 5:15 PM Lijie Wang < > > >> > > > > > > > wangdachui9...@gmail.com> > > >> > > > > > > > > > > > wrote: > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > Hi Zakelly, > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > Thanks for driving the discussion. > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > 1. > > >> > > > > > > > > > > > > >> But I'm not so sure since there is only one > > >> > > > > > > savepoint-related > > >> > > > > > > > > > > option. > > >> > > > > > > > > > > > > Maybe someone else could share some thoughts > > here. > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > How about > > >> > > > > > > > > > > > > state.savepoints.dir -> > > >> > > > > execution.checkpointing.savepoint.dir > > >> > > > > > > > > > > > > state.checkpoints.dir -> > > >> > > > > > execution.checkpointing.checkpoint.dir > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > 2. We changed the > > >> execution.checkpointing.local-copy' > > >> > > to > > >> > > > > > > > > > > > > 'execution.checkpointing.local-copy.enabled'. > > >> Should > > >> > we > > >> > > > > also > > >> > > > > > > add > > >> > > > > > > > > > > > "enabled" > > >> > > > > > > > > > > > > suffix for other boolean type configuration > > >> options ? > > >> > > For > > >> > > > > > > > example, > > >> > > > > > > > > > > > > execution.checkpointing.incremental -> > > >> > > > > > > > > > > > > execution.checkpointing.incremental.enabled > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > In this way, the naming style of configuration > > >> > options > > >> > > is > > >> > > > > > > > unified, and > > >> > > > > > > > > > > it > > >> > > > > > > > > > > > > can avoid potential similar problems (for > > >> example, we > > >> > > may > > >> > > > > > need > > >> > > > > > > > to add > > >> > > > > > > > > > > > more > > >> > > > > > > > > > > > > options for incremental checkpoint in the > > future). > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > Best, > > >> > > > > > > > > > > > > Lijie > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > Yanfei Lei <fredia...@gmail.com> 于2023年12月26日周二 > > >> > > 12:05写道: > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > Hi Zakelly, > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > Thank you for creating the FLIP and starting > > the > > >> > > > > > discussion. > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > The current arrangement of these options is > > >> indeed > > >> > > > > somewhat > > >> > > > > > > > > > > haphazard, > > >> > > > > > > > > > > > > > and the new arrangement looks much better. I > > >> have > > >> > > some > > >> > > > > > > > questions > > >> > > > > > > > > > > about > > >> > > > > > > > > > > > > > the arrangement of some new configuration > > >> options: > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > 1. For some state backends that do not support > > >> > > > > incremental > > >> > > > > > > > > > > checkpoint, > > >> > > > > > > > > > > > > > how does the > > >> > > execution.checkpointing.incrementaloption > > >> > > > > take > > >> > > > > > > > effect? > > >> > > > > > > > > > > Or > > >> > > > > > > > > > > > > > is it better to put incremental under > > >> > > > > > > > state.backend.xxx.incremental? > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > 2. I'm a little worried that putting all > > >> > > configurations > > >> > > > > > into > > >> > > > > > > > > > > > > > `ExecutionCheckpointingOptions` will introduce > > >> some > > >> > > > > > > dependency > > >> > > > > > > > > > > > > > problems. Some options would be used by > > >> > flink-runtime > > >> > > > > > module, > > >> > > > > > > > but > > >> > > > > > > > > > > > > > flink-runtime should not depend on > > >> > > > flink-streaming-java. > > >> > > > > > e.g. > > >> > > > > > > > > > > > > > FLINK-28286[1]. > > >> > > > > > > > > > > > > > So, I prefer to move configurations to > > >> > > > > > > `CheckpointingOptions`, > > >> > > > > > > > WDYT? > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > [1] > > >> > > https://issues.apache.org/jira/browse/FLINK-28286 > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > -- > > >> > > > > > > > > > > > > > Best, > > >> > > > > > > > > > > > > > Yanfei > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > Zakelly Lan <zakelly....@gmail.com> > > >> 于2023年12月25日周一 > > >> > > > > > 21:14写道: > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Hi Rui Fan and Junrui, > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Thanks for the reminder! I agree to change > > the > > >> > > > > > > > > > > > > > > 'execution.checkpointing.local-copy' to > > >> > > > > > > > > > > > > > > > > 'execution.checkpointing.local-copy.enabled'. > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > And for other suggestions Rui proposed: > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 1. How about > > >> execution.checkpointing.storage.type > > >> > > > > instead > > >> > > > > > > > > > > > > > > > of execution.checkpointing.storage? > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Ah, I missed something here. Actually I > > >> suggest > > >> > we > > >> > > > > could > > >> > > > > > > > merge the > > >> > > > > > > > > > > > > > current > > >> > > > > > > > > > > > > > > 'state.checkpoints.dir' and > > >> > > > 'state.checkpoint-storage' > > >> > > > > > into > > >> > > > > > > > one URI > > >> > > > > > > > > > > > > > > configuration named > > >> > 'execution.checkpointing.dir'. > > >> > > > > WDYT? > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 3. execution.checkpointing.savepoint.dir is > > a > > >> > > little > > >> > > > > > weird. > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Yes, I think it is better to make > > 'savepoint' > > >> and > > >> > > > > > > > 'checkpoint' the > > >> > > > > > > > > > > > same > > >> > > > > > > > > > > > > > > level. But I'm not so sure since there is > > only > > >> > one > > >> > > > > > > > > > > savepoint-related > > >> > > > > > > > > > > > > > > option. Maybe someone else could share some > > >> > > thoughts > > >> > > > > > here. > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > 4. How about execution.recovery.claim-mode > > >> > instead > > >> > > of > > >> > > > > > > > > > > > > > > > execution.recovery.mode? > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Agreed. That's more accurate. > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Many thanks for your suggestions! > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Best, > > >> > > > > > > > > > > > > > > Zakelly > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > On Mon, Dec 25, 2023 at 8:18 PM Junrui Lee < > > >> > > > > > > > jrlee....@gmail.com> > > >> > > > > > > > > > > > > wrote: > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Hi Zakelly, > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Thanks for driving this. I agree that the > > >> > > proposed > > >> > > > > > > > restructuring > > >> > > > > > > > > > > of > > >> > > > > > > > > > > > > the > > >> > > > > > > > > > > > > > > > configuration options is largely positive. > > >> It > > >> > > will > > >> > > > > make > > >> > > > > > > > > > > > understanding > > >> > > > > > > > > > > > > > and > > >> > > > > > > > > > > > > > > > working with Flink configurations more > > >> > intuitive. > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Most of the proposed changes look great. > > >> Just a > > >> > > > > > heads-up, > > >> > > > > > > > as Rui > > >> > > > > > > > > > > > Fan > > >> > > > > > > > > > > > > > > > mentioned, Flink currently requires that > > no > > >> > > > > > > configOption's > > >> > > > > > > > key be > > >> > > > > > > > > > > > the > > >> > > > > > > > > > > > > > > > prefix of another to avoid issues when we > > >> > > > eventually > > >> > > > > > > adopt > > >> > > > > > > > a > > >> > > > > > > > > > > > standard > > >> > > > > > > > > > > > > > YAML > > >> > > > > > > > > > > > > > > > parser, as detailed in FLINK-29372 ( > > >> > > > > > > > > > > > > > > > > > >> > > https://issues.apache.org/jira/browse/FLINK-29372 > > >> > > > ). > > >> > > > > > > > Therefore, > > >> > > > > > > > > > > > it's > > >> > > > > > > > > > > > > > better > > >> > > > > > > > > > > > > > > > to change the key > > >> > > > > 'execution.checkpointing.local-copy' > > >> > > > > > > > because it > > >> > > > > > > > > > > > > > serves as > > >> > > > > > > > > > > > > > > > a prefix to the key > > >> > > > > > > > 'execution.checkpointing.local-copy.dir'. > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Best regards, > > >> > > > > > > > > > > > > > > > Junrui > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Rui Fan <1996fan...@gmail.com> > > >> 于2023年12月25日周一 > > >> > > > > 19:11写道: > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Hi Zakelly, > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Thank you for driving this proposal! > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Overall good for me. I have some > > questions > > >> > > about > > >> > > > > > these > > >> > > > > > > > names. > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 1. How about > > >> > > execution.checkpointing.storage.type > > >> > > > > > > > instead of > > >> > > > > > > > > > > > > > > > > execution.checkpointing.storage? > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > It's similar to state.backend.type. > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 2. How about > > >> > > > > > execution.checkpointing.local-copy.enabled > > >> > > > > > > > instead > > >> > > > > > > > > > > > of > > >> > > > > > > > > > > > > > > > > execution.checkpointing.local-copy? > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > You added a new option: > > >> > > > > > > > execution.checkpointing.local-copy.dir. > > >> > > > > > > > > > > > > > > > > IIUC, one option name shouldn't be the > > >> prefix > > >> > > of > > >> > > > > > other > > >> > > > > > > > options. > > >> > > > > > > > > > > > > > > > > If you add a new option > > >> > > > > > > > execution.checkpointing.local-copy, > > >> > > > > > > > > > > > > > > > > flink CI will fail directly. > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 3. execution.checkpointing.savepoint.dir > > >> is a > > >> > > > > little > > >> > > > > > > > weird. > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > For old options: state.savepoints.dir > > and > > >> > > > > > > > > > > state.checkpoints.dir, > > >> > > > > > > > > > > > > > > > > the savepoint and checkpoint are the > > same > > >> > > level. > > >> > > > It > > >> > > > > > > means > > >> > > > > > > > > > > > > > > > > it's a checkpoint or savepoint. > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > The new option > > >> execution.checkpointing.dir is > > >> > > > fine > > >> > > > > > for > > >> > > > > > > > me. > > >> > > > > > > > > > > > > > > > > However, > > >> > execution.checkpointing.savepoint.dir > > >> > > > is a > > >> > > > > > > > little > > >> > > > > > > > > > > weird. > > >> > > > > > > > > > > > > > > > > I don't know which name is better now. > > >> Let us > > >> > > > think > > >> > > > > > > > about it > > >> > > > > > > > > > > > more. > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 4. How about > > execution.recovery.claim-mode > > >> > > > instead > > >> > > > > of > > >> > > > > > > > > > > > > > > > > execution.recovery.mode? > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > The meaning of mode is too broad. The > > >> > > claim-mode > > >> > > > > may > > >> > > > > > > > > > > > > > > > > be more accurate for users. > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > WDYT? > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Best, > > >> > > > > > > > > > > > > > > > > Rui > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > On Mon, Dec 25, 2023 at 5:14 PM Zakelly > > >> Lan < > > >> > > > > > > > > > > > zakelly....@gmail.com > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > wrote: > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Hi devs, > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > I'd like to start a discussion on > > >> FLIP-406: > > >> > > > > > > Reorganize > > >> > > > > > > > State > > >> > > > > > > > > > > & > > >> > > > > > > > > > > > > > > > > > Checkpointing & Recovery > > >> Configuration[1]. > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Currently, the configuration options > > >> > > pertaining > > >> > > > > to > > >> > > > > > > > > > > > checkpointing, > > >> > > > > > > > > > > > > > > > > recovery, > > >> > > > > > > > > > > > > > > > > > and state management are primarily > > >> grouped > > >> > > > under > > >> > > > > > the > > >> > > > > > > > > > > following > > >> > > > > > > > > > > > > > > > prefixes: > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > - state.backend.* : configurations > > >> > related > > >> > > > to > > >> > > > > > > state > > >> > > > > > > > > > > > accessing > > >> > > > > > > > > > > > > > and > > >> > > > > > > > > > > > > > > > > > checkpointing, as well as specific > > >> > options > > >> > > > for > > >> > > > > > > > individual > > >> > > > > > > > > > > > > state > > >> > > > > > > > > > > > > > > > > backends > > >> > > > > > > > > > > > > > > > > > - execution.checkpointing.* : > > >> > > configurations > > >> > > > > > > > associated > > >> > > > > > > > > > > with > > >> > > > > > > > > > > > > > > > > checkpoint > > >> > > > > > > > > > > > > > > > > > execution and recovery > > >> > > > > > > > > > > > > > > > > > - execution.savepoint.*: > > >> configurations > > >> > > for > > >> > > > > > > > recovery from > > >> > > > > > > > > > > > > > savepoint > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > In addition, there are several > > >> individual > > >> > > > options > > >> > > > > > > such > > >> > > > > > > > as ' > > >> > > > > > > > > > > > > > > > > > *state.checkpoint-storage*' and > > >> > > > > > > > '*state.checkpoints.dir*' > > >> > > > > > > > > > > that > > >> > > > > > > > > > > > > fall > > >> > > > > > > > > > > > > > > > > outside > > >> > > > > > > > > > > > > > > > > > of these prefixes. The current > > >> arrangement > > >> > of > > >> > > > > these > > >> > > > > > > > options, > > >> > > > > > > > > > > > > which > > >> > > > > > > > > > > > > > span > > >> > > > > > > > > > > > > > > > > > multiple modules, is somewhat > > haphazard > > >> and > > >> > > > > lacks a > > >> > > > > > > > > > > systematic > > >> > > > > > > > > > > > > > > > structure. > > >> > > > > > > > > > > > > > > > > > For example, the options under the > > >> > > > > > > > '*CheckpointingOptions*' > > >> > > > > > > > > > > > and ' > > >> > > > > > > > > > > > > > > > > > *ExecutionCheckpointingOptions*' are > > >> > related > > >> > > > and > > >> > > > > > have > > >> > > > > > > > no > > >> > > > > > > > > > > clear > > >> > > > > > > > > > > > > > > > boundaries > > >> > > > > > > > > > > > > > > > > > from the user's perspective, but there > > >> is > > >> > no > > >> > > > > > unified > > >> > > > > > > > prefix > > >> > > > > > > > > > > for > > >> > > > > > > > > > > > > > them. > > >> > > > > > > > > > > > > > > > > With > > >> > > > > > > > > > > > > > > > > > the upcoming release of Flink 2.0, we > > >> have > > >> > an > > >> > > > > > > excellent > > >> > > > > > > > > > > > > > opportunity to > > >> > > > > > > > > > > > > > > > > > overhaul and restructure the > > >> configurations > > >> > > > > related > > >> > > > > > > to > > >> > > > > > > > > > > > > > checkpointing, > > >> > > > > > > > > > > > > > > > > > recovery, and state management. This > > >> FLIP > > >> > > > > proposes > > >> > > > > > to > > >> > > > > > > > > > > > reorganize > > >> > > > > > > > > > > > > > these > > >> > > > > > > > > > > > > > > > > > settings, making it more coherent by > > >> > module, > > >> > > > > which > > >> > > > > > > > would > > >> > > > > > > > > > > > > > significantly > > >> > > > > > > > > > > > > > > > > > lower the barriers for understanding > > and > > >> > > reduce > > >> > > > > the > > >> > > > > > > > > > > development > > >> > > > > > > > > > > > > > costs > > >> > > > > > > > > > > > > > > > > > moving forward. > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Looking forward to hearing from you! > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > [1] > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789560 > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Best, > > >> > > > > > > > > > > > > > > > > > Zakelly > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > > >> -- > > >> Best, > > >> Hangxiang. > > >> > > > > > > > > -- > Best, > Hangxiang.