Hi everyone,

It seems too late to make this in 1.19, so I suggest changing it to 1.20.

Another thing I'd like to highlight is that there are some existing option
classes lacking annotations, which are:

- org.apache.flink.configuration.CheckpointingOptions
- org.apache.flink.configuration.StateBackendOptions
- org.apache.flink.contrib.streaming.state.RocksDBOptions

This FLIP will annotate these existing classes with @PublicEvolving in
version 2.0, since almost all of the state-related option classes and APIs
are annotated with @PublicEvolving. Moreover, the migration period of
deprecated options in those classes will last for one minor release (1.20),
which meets the requirement of migration period for @PublicEvolving [1]. And
in 2.0, they will be removed.

Based on the discussion so far, I will proceed to start the vote tomorrow.
Thanks!

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-321%3A+Introduce+an+API+deprecation+process

Best,
Zakelly

On Mon, Jan 22, 2024 at 6:31 PM Zakelly Lan <zakelly....@gmail.com> wrote:

> Hi everyone,
>
> It has been 6 days since the last call for discussion. I'd like to start a
> vote after another 2 days.
>
> Please let me know if you have any concerns. Thanks!
>
>
> Best,
> Zakelly
>
> On Tue, Jan 16, 2024 at 2:54 PM Zakelly Lan <zakelly....@gmail.com> wrote:
>
>> Thanks for the suggestion Rui! The type is added.
>>
>>
>> Best,
>> Zakelly
>>
>> On Tue, Jan 16, 2024 at 2:33 PM Rui Fan <1996fan...@gmail.com> wrote:
>>
>>> Hi Zakelly,
>>>
>>> Would you mind adding the option type in the FLIP doc?
>>> For example, String, Boolean or Enum, etc. Thank you.
>>>
>>> Best,
>>> Rui
>>>
>>> On Tue, Jan 16, 2024 at 2:29 PM Zakelly Lan <zakelly....@gmail.com>
>>> wrote:
>>>
>>> > Hi everyone,
>>> >
>>> > Thanks all for joining the discussion! I'd like to speed this up since it
>>> > lasts for nearly a month. I made changes on this FLIP based on suggestions
>>> > and compromises acceptable to most people. Please feel free to give your
>>> > opinion. Thanks!
>>> > If there are no more suggestions, I will consider starting a vote within a
>>> > week.
>>> >
>>> >
>>> > Best,
>>> > Zakelly
>>> >
>>> > On Thu, Jan 11, 2024 at 10:31 AM Xuannan Su <suxuanna...@gmail.com> wrote:
>>> >
>>> > > Hi Zakelly,
>>> > >
>>> > > I am fine with either Option 2 or Option 3. I think the naming in
>>> > > Option 2 makes it clear that it is a boolean configuration. However,
>>> > > most of the currently available boolean configurations do not use
>>> > > "enable" as a suffix. Therefore, Option 3 looks good to me as it
>>> > > follows the current practice.
>>> > >
>>> > > Best regards,
>>> > > Xuannan
>>> > >
>>> > > On Thu, Jan 11, 2024 at 9:50 AM Hangxiang Yu <master...@gmail.com> wrote:
>>> > > >
>>> > > > > That's a very good point. I realize that the word 'recovery' means way
>>> > > > > too many things. So I suggest picking a more specific word here, how
>>> > > > > about 'execution.state-recovery.*' ? Checkpointing and state recovery
>>> > > > > are corresponding terms and won't make ambiguity.
>>> > > >
>>> > > > This makes the configuration clearer to me. We could focus on the
>>> > > > `state-recovery` at first.
>>> > > >
>>> > > > > I think we could create another FLIP for the deprecation of LEGACY
>>> > > > > mode.
>>> > > >
>>> > > > LGTM, let's create a new FLIP to do this.
>>> > > >
>>> > > > > IIUC, there is no clear ownership of the local copy files from the
>>> > > > > previous job and it's better to define one. This needs more discussion
>>> > > > > so we could create another thread for this. WDYT?
>>> > > >
>>> > > > Yeah, I have created a new ticket FLINK-34032 to track and discuss this.
>>> > > >
>>> > > > On Wed, Jan 10, 2024 at 6:31 PM Zakelly Lan <zakelly....@gmail.com> wrote:
>>> > > >
>>> > > > > Hi everyone,
>>> > > > >
>>> > > > > It seems we still don't have a consensus on the rules for boolean type
>>> > > > > options. Let me recap the alternatives we have:
>>> > > > >
>>> > > > > Option 1: Use enumeration options instead if possible. But this may
>>> > > > > cause some name collisions or confusion as we discussed and we should
>>> > > > > unify the statement everywhere.
>>> > > > > Option 2: Use boolean options and add 'enabled' as the suffix.
>>> > > > > Option 3: Use boolean options and ONLY add 'enabled' when there are
>>> > > > > more detailed configurations under the same prefix, to prevent one name
>>> > > > > from serving as a prefix to another.
>>> > > > >
>>> > > > > I am inclined to Option 3, since it is more in line with current
>>> > > > > practice and friendly for existing users. Also, it reduces the length of
>>> > > > > configuration names as much as possible.
>>> > > > >
>>> > > > > Looking forward to your opinions! Thanks!
>>> > > > >
>>> > > > >
>>> > > > > Best,
>>> > > > > Zakelly
>>> > > > >
>>> > > > > On Wed, Jan 10, 2024 at 3:30 PM Zakelly Lan <zakelly....@gmail.com> wrote:
>>> > > > >
>>> > > > > > Hi Hangxiang,
>>> > > > > >
>>> > > > > > Thanks for your suggestions!
>>> > > > > >
>>> > > > > >> 1. Could execution.recovery also contain some other behaviors about
>>> > > > > >> recovery ? e.g. restart-strategy.
>>> > > > > >
>>> > > > > > That's a very good point. I realize that the word 'recovery' means way
>>> > > > > > too many things. So I suggest picking a more specific word here, how
>>> > > > > > about 'execution.state-recovery.*' ? Checkpointing and state recovery
>>> > > > > > are corresponding terms and won't make ambiguity.
>>> > > > > >
>>> > > > > >> 2. Could we also remove some legacy configuration value ? e.g. LEGACY
>>> > > > > >> Mode for
>>> > > > > >> execution.savepoint-restore-mode/execution.recovery.claim-mode.
>>> > > > > >
>>> > > > > > I think we could create another FLIP for the deprecation of LEGACY
>>> > > > > > mode.
>>> > > > > >
>>> > > > > >> 3. Could the local checkpoint be cleaned
>>> > > > > >> if execution.checkpointing.local-copy.enabled is true and
>>> > > > > >> execution.recovery.from-local is false ? I found it's also an issue if
>>> > > > > >> current local-recovery from enabled to disabled. Maybe another ticket
>>> > > > > >> is needed.
>>> > > > > >
>>> > > > > > IIUC, there is no clear ownership of the local copy files from the
>>> > > > > > previous job and it's better to define one. This needs more discussion
>>> > > > > > so we could create another thread for this. WDYT?
>>> > > > > >
>>> > > > > >
>>> > > > > > Best,
>>> > > > > > Zakelly
>>> > > > > >
>>> > > > > > On Tue, Jan 9, 2024 at 11:23 AM Hangxiang Yu <master...@gmail.com> wrote:
>>> > > > > >
>>> > > > > >> Hi, Zakelly.
>>> > > > > >> Thanks for driving this. Overall LGTM as we discussed offline.
>>> > > > > >>
>>> > > > > >> Some comments/suggestions just came to mind:
>>> > > > > >> 1. Could execution.recovery also contain some other behaviors about
>>> > > > > >> recovery ? e.g. restart-strategy.
>>> > > > > >> 2. Could we also remove some legacy configuration value ? e.g. LEGACY
>>> > > > > >> Mode for
>>> > > > > >> execution.savepoint-restore-mode/execution.recovery.claim-mode.
>>> > > > > >> 3. Could the local checkpoint be cleaned
>>> > > > > >> if execution.checkpointing.local-copy.enabled is true and
>>> > > > > >> execution.recovery.from-local is false ? I found it's also an issue if
>>> > > > > >> current local-recovery from enabled to disabled. Maybe another ticket
>>> > > > > >> is needed.
>>> > > > > >> 4. +1 for enabling execution.checkpointing.incremental by default which
>>> > > > > >> is basically default configuration in our production environment.
>>> > > > > >>
>>> > > > > >>
>>> > > > > >> On Mon, Jan 8, 2024 at 6:06 PM Zakelly Lan <zakelly....@gmail.com> wrote:
>>> > > > > >>
>>> > > > > >> > Hi Yun,
>>> > > > > >> >
>>> > > > > >> > Thanks for your comments!
>>> > > > > >> >
>>> > > > > >> > > 1. We shall not describe the configuration with its implementation
>>> > > > > >> > > for 'execution.checkpointing.local-copy.*' options, for hashmap
>>> > > > > >> > > state-backend, it would write two streams and for Rocksdb
>>> > > > > >> > > state-backend, it would use hard-link for backup. Thus, I think
>>> > > > > >> > > 'execution.checkpointing.local-backup.*' looks better.
>>> > > > > >> >
>>> > > > > >> > I agreed that we'd better name the option in user's perspective
>>> > > > > >> > instead of the implementation, thus I name it as a copy of the
>>> > > > > >> > checkpoint in the local disk, regardless of the way of generating it.
>>> > > > > >> > The word 'backup' is also suitable for this case, so I agree to
>>> > > > > >> > change to 'execution.checkpointing.local-backup.*' if no one objects.
>>> > > > > >> >
>>> > > > > >> > > 2. What does the 'execution.checkpointing.data-inline-threshold'
>>> > > > > >> > > mean? It seems not so easy to understand.
>>> > > > > >> >
>>> > > > > >> > The 'execution.checkpointing.data-inline-threshold' (original one as
>>> > > > > >> > 'state.storage.fs.memory-threshold') stands for the size threshold
>>> > > > > >> > below which state chunks will store inline with the metadata, thus I
>>> > > > > >> > call it 'data-inline-threshold'.
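
To make the 'data-inline-threshold' semantics described above concrete, here
is a minimal sketch in Python. It is not Flink code: the function name
`placement_for_chunk`, the 20 KiB threshold value, and the choice to treat a
chunk exactly at the threshold as "not inline" are assumptions made only for
illustration. It encodes the stated rule that state chunks below the
threshold are stored inline with the checkpoint metadata.

```python
def placement_for_chunk(chunk_size_bytes: int, inline_threshold_bytes: int) -> str:
    """Decide where a state chunk goes (hypothetical helper, not a Flink API).

    Chunks strictly smaller than the threshold are stored inline with the
    checkpoint metadata; everything else is written to a separate file.
    """
    if chunk_size_bytes < inline_threshold_bytes:
        return "inline"
    return "file"


# Example threshold chosen for illustration only.
threshold = 20 * 1024

# A tiny chunk, a chunk exactly at the threshold, and a large chunk.
placements = [
    placement_for_chunk(size, threshold)
    for size in (512, 20 * 1024, 1024 * 1024)
]
```

With these inputs, only the 512-byte chunk ends up inline; the other two are
written as separate files.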
>>> > > > > >> >
>>> > > > > >> >
>>> > > > > >> > Best,
>>> > > > > >> > Zakelly
>>> > > > > >> >
>>> > > > > >> > On Mon, Jan 8, 2024 at 10:09 AM Yun Tang <myas...@live.com> wrote:
>>> > > > > >> >
>>> > > > > >> > > Hi Zakelly,
>>> > > > > >> > >
>>> > > > > >> > > Thanks for driving this topic. I have two concerns here:
>>> > > > > >> > >
>>> > > > > >> > > 1. We shall not describe the configuration with its implementation
>>> > > > > >> > > for 'execution.checkpointing.local-copy.*' options, for hashmap
>>> > > > > >> > > state-backend, it would write two streams and for Rocksdb
>>> > > > > >> > > state-backend, it would use hard-link for backup. Thus, I think
>>> > > > > >> > > 'execution.checkpointing.local-backup.*' looks better.
>>> > > > > >> > > 2. What does the 'execution.checkpointing.data-inline-threshold'
>>> > > > > >> > > mean? It seems not so easy to understand.
>>> > > > > >> > >
>>> > > > > >> > > Best
>>> > > > > >> > > Yun Tang
>>> > > > > >> > > ________________________________
>>> > > > > >> > > From: Piotr Nowojski <pnowoj...@apache.org>
>>> > > > > >> > > Sent: Thursday, January 4, 2024 22:37
>>> > > > > >> > > To: dev@flink.apache.org <dev@flink.apache.org>
>>> > > > > >> > > Subject: Re: [DISCUSS] FLIP-406: Reorganize State & Checkpointing &
>>> > > > > >> > > Recovery Configuration
>>> > > > > >> > >
>>> > > > > >> > > Hi,
>>> > > > > >> > >
>>> > > > > >> > > Thanks for trying to clean this up! I don't have strong opinions on
>>> > > > > >> > > the topics discussed here, so generally speaking +1 from my side!
>>> > > > > >> > >
>>> > > > > >> > > Best,
>>> > > > > >> > > Piotrek
>>> > > > > >> > >
>>> > > > > >> > > On Wed, Jan 3, 2024 at 4:16 AM Rui Fan <1996fan...@gmail.com> wrote:
>>> > > > > >> > >
>>> > > > > >> > > > Thanks for the feedback!
>>> > > > > >> > > >
>>> > > > > >> > > > Using the `execution.checkpointing.incremental.enabled`,
>>> > > > > >> > > > and enabling it by default sounds good to me.
>>> > > > > >> > > >
>>> > > > > >> > > > Best,
>>> > > > > >> > > > Rui
>>> > > > > >> > > >
>>> > > > > >> > > > On Wed, Jan 3, 2024 at 11:10 AM Zakelly Lan <zakelly....@gmail.com>
>>> > > > > >> > > > wrote:
>>> > > > > >> > > >
>>> > > > > >> > > > > Hi Rui,
>>> > > > > >> > > > >
>>> > > > > >> > > > > Thanks for your comments!
>>> > > > > >> > > > >
>>> > > > > >> > > > > IMO, given that the state backend can be pluggably loaded (as you
>>> > > > > >> > > > > can specify a state backend factory), I prefer not providing
>>> > > > > >> > > > > state backend specified options in the framework.
>>> > > > > >> > > > >
>>> > > > > >> > > > > Secondly, the incremental checkpoint is actually a sharing file
>>> > > > > >> > > > > strategy across checkpoints, which means the state backend
>>> > > > > >> > > > > *could* reuse files from previous cp but not *must* do so. When
>>> > > > > >> > > > > the state backend could not reuse the files, it is reasonable to
>>> > > > > >> > > > > fallback to a full checkpoint.
>>> > > > > >> > > > >
>>> > > > > >> > > > > Thus, I suggest we make it `execution.checkpointing.incremental`
>>> > > > > >> > > > > and enable it by default. For those state backends not supporting
>>> > > > > >> > > > > this, they perform full checkpoints and print a warning to inform
>>> > > > > >> > > > > users.
>>> > > > > >> > > > > Users do not need to pay special attention to different options
>>> > > > > >> > > > > to control this across different state backends. This is more
>>> > > > > >> > > > > user-friendly in my opinion. WDYT?
>>> > > > > >> > > > >
>>> > > > > >> > > > > On Tue, Jan 2, 2024 at 10:49 AM Rui Fan <1996fan...@gmail.com>
>>> > > > > >> > > > > wrote:
>>> > > > > >> > > > >
>>> > > > > >> > > > > > Hi Zakelly,
>>> > > > > >> > > > > >
>>> > > > > >> > > > > > I'm not sure whether we could add the state backend type in the
>>> > > > > >> > > > > > new key name of state.backend.incremental. It means we use
>>> > > > > >> > > > > > `execution.checkpointing.rocksdb-incremental` or
>>> > > > > >> > > > > > `execution.checkpointing.rocksdb-incremental.enabled`.
>>> > > > > >> > > > > >
>>> > > > > >> > > > > > So far, state.backend.incremental only works for rocksdb state
>>> > > > > >> > > > > > backend. And this feature or optimization is very valuable and
>>> > > > > >> > > > > > huge for large state flink jobs. I believe it's enabled for
>>> > > > > >> > > > > > most production flink jobs with large rocksdb state.
>>> > > > > >> > > > > >
>>> > > > > >> > > > > > If this option isn't generic for all state backend types, I
>>> > > > > >> > > > > > guess we can enable
>>> > > > > >> > > > > > `execution.checkpointing.rocksdb-incremental.enabled`
>>> > > > > >> > > > > > by default in Flink 2.0.
>>> > > > > >> > > > > >
>>> > > > > >> > > > > > But if it works for all state backends, it's hard to enable it
>>> > > > > >> > > > > > by default.
>>> > > > > >> > > > > > Enabling great and valuable features or improvements is useful
>>> > > > > >> > > > > > for users, especially a lot of new flink users. Out-of-the-box
>>> > > > > >> > > > > > options are good for users.
>>> > > > > >> > > > > >
>>> > > > > >> > > > > > WDYT?
>>> > > > > >> > > > > >
>>> > > > > >> > > > > > Best,
>>> > > > > >> > > > > > Rui
>>> > > > > >> > > > > >
>>> > > > > >> > > > > > On Fri, Dec 29, 2023 at 1:45 PM Zakelly Lan
>>> > > > > >> > > > > > <zakelly....@gmail.com> wrote:
>>> > > > > >> > > > > >
>>> > > > > >> > > > > > > Hi everyone,
>>> > > > > >> > > > > > >
>>> > > > > >> > > > > > > Thanks all for your comments!
>>> > > > > >> > > > > > >
>>> > > > > >> > > > > > > As many of you have questions about the names for boolean
>>> > > > > >> > > > > > > options, I suggest we make a naming rule for them. For now I
>>> > > > > >> > > > > > > could think of three options:
>>> > > > > >> > > > > > >
>>> > > > > >> > > > > > > Option 1: Use enumeration options if possible. But this may
>>> > > > > >> > > > > > > cause some name collisions or confusion as we discussed and
>>> > > > > >> > > > > > > we should unify the statement everywhere.
>>> > > > > >> > > > > > > Option 2: Use boolean options and add 'enabled' as the
>>> > > > > >> > > > > > > suffix.
>>> > > > > >> > > > > > > Option 3: Use boolean options and ONLY add 'enabled' when
>>> > > > > >> > > > > > > there are more detailed configurations under the same prefix,
>>> > > > > >> > > > > > > to prevent one name from serving as a prefix to another.
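
The Option 3 rule in the list above can be sketched as a small check. This is
a minimal illustration in Python, not part of the FLIP or of Flink: the helper
names `needs_enabled_suffix` and `apply_option_3` are hypothetical, and the
key `execution.checkpointing.local-copy.dir` is an assumed sibling option
added only so that one boolean key acts as a prefix of another.

```python
def needs_enabled_suffix(key: str, all_keys: list[str]) -> bool:
    """True if some other option lives under `key` as a dotted prefix."""
    prefix = key + "."
    return any(other.startswith(prefix) for other in all_keys if other != key)


def apply_option_3(boolean_keys: list[str], all_keys: list[str]) -> dict[str, str]:
    """Append '.enabled' only to boolean keys that prefix another key."""
    renamed = {}
    for key in boolean_keys:
        if needs_enabled_suffix(key, all_keys):
            renamed[key] = key + ".enabled"
        else:
            renamed[key] = key
    return renamed


all_keys = [
    "execution.checkpointing.local-copy",
    "execution.checkpointing.local-copy.dir",  # assumed sibling, illustration only
    "execution.checkpointing.incremental",
]
boolean_keys = [
    "execution.checkpointing.local-copy",
    "execution.checkpointing.incremental",
]
result = apply_option_3(boolean_keys, all_keys)
```

Under these assumptions, `local-copy` gains the `.enabled` suffix because
another key sits under it, while `incremental` keeps its shorter name. Option 2
would instead add the suffix to both.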
>>> > > > > >> > > > > > >
>>> > > > > >> > > > > > > I am slightly inclined to Option 3, since it is more in line
>>> > > > > >> > > > > > > with current practice and friendly for existing users. Also,
>>> > > > > >> > > > > > > it reduces the length of configuration names as much as
>>> > > > > >> > > > > > > possible. I really want to hear your opinions.
>>> > > > > >> > > > > > >
>>> > > > > >> > > > > > >
>>> > > > > >> > > > > > > @Xuannan
>>> > > > > >> > > > > > >
>>> > > > > >> > > > > > > I agree with your comments 1 and 3.
>>> > > > > >> > > > > > >
>>> > > > > >> > > > > > > For 2, if we decide to change the name, maybe
>>> > > > > >> > > > > > > `execution.checkpointing.parallel-cleaner` is better? And as
>>> > > > > >> > > > > > > for whether to add 'enabled' I suggest we discuss the rule
>>> > > > > >> > > > > > > above. WDYT?
>>> > > > > >> > > > > > > Thanks!
>>> > > > > >> > > > > > >
>>> > > > > >> > > > > > >
>>> > > > > >> > > > > > > Best,
>>> > > > > >> > > > > > > Zakelly
>>> > > > > >> > > > > > >
>>> > > > > >> > > > > > > On Fri, Dec 29, 2023 at 12:02 PM Xuannan Su
>>> > > > > >> > > > > > > <suxuanna...@gmail.com> wrote:
>>> > > > > >> > > > > > >
>>> > > > > >> > > > > > > > Hi Zakelly,
>>> > > > > >> > > > > > > >
>>> > > > > >> > > > > > > > Thanks for driving this! The organization of the
>>> > > > > >> > > > > > > > configuration option in the FLIP looks much cleaner and
>>> > > > > >> > > > > > > > easier to understand. +1 to the FLIP.
>>> > > > > >> > > > > > > >
>>> > > > > >> > > > > > > > Just some questions from me.
>>> > > > > >> > > > > > > >
>>> > > > > >> > > > > > > > 1. I think the change to the ConfigOptions should be put in
>>> > > > > >> > > > > > > > the `Public Interface` section, instead of `Proposed
>>> > > > > >> > > > > > > > Changes`, as those configuration options are public
>>> > > > > >> > > > > > > > interface.
>>> > > > > >> > > > > > > >
>>> > > > > >> > > > > > > > 2. The key `state.checkpoint.cleaner.parallel-mode` seems
>>> > > > > >> > > > > > > > confusing. It feels like it is used to choose different
>>> > > > > >> > > > > > > > modes. In fact, it is a boolean flag to indicate whether to
>>> > > > > >> > > > > > > > enable parallel clean. How about making it
>>> > > > > >> > > > > > > > `state.checkpoint.cleaner.parallel-mode.enabled`?
>>> > > > > >> > > > > > > >
>>> > > > > >> > > > > > > > 3. The `execution.checkpointing.write-buffer` may better be
>>> > > > > >> > > > > > > > `execution.checkpointing.write-buffer-size` so that we know
>>> > > > > >> > > > > > > > it is configuring the size of the buffer.
>>> > > > > >> > > > > > > >
>>> > > > > >> > > > > > > > Best,
>>> > > > > >> > > > > > > > Xuannan
>>> > > > > >> > > > > > > >
>>> > > > > >> > > > > > > >
>>> > > > > >> > > > > > > > On Wed, Dec 27, 2023 at 7:17 PM Yanfei Lei
>>> > > > > >> > > > > > > > <fredia...@gmail.com> wrote:
>>> > > > > >> > > > > > > > >
>>> > > > > >> > > > > > > > > Hi Zakelly,
>>> > > > > >> > > > > > > > >
>>> > > > > >> > > > > > > > > > Considering the name occupation, how about naming it as
>>> > > > > >> > > > > > > > > > `execution.checkpointing.type`?
>>> > > > > >> > > > > > > > >
>>> > > > > >> > > > > > > > > `Checkpoint Type`[1,2] is used to describe
>>> > > > > >> > > > > > > > > aligned/unaligned checkpoint, I am inclined to make a
>>> > > > > >> > > > > > > > > choice between
>>> > > > > >> > > > > > > > > `execution.checkpointing.incremental` and
>>> > > > > >> > > > > > > > > `execution.checkpointing.incremental.enabled`.
>>> > > > > >> > > > > > > > >
>>> > > > > >> > > > > > > > >
>>> > > > > >> > > > > > > > > [1]
>>> > > > > >> > > > > > > > > https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/monitoring/checkpoint_monitoring/
>>> > > > > >> > > > > > > > > [2]
>>> > > > > >> > > > > > > > > https://github.com/apache/flink/blob/master/flink-runtime-web/web-dashboard/src/app/pages/job/checkpoints/detail/job-checkpoints-detail.component.html#L27
>>> > > > > >> > > > > > > > >
>>> > > > > >> > > > > > > > > --
>>> > > > > >> > > > > > > > > Best,
>>> > > > > >> > > > > > > > > Yanfei
>>> > > > > >> > > > > > > > >
>>> > > > > >> > > > > > > > > On Wed, Dec 27, 2023 at 2:41 PM Zakelly Lan
>>> > > > > >> > > > > > > > > <zakelly....@gmail.com> wrote:
>>> > > > > >> > > > > > > > > >
>>> > > > > >> > > > > > > > > > Hi Lijie,
>>> > > > > >> > > > > > > > > >
>>> > > > > >> > > > > > > > > > Thanks for the reminder! I missed this.
>>> > > > > >> > > > > > > > > >
>>> > > > > >> > > > > > > > > > Considering the name occupation, how about naming it as
>>> > > > > >> > > > > > > > > > `execution.checkpointing.type`?
>>> > > > > >> > > > > > > > > >
>>> > > > > >> > > > > > > > > > Actually I think the current
>>> > > > > >> > > > > > > > > > `execution.checkpointing.mode` is confusing in some
>>> > > > > >> > > > > > > > > > ways, maybe `execution.checkpointing.data-consistency`
>>> > > > > >> > > > > > > > > > is better.
>>> > > > > >> > > > > > > > > >
>>> > > > > >> > > > > > > > > >
>>> > > > > >> > > > > > > > > > Best,
>>> > > > > >> > > > > > > > > > Zakelly
>>> > > > > >> > > > > > > > > >
>>> > > > > >> > > > > > > > > >
>>> > > > > >> > > > > > > > > > On Wed, Dec 27, 2023 at 12:59 PM Lijie Wang
>>> > > > > >> > > > > > > > > > <wangdachui9...@gmail.com> wrote:
>>> > > > > >> > > > > > > > > >
>>> > > > > >> > > > > > > > > > > Hi Zakelly,
>>> > > > > >> > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > >> I'm wondering if
>>> > > > > >> > > > > > > > > > > >> `execution.checkpointing.savepoint-dir` would be
>>> > > > > >> > > > > > > > > > > >> better.
>>> > > > > >> > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > `execution.checkpointing.dir` and
>>> > > > > >> > > > > > > > > > > `execution.checkpointing.savepoint-dir`
>>> > > > > >> > > > > > > > > > > are also fine for me.
>>> > > > > >> > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > >> So I think an enumeration option
>>> > > > > >> > > > > > > > > > > >> `execution.checkpointing.mode` which can be 'full'
>>> > > > > >> > > > > > > > > > > >> (default) or 'incremental' would be better
>>> > > > > >> > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > I agree with using an enumeration option.
>>> > > > > >> > > > > > > > > > > But currently there is already a configuration option
>>> > > > > >> > > > > > > > > > > called `execution.checkpointing.mode`, which is used
>>> > > > > >> > > > > > > > > > > to choose EXACTLY_ONCE or AT_LEAST_ONCE. Maybe we
>>> > > > > >> > > > > > > > > > > need to use another name or merge these two options.
>>> > > > > >> > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > Best,
>>> > > > > >> > > > > > > > > > > Lijie
>>> > > > > >> > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > On Wed, Dec 27, 2023 at 11:43 AM Zakelly Lan
>>> > > > > >> > > > > > > > > > > <zakelly....@gmail.com> wrote:
>>> > > > > >> > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > > Hi everyone,
>>> > > > > >> > > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > > Thanks all for your comments!
>>> > > > > >> > > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > > @Yanfei
>>> > > > > >> > > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > > > 1. For some state backends that do not support
>>> > > > > >> > > > > > > > > > > > > incremental checkpoint, how does the
>>> > > > > >> > > > > > > > > > > > > execution.checkpointing.incremental option take
>>> > > > > >> > > > > > > > > > > > > effect? Or is it better to put incremental under
>>> > > > > >> > > > > > > > > > > > > state.backend.xxx.incremental?
>>> > > > > >> > > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > > I'd rather not put the option for incremental
>>> > > > > >> > > > > > > > > > > > checkpoint under the 'state.backend', since it is
>>> > > > > >> > > > > > > > > > > > more about the checkpointing instead of state
>>> > > > > >> > > > > > > > > > > > accessing. Of course, the state backend may not
>>> > > > > >> > > > > > > > > > > > necessarily do incremental checkpoint as requested.
>>> > > > > >> > > > > > > > > > > > If the state backend is not capable of taking
>>> > > > > >> > > > > > > > > > > > incremental cp, it is better to fallback to the
>>> > > > > >> > > > > > > > > > > > full cp.
>>> > > > > >> > > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > > > 2. I'm a little worried that putting all
>>> > > > > >> > > > > > > > > > > > > configurations into
>>> > > > > >> > > > > > > > > > > > > `ExecutionCheckpointingOptions` will introduce
>>> > > > > >> > > > > > > > > > > > > some dependency problems. Some options would be
>>> > > > > >> > > > > > > > > > > > > used by flink-runtime module, but flink-runtime
>>> > > > > >> > > > > > > > > > > > > should not depend on flink-streaming-java. e.g.
>>> > > > > >> > > > > > > > > > > > > FLINK-28286[1].
>>> > > > > >> > > > > > > > > > > > > So, I prefer to move configurations to
>>> > > > > >> > > > > > > > > > > > > `CheckpointingOptions`, WDYT?
>>> > > > > >> > > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > > Yes, that's a very good point.
>>> > > > > >> > > > > > > > > > > > Moving to `CheckpointingOptions` (flink-core) makes
>>> > > > > >> > > > > > > > > > > > sense.
>>> > > > > >> > > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > > @Lijie
>>> > > > > >> > > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > > > How about
>>> > > > > >> > > > > > > > > > > > > state.savepoints.dir ->
>>> > > > > >> > > > > > > > > > > > > execution.checkpointing.savepoint.dir
>>> > > > > >> > > > > > > > > > > > > state.checkpoints.dir ->
>>> > > > > >> > > > > > > > > > > > > execution.checkpointing.checkpoint.dir
>>> > > > > >> > > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > > Actually, I think the `checkpointing.checkpoint`
>>> > > > > >> > > > > > > > > > > > may cause some confusion. But I'm ok if others
>>> > > > > >> > > > > > > > > > > > agree.
>>> > > > > >> > > > > > > > > > > > I'm wondering if
>>> > > > > >> > > > > > > > > > > > `execution.checkpointing.savepoint-dir` would be
>>> > > > > >> > > > > > > > > > > > better. WDYT?
>>> > > > > >> > > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > > > 2. We changed the
>>> > > > > >> > > > > > > > > > > > > 'execution.checkpointing.local-copy' to
>>> > > > > >> > > > > > > > > > > > > 'execution.checkpointing.local-copy.enabled'.
>>> > > > > >> > > > > > > > > > > > > Should we also add "enabled" suffix for other
>>> > > > > >> > > > > > > > > > > > > boolean type configuration options ?
>>> > > > > >> > > > > > > > > > > > > For example,
>>> > > > > >> > > > > > > > > > > > > execution.checkpointing.incremental ->
>>> > > > > >> > > > > > > > > > > > > execution.checkpointing.incremental.enabled
>>> > > > > >> > > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > > Actually, the incremental cp is something like
>>> > > > > >> > > > > > > > > > > > choosing a mode for doing checkpoint instead of
>>> > > > > >> > > > > > > > > > > > enabling a function. So I think an enumeration
>>> > > > > >> > > > > > > > > > > > option `execution.checkpointing.mode` which can be
>>> > > > > >> > > > > > > > > > > > 'full' (default) or 'incremental' would be better,
>>> > > > > >> > > > > > > > > > > > WDYT?
>>> > > > > >> > > > > > > > > > > > And @Rui Fan @Yanfei What do you think about this?
>>> > > > > >> > > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > > On Tue, Dec 26, 2023 at 5:15 PM Lijie Wang
>>> > > > > >> > > > > > > > > > > > <wangdachui9...@gmail.com> wrote:
>>> > > > > >> > > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > > > Hi Zakelly,
>>> > > > > >> > > > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > > > Thanks for driving the discussion.
>>> > > > > >> > > > > > > > > > > > >
>>> > > > > >> > > > > > > > > > > > > 1.
>>> > > > > >> > > > > > > > > > > > > >> But I'm not so sure since there is only one
>>> > > > > >> > > > > > > > > > > > > >> savepoint-related option.
>>> > > > > >> > > > > > > > > > > > > >> Maybe someone else could share some thoughts
>>> > > > > >> > > > > > > > > > > > > >> here.
>>> > > > > >> > > > > > > > > > > > > >>> > > > > >> > > > > > > > > > > > > How about >>> > > > > >> > > > > > > > > > > > > state.savepoints.dir -> >>> > > > > >> > > > > execution.checkpointing.savepoint.dir >>> > > > > >> > > > > > > > > > > > > state.checkpoints.dir -> >>> > > > > >> > > > > > execution.checkpointing.checkpoint.dir >>> > > > > >> > > > > > > > > > > > > >>> > > > > >> > > > > > > > > > > > > 2. We changed the >>> > > > > >> execution.checkpointing.local-copy' >>> > > > > >> > > to >>> > > > > >> > > > > > > > > > > > > >>> > > 'execution.checkpointing.local-copy.enabled'. >>> > > > > >> Should >>> > > > > >> > we >>> > > > > >> > > > > also >>> > > > > >> > > > > > > add >>> > > > > >> > > > > > > > > > > > "enabled" >>> > > > > >> > > > > > > > > > > > > suffix for other boolean type >>> > configuration >>> > > > > >> options ? >>> > > > > >> > > For >>> > > > > >> > > > > > > > example, >>> > > > > >> > > > > > > > > > > > > execution.checkpointing.incremental -> >>> > > > > >> > > > > > > > > > > > > >>> > execution.checkpointing.incremental.enabled >>> > > > > >> > > > > > > > > > > > > >>> > > > > >> > > > > > > > > > > > > In this way, the naming style of >>> > > configuration >>> > > > > >> > options >>> > > > > >> > > is >>> > > > > >> > > > > > > > unified, and >>> > > > > >> > > > > > > > > > > it >>> > > > > >> > > > > > > > > > > > > can avoid potential similar problems >>> (for >>> > > > > >> example, we >>> > > > > >> > > may >>> > > > > >> > > > > > need >>> > > > > >> > > > > > > > to add >>> > > > > >> > > > > > > > > > > > more >>> > > > > >> > > > > > > > > > > > > options for incremental checkpoint in >>> the >>> > > > > future). 

Best,
Lijie

On Tue, Dec 26, 2023 at 12:05, Yanfei Lei <fredia...@gmail.com> wrote:

Hi Zakelly,

Thank you for creating the FLIP and starting the discussion.

The current arrangement of these options is indeed somewhat haphazard,
and the new arrangement looks much better. I have some questions about
the arrangement of some new configuration options:

1. For some state backends that do not support incremental checkpoint,
how does the execution.checkpointing.incremental option take effect? Or
is it better to put incremental under state.backend.xxx.incremental?

2. I'm a little worried that putting all configurations into
`ExecutionCheckpointingOptions` will introduce some dependency
problems. Some options would be used by the flink-runtime module, but
flink-runtime should not depend on flink-streaming-java, e.g.
FLINK-28286[1].
So, I prefer to move configurations to `CheckpointingOptions`, WDYT?

[1] https://issues.apache.org/jira/browse/FLINK-28286

--
Best,
Yanfei

On Mon, Dec 25, 2023 at 21:14, Zakelly Lan <zakelly....@gmail.com> wrote:

Hi Rui Fan and Junrui,

Thanks for the reminder! I agree to change the
'execution.checkpointing.local-copy' to
'execution.checkpointing.local-copy.enabled'.

And for other suggestions Rui proposed:

> 1. How about execution.checkpointing.storage.type instead
> of execution.checkpointing.storage?

Ah, I missed something here. Actually I suggest we could merge the current
'state.checkpoints.dir' and 'state.checkpoint-storage' into one URI
configuration named 'execution.checkpointing.dir'. WDYT?

> 3. execution.checkpointing.savepoint.dir is a little weird.

Yes, I think it is better to make 'savepoint' and 'checkpoint' the same
level. But I'm not so sure since there is only one savepoint-related
option. Maybe someone else could share some thoughts here.

> 4. How about execution.recovery.claim-mode instead of
> execution.recovery.mode?

Agreed. That's more accurate.

Many thanks for your suggestions!

Best,
Zakelly

On Mon, Dec 25, 2023 at 8:18 PM Junrui Lee <jrlee....@gmail.com> wrote:

Hi Zakelly,

Thanks for driving this. I agree that the proposed restructuring of the
configuration options is largely positive. It will make understanding and
working with Flink configurations more intuitive.

Most of the proposed changes look great.
Just a heads-up, as Rui Fan mentioned, Flink currently requires that no
configOption's key be the prefix of another to avoid issues when we
eventually adopt a standard YAML parser, as detailed in FLINK-29372
(https://issues.apache.org/jira/browse/FLINK-29372). Therefore, it's better
to change the key 'execution.checkpointing.local-copy' because it serves as
a prefix to the key 'execution.checkpointing.local-copy.dir'.

Best regards,
Junrui

On Mon, Dec 25, 2023 at 19:11, Rui Fan <1996fan...@gmail.com> wrote:

Hi Zakelly,

Thank you for driving this proposal!

Overall good for me. I have some questions about these names.

1. How about execution.checkpointing.storage.type instead of
execution.checkpointing.storage?

It's similar to state.backend.type.

2. How about execution.checkpointing.local-copy.enabled instead of
execution.checkpointing.local-copy?

You added a new option: execution.checkpointing.local-copy.dir.
IIUC, one option name shouldn't be the prefix of other options.
If you add a new option execution.checkpointing.local-copy,
flink CI will fail directly.

3. execution.checkpointing.savepoint.dir is a little weird.

For old options: state.savepoints.dir and state.checkpoints.dir,
the savepoint and checkpoint are the same level. It means
it's a checkpoint or savepoint.

The new option execution.checkpointing.dir is fine for me.
However, execution.checkpointing.savepoint.dir is a little weird.
I don't know which name is better now. Let us think about it more.

4. How about execution.recovery.claim-mode instead of
execution.recovery.mode?

The meaning of mode is too broad. The claim-mode may
be more accurate for users.

WDYT?
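
To make point 2 concrete, here is a minimal Python sketch of the rule that no option key may be a dotted prefix of another key. In nested YAML, `execution.checkpointing.local-copy` cannot hold a scalar value and also be the parent of `dir`. This is not Flink's actual CI check; the function name `find_prefix_conflicts` and the key lists are assumptions for illustration.

```python
def find_prefix_conflicts(keys):
    """Return sorted (prefix_key, longer_key) pairs where one key is a dotted prefix of another."""
    conflicts = []
    for prefix in keys:
        for other in keys:
            # "a.b" conflicts with "a.b.c", but not with "a.bc".
            if prefix != other and other.startswith(prefix + "."):
                conflicts.append((prefix, other))
    return sorted(conflicts)


# Keys as first proposed: local-copy is both a value and a parent of .dir.
proposed = [
    "execution.checkpointing.local-copy",
    "execution.checkpointing.local-copy.dir",
    "execution.checkpointing.dir",
]

# Keys after adding the ".enabled" suffix suggested in this thread.
renamed = [
    "execution.checkpointing.local-copy.enabled",
    "execution.checkpointing.local-copy.dir",
    "execution.checkpointing.dir",
]

print(find_prefix_conflicts(proposed))
print(find_prefix_conflicts(renamed))
```

The first list reports one conflicting pair and the second reports none, which is the effect the `.enabled` suffix is meant to have.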

Best,
Rui

On Mon, Dec 25, 2023 at 5:14 PM Zakelly Lan <zakelly....@gmail.com> wrote:

Hi devs,

I'd like to start a discussion on FLIP-406: Reorganize State &
Checkpointing & Recovery Configuration[1].

Currently, the configuration options pertaining to checkpointing, recovery,
and state management are primarily grouped under the following prefixes:

   - state.backend.* : configurations related to state accessing and
   checkpointing, as well as specific options for individual state backends
   - execution.checkpointing.* : configurations associated with checkpoint
   execution and recovery
   - execution.savepoint.*: configurations for recovery from savepoint

In addition, there are several individual options such as
'*state.checkpoint-storage*' and '*state.checkpoints.dir*' that fall outside
of these prefixes. The current arrangement of these options, which span
multiple modules, is somewhat haphazard and lacks a systematic structure.
For example, the options under the '*CheckpointingOptions*' and
'*ExecutionCheckpointingOptions*' are related and have no clear boundaries
from the user's perspective, but there is no unified prefix for them. With
the upcoming release of Flink 2.0, we have an excellent opportunity to
overhaul and restructure the configurations related to checkpointing,
recovery, and state management. This FLIP proposes to reorganize these
settings, making it more coherent by module, which would significantly
lower the barriers for understanding and reduce the development costs
moving forward.

Looking forward to hearing from you!

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789560

Best,
Zakelly

--
Best,
Hangxiang.