Hi Zakelly, >> I'm wondering if `execution.checkpointing.savepoint-dir` would be better.
`execution.checkpointing.dir` and `execution.checkpointing.savepoint-dir` are also fine for me. >> So I think an enumeration option `execution.checkpointing.mode` which can be 'full' (default) or 'incremental' would be better I agree with using an enumeration option. But currently there is already a configuration option called `execution.checkpointing.mode`, which is used to choose EXACTLY_ONCE or AT_LEAST_ONCE. Maybe we need to use another name or merge these two options. Best, Lijie Zakelly Lan <zakelly....@gmail.com> 于2023年12月27日周三 11:43写道: > Hi everyone, > > Thanks all for your comments! > > @Yanfei > > > 1. For some state backends that do not support incremental checkpoint, > > how does the execution.checkpointing.incrementaloption take effect? Or > > is it better to put incremental under state.backend.xxx.incremental? > > > I'd rather not put the option for incremental checkpoint under the > 'state.backend', since it is more about the checkpointing instead of state > accessing. Of course, the state backend may not necessarily do incremental > checkpoint as requested. If the state backend is not capable of taking > incremental cp, it is better to fallback to the full cp. > > 2. I'm a little worried that putting all configurations into > > `ExecutionCheckpointingOptions` will introduce some dependency > > problems. Some options would be used by flink-runtime module, but > > flink-runtime should not depend on flink-streaming-java. e.g. > > FLINK-28286[1]. > > So, I prefer to move configurations to `CheckpointingOptions`, WDYT? > > > > Yes, that's a very good point. Moving to > `CheckpointingOptions`(flink-core) makes sense. > > @Lijie > > How about > > state.savepoints.dir -> execution.checkpointing.savepoint.dir > > state.checkpoints.dir -> execution.checkpointing.checkpoint.dir > > > Actually, I think the `checkpointing.checkpoint` may cause some confusion. > But I'm ok if others agree. > I'm wondering if `execution.checkpointing.savepoint-dir` would be better. > WDYT? > > 2. We changed the execution.checkpointing.local-copy' to > > 'execution.checkpointing.local-copy.enabled'. Should we also add > "enabled" > > suffix for other boolean type configuration options ? For example, > > execution.checkpointing.incremental -> > > execution.checkpointing.incremental.enabled > > > > Actually, the incremental cp is something like choosing a mode for doing > checkpoint instead of enabling a function. So I think an enumeration option > `execution.checkpointing.mode` which can be 'full' (default) or > 'incremental' would be better, WDYT? > And @Rui Fan @Yanfei What do you think about this? > > > On Tue, Dec 26, 2023 at 5:15 PM Lijie Wang <wangdachui9...@gmail.com> > wrote: > > > Hi Zakelly, > > > > Thanks for driving the discussion. > > > > 1. > > >> But I'm not so sure since there is only one savepoint-related option. > > Maybe someone else could share some thoughts here. > > > > How about > > state.savepoints.dir -> execution.checkpointing.savepoint.dir > > state.checkpoints.dir -> execution.checkpointing.checkpoint.dir > > > > 2. We changed the execution.checkpointing.local-copy' to > > 'execution.checkpointing.local-copy.enabled'. Should we also add > "enabled" > > suffix for other boolean type configuration options ? For example, > > execution.checkpointing.incremental -> > > execution.checkpointing.incremental.enabled > > > > In this way, the naming style of configuration options is unified, and it > > can avoid potential similar problems (for example, we may need to add > more > > options for incremental checkpoint in the future). > > > > Best, > > Lijie > > > > Yanfei Lei <fredia...@gmail.com> 于2023年12月26日周二 12:05写道: > > > > > Hi Zakelly, > > > > > > Thank you for creating the FLIP and starting the discussion. > > > > > > The current arrangement of these options is indeed somewhat haphazard, > > > and the new arrangement looks much better. I have some questions about > > > the arrangement of some new configuration options: > > > > > > 1. For some state backends that do not support incremental checkpoint, > > > how does the execution.checkpointing.incrementaloption take effect? Or > > > is it better to put incremental under state.backend.xxx.incremental? > > > > > > 2. I'm a little worried that putting all configurations into > > > `ExecutionCheckpointingOptions` will introduce some dependency > > > problems. Some options would be used by flink-runtime module, but > > > flink-runtime should not depend on flink-streaming-java. e.g. > > > FLINK-28286[1]. > > > So, I prefer to move configurations to `CheckpointingOptions`, WDYT? > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-28286 > > > > > > -- > > > Best, > > > Yanfei > > > > > > Zakelly Lan <zakelly....@gmail.com> 于2023年12月25日周一 21:14写道: > > > > > > > > Hi Rui Fan and Junrui, > > > > > > > > Thanks for the reminder! I agree to change the > > > > 'execution.checkpointing.local-copy' to > > > > 'execution.checkpointing.local-copy.enabled'. > > > > > > > > And for other suggestions Rui proposed: > > > > > > > > 1. How about execution.checkpointing.storage.type instead > > > > > of execution.checkpointing.storage? > > > > > > > > > > > > Ah, I missed something here. Actually I suggest we could merge the > > > current > > > > 'state.checkpoints.dir' and 'state.checkpoint-storage' into one URI > > > > configuration named 'execution.checkpointing.dir'. WDYT? > > > > > > > > 3. execution.checkpointing.savepoint.dir is a little weird. > > > > > > > > > > > > > Yes, I think it is better to make 'savepoint' and 'checkpoint' the > same > > > > level. But I'm not so sure since there is only one savepoint-related > > > > option. Maybe someone else could share some thoughts here. > > > > > > > > 4. How about execution.recovery.claim-mode instead of > > > > > execution.recovery.mode? > > > > > > > > > > > > > Agreed. That's more accurate. > > > > > > > > > > > > Many thanks for your suggestions! > > > > > > > > Best, > > > > Zakelly > > > > > > > > On Mon, Dec 25, 2023 at 8:18 PM Junrui Lee <jrlee....@gmail.com> > > wrote: > > > > > > > > > Hi Zakelly, > > > > > > > > > > Thanks for driving this. I agree that the proposed restructuring of > > the > > > > > configuration options is largely positive. It will make > understanding > > > and > > > > > working with Flink configurations more intuitive. > > > > > > > > > > Most of the proposed changes look great. Just a heads-up, as Rui > Fan > > > > > mentioned, Flink currently requires that no configOption's key be > the > > > > > prefix of another to avoid issues when we eventually adopt a > standard > > > YAML > > > > > parser, as detailed in FLINK-29372 ( > > > > > https://issues.apache.org/jira/browse/FLINK-29372). Therefore, > it's > > > better > > > > > to change the key 'execution.checkpointing.local-copy' because it > > > serves as > > > > > a prefix to the key 'execution.checkpointing.local-copy.dir'. > > > > > > > > > > Best regards, > > > > > Junrui > > > > > > > > > > Rui Fan <1996fan...@gmail.com> 于2023年12月25日周一 19:11写道: > > > > > > > > > > > Hi Zakelly, > > > > > > > > > > > > Thank you for driving this proposal! > > > > > > > > > > > > Overall good for me. I have some questions about these names. > > > > > > > > > > > > 1. How about execution.checkpointing.storage.type instead of > > > > > > execution.checkpointing.storage? > > > > > > > > > > > > It's similar to state.backend.type. > > > > > > > > > > > > 2. How about execution.checkpointing.local-copy.enabled instead > of > > > > > > execution.checkpointing.local-copy? > > > > > > > > > > > > You added a new option: execution.checkpointing.local-copy.dir. > > > > > > IIUC, one option name shouldn't be the prefix of other options. > > > > > > If you add a new option execution.checkpointing.local-copy, > > > > > > flink CI will fail directly. > > > > > > > > > > > > 3. execution.checkpointing.savepoint.dir is a little weird. > > > > > > > > > > > > For old options: state.savepoints.dir and state.checkpoints.dir, > > > > > > the savepoint and checkpoint are the same level. It means > > > > > > it's a checkpoint or savepoint. > > > > > > > > > > > > The new option execution.checkpointing.dir is fine for me. > > > > > > However, execution.checkpointing.savepoint.dir is a little weird. > > > > > > I don't know which name is better now. Let us think about it > more. > > > > > > > > > > > > 4. How about execution.recovery.claim-mode instead of > > > > > > execution.recovery.mode? > > > > > > > > > > > > The meaning of mode is too broad. The claim-mode may > > > > > > be more accurate for users. > > > > > > > > > > > > WDYT? > > > > > > > > > > > > Best, > > > > > > Rui > > > > > > > > > > > > On Mon, Dec 25, 2023 at 5:14 PM Zakelly Lan < > zakelly....@gmail.com > > > > > > > > wrote: > > > > > > > > > > > > > Hi devs, > > > > > > > > > > > > > > I'd like to start a discussion on FLIP-406: Reorganize State & > > > > > > > Checkpointing & Recovery Configuration[1]. > > > > > > > > > > > > > > Currently, the configuration options pertaining to > checkpointing, > > > > > > recovery, > > > > > > > and state management are primarily grouped under the following > > > > > prefixes: > > > > > > > > > > > > > > - state.backend.* : configurations related to state > accessing > > > and > > > > > > > checkpointing, as well as specific options for individual > > state > > > > > > backends > > > > > > > - execution.checkpointing.* : configurations associated with > > > > > > checkpoint > > > > > > > execution and recovery > > > > > > > - execution.savepoint.*: configurations for recovery from > > > savepoint > > > > > > > > > > > > > > In addition, there are several individual options such as ' > > > > > > > *state.checkpoint-storage*' and '*state.checkpoints.dir*' that > > fall > > > > > > outside > > > > > > > of these prefixes. The current arrangement of these options, > > which > > > span > > > > > > > multiple modules, is somewhat haphazard and lacks a systematic > > > > > structure. > > > > > > > For example, the options under the '*CheckpointingOptions*' > and ' > > > > > > > *ExecutionCheckpointingOptions*' are related and have no clear > > > > > boundaries > > > > > > > from the user's perspective, but there is no unified prefix for > > > them. > > > > > > With > > > > > > > the upcoming release of Flink 2.0, we have an excellent > > > opportunity to > > > > > > > overhaul and restructure the configurations related to > > > checkpointing, > > > > > > > recovery, and state management. This FLIP proposes to > reorganize > > > these > > > > > > > settings, making it more coherent by module, which would > > > significantly > > > > > > > lower the barriers for understanding and reduce the development > > > costs > > > > > > > moving forward. > > > > > > > > > > > > > > Looking forward to hearing from you! > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789560 > > > > > > > > > > > > > > Best, > > > > > > > Zakelly > > > > > > > > > > > > > > > > > > > > > > > >