[ https://issues.apache.org/jira/browse/FLINK-26079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490751#comment-17490751 ]
Roman Khachatryan commented on FLINK-26079:
-------------------------------------------

[~dwysakowicz]
{quote}Do I understand it correctly, that the use case that breaks is basically changing the state backend from a non-changelog to a changelog state backend?
{quote}
Yes. Recovering from a non-changelog checkpoint (not savepoint) is desirable. The motivation is to reduce downtime.

[~pnowojski]
{quote}DataSourceTask is a legacy DataSet API class. We can safely limit ourselves just to StreamTask. StreamTask#createStateBackend or StateBackendLoader#fromApplicationOrConfigOrDefault could be that one place.
{quote}
You're right regarding the DataSourceTask, I mistook it for FLIP-27 task.
However, state backend is also created by StreamOperatorContextBuilder (called by operators). Shouldn't the check be there as well?
{quote}I don't like that we would have to pass the restore mode to implement such temporary check, but I don't know what's the alternative?
{quote}
No, me neither. I'm not sure we should implement the validation. I see the following alternatives:
1. Fix the original issue
2. Only document the limitation without enforcing it
3. Disallow recovery from non-changelog checkpoints (only allow savepoints as Dawid mentioned)

As for fixing the original issue (cc: [~yunta]):
1. Register all state with the SharedStateRegistry. This would require changing registerSharedStates() of at least KeyGroupsStateHandle and IncrementalRemoteKeyedStateHandle
2. Limit the above to only initial checkpoint and only recovery (CompletedCheckpoint.registerSharedStatesAfterRestored)
3. Wrap the materialized state with Changelog handles on JM, during recovery (not an option IMO because JM shouldn't be aware of that)

> Disallow combination of Changelog backend with CLAIM restore mode when recovering from non-changelog checkpoint
> ---------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-26079
> URL: https://issues.apache.org/jira/browse/FLINK-26079
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Configuration, Runtime / State Backends
> Reporter: Roman Khachatryan
> Assignee: Roman Khachatryan
> Priority: Blocker
> Fix For: 1.15.0
>
>
> Extracted from FLINK-25872.
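
For illustration only, the validation named in the issue title could look roughly like the sketch below. It is not Flink code: the class ChangelogRestoreCheck, the SnapshotKind enum, and the validate method are hypothetical names, and RestoreMode is redefined locally rather than imported from Flink. It only shows the rule under discussion (reject Changelog backend + CLAIM mode + non-changelog checkpoint), not where in StreamTask or StateBackendLoader such a check would live.

```java
import java.util.Optional;

// Hypothetical sketch; none of these types are taken from the Flink code base.
public class ChangelogRestoreCheck {

    /** Restore modes, redefined locally for illustration. */
    enum RestoreMode { CLAIM, NO_CLAIM, LEGACY }

    /** What the job is restored from (hypothetical classification). */
    enum SnapshotKind { SAVEPOINT, CHANGELOG_CHECKPOINT, NON_CHANGELOG_CHECKPOINT }

    /**
     * Returns an error message if the combination is the one the issue
     * proposes to disallow, or an empty Optional if it is accepted.
     */
    static Optional<String> validate(
            boolean changelogEnabled, RestoreMode mode, SnapshotKind kind) {
        boolean disallowed =
                changelogEnabled
                        && mode == RestoreMode.CLAIM
                        && kind == SnapshotKind.NON_CHANGELOG_CHECKPOINT;
        if (disallowed) {
            return Optional.of(
                    "Changelog backend with CLAIM restore mode cannot recover "
                            + "from a non-changelog checkpoint");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The combination from the issue title is rejected.
        System.out.println(
                validate(true, RestoreMode.CLAIM, SnapshotKind.NON_CHANGELOG_CHECKPOINT)
                        .isPresent());
        // A savepoint in CLAIM mode passes this particular check.
        System.out.println(
                validate(true, RestoreMode.CLAIM, SnapshotKind.SAVEPOINT).isPresent());
    }
}
```

Note that this check needs the restore mode as an input, which is exactly the coupling questioned in the quoted comment above.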