[ 
https://issues.apache.org/jira/browse/FLINK-26079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490751#comment-17490751
 ] 

Roman Khachatryan commented on FLINK-26079:
-------------------------------------------

[~dwysakowicz]
{quote}Do I understand it correctly, that the use case that breaks is basically 
changing the state backend from a non-changelog to a changelog state backend?
{quote}
Yes. Recovering from a non-changelog checkpoint (not savepoint) is desirable. 
The motivation is to reduce downtime.

[~pnowojski]
{quote}DataSourceTask is a legacy DataSet API class. We can safely limit 
ourselves just to StreamTask.
StreamTask#createStateBackend or 
StateBackendLoader#fromApplicationOrConfigOrDefault could be that one place. 
{quote}
You're right regarding DataSourceTask; I mistook it for a FLIP-27 task.
However, the state backend is also created by StreamOperatorContextBuilder
(called by operators). Shouldn't the check be there as well?
{quote}I don't like that we would have to pass the restore mode to implement 
such temporary check, but I don't know what's the alternative?
{quote}
No, me neither. I'm not sure we should implement the validation.
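
For reference, the validation under discussion could look roughly like the
sketch below. It is self-contained and purely illustrative: the class, the
enums and the validate() method are hypothetical stand-ins, not Flink's actual
types, and it assumes the caller already knows the restore mode and what kind
of snapshot is being restored (which is exactly the information that would
have to be passed down).

```java
import java.util.Optional;

/** Illustrative sketch only; none of these types are Flink classes. */
final class ChangelogRestoreCheck {

    /** Hypothetical stand-in for the restore mode; only CLAIM matters here. */
    enum RestoreMode { CLAIM, NO_CLAIM, LEGACY }

    /** Hypothetical description of the snapshot the job is restored from. */
    enum SnapshotKind { CHANGELOG_CHECKPOINT, NON_CHANGELOG_CHECKPOINT, SAVEPOINT }

    /**
     * Returns an error message for the combination named in the issue title
     * (changelog backend + CLAIM + non-changelog checkpoint), or empty otherwise.
     */
    static Optional<String> validate(
            boolean changelogEnabled, RestoreMode mode, SnapshotKind restoredFrom) {
        boolean unsupported =
                changelogEnabled
                        && mode == RestoreMode.CLAIM
                        && restoredFrom == SnapshotKind.NON_CHANGELOG_CHECKPOINT;
        return unsupported
                ? Optional.of(
                        "Changelog backend with CLAIM restore mode cannot recover "
                                + "from a non-changelog checkpoint")
                : Optional.empty();
    }

    public static void main(String[] args) {
        // The rejected combination yields a message; the others pass.
        System.out.println(
                validate(true, RestoreMode.CLAIM, SnapshotKind.NON_CHANGELOG_CHECKPOINT));
        System.out.println(
                validate(true, RestoreMode.NO_CLAIM, SnapshotKind.NON_CHANGELOG_CHECKPOINT));
    }
}
```

Wherever such a check lived, every place that creates a state backend would
have to call it, which is why the StreamOperatorContextBuilder path matters.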

I see the following alternatives:
1. Fix the original issue
2. Only document the limitation without enforcing it
3. Disallow recovery from non-changelog checkpoints (only allow savepoints as 
Dawid mentioned)
 
As for fixing the original issue (cc: [~yunta]):
1. Register all state with the SharedStateRegistry. This would require changing 
registerSharedStates() of at least KeyGroupsStateHandle and 
IncrementalRemoteKeyedStateHandle
2. Limit the above to only the initial checkpoint and only during recovery 
(CompletedCheckpoint.registerSharedStatesAfterRestored)
3. Wrap the materialized state with Changelog handles on the JM, during 
recovery (not an option IMO because the JM shouldn't be aware of that)
 

> Disallow combination of Changelog backend with CLAIM restore mode when 
> recovering from non-changelog checkpoint
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-26079
>                 URL: https://issues.apache.org/jira/browse/FLINK-26079
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Configuration, Runtime / State Backends
>            Reporter: Roman Khachatryan
>            Assignee: Roman Khachatryan
>            Priority: Blocker
>             Fix For: 1.15.0
>
>
> Extracted from FLINK-25872.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
