[jira] [Commented] (FLINK-29913) Shared state would be discarded by mistake when maxConcurrentCheckpoint>1

Roman Khachatryan (Jira) Wed, 09 Nov 2022 11:05:29 -0800


    [ https://issues.apache.org/jira/browse/FLINK-29913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631234#comment-17631234 ]


Roman Khachatryan commented on FLINK-29913:
-------------------------------------------

Thanks a lot for noticing and reporting this issue [~Yanfei Lei] and [~klion26]!

I think there is a (conceptually) simple solution: always generate unique state 
handle IDs.
(This was already discussed offline, since it could also solve similar 
problems, such as detecting duplicates in SharedStateRegistry.)

The ID doesn't need to carry any semantic meaning. 
RocksDB happens to [store the local path in the 
ID|https://github.com/apache/flink/blob/421f057a7488fd64854a82424755f76b89561a0b/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/snapshot/RocksIncrementalSnapshotStrategy.java#L397]
 and then uses it [on 
recovery|https://github.com/apache/flink/blob/421f057a7488fd64854a82424755f76b89561a0b/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBStateDownloader.java#L105];
 but the path could just as well be a separate field in the handle.
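
To make the idea concrete, here is a minimal sketch. It is a toy model, not Flink's actual code: Handle, ToyRegistry, withPathAsId and withUniqueId are hypothetical names, and the rule "a second, different handle under an existing key is discarded" is a simplifying assumption about the registry. It only shows why a path-derived ID can collide across concurrent checkpoints, and how a generated ID with the local path in a separate field avoids that.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class UniqueHandleIdSketch {

    /** Hypothetical handle: the registry key is kept apart from the local path. */
    static final class Handle {
        final String id;         // registry key, carries no semantic meaning
        final String localPath;  // separate field, still available on recovery
        final String remotePath; // location of the uploaded file

        Handle(String id, String localPath, String remotePath) {
            this.id = id;
            this.localPath = localPath;
            this.remotePath = remotePath;
        }
    }

    /**
     * Toy registry (assumption, not the real SharedStateRegistry logic):
     * a different handle registered under an existing key is presumed to be
     * a duplicate, and its file is recorded as discarded.
     */
    static final class ToyRegistry {
        private final Map<String, Handle> entries = new HashMap<>();
        final List<String> discarded = new ArrayList<>();

        void register(Handle handle) {
            Handle previous = entries.putIfAbsent(handle.id, handle);
            if (previous != null && !previous.remotePath.equals(handle.remotePath)) {
                discarded.add(handle.remotePath);
            }
        }
    }

    /** ID derived from the local path: two checkpoints can produce the same key. */
    static Handle withPathAsId(String localPath, String remotePath) {
        return new Handle(localPath, localPath, remotePath);
    }

    /** Generated ID: every handle gets its own key, the path moves to its own field. */
    static Handle withUniqueId(String localPath, String remotePath) {
        return new Handle(UUID.randomUUID().toString(), localPath, remotePath);
    }

    public static void main(String[] args) {
        // Two concurrent checkpoints upload a file with the same local name.
        ToyRegistry colliding = new ToyRegistry();
        colliding.register(withPathAsId("000012.sst", "chk-1/a"));
        colliding.register(withPathAsId("000012.sst", "chk-2/b"));
        System.out.println("path as ID, discarded: " + colliding.discarded);

        ToyRegistry unique = new ToyRegistry();
        unique.register(withUniqueId("000012.sst", "chk-1/a"));
        unique.register(withUniqueId("000012.sst", "chk-2/b"));
        System.out.println("unique ID, discarded: " + unique.discarded);
    }
}
```

In this model the first registry records one discarded file and the second records none, while the local path stays available on each handle for recovery.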

I don't see any issues with this approach, including in the recovery, rescaling, 
and JM failover cases.

What do you think? Maybe there are some alternatives?

> Shared state would be discarded by mistake when maxConcurrentCheckpoint>1
> -------------------------------------------------------------------------
>
>                 Key: FLINK-29913
>                 URL: https://issues.apache.org/jira/browse/FLINK-29913
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.15.0, 1.16.0
>            Reporter: Yanfei Lei
>            Priority: Minor
>
> When maxConcurrentCheckpoint>1, the shared state of the incremental RocksDB state 
> backend can be discarded when a handle with the same name is registered. See 
> [https://github.com/apache/flink/pull/21050#discussion_r1011061072]
> cc [~roman] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
