[ 
https://issues.apache.org/jira/browse/FLINK-37693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hangxiang Yu resolved FLINK-37693.
----------------------------------
    Resolution: Fixed

Merged into master via d82cc3f

> ForSt fails to restore from reused checkpoint
> ---------------------------------------------
>
>                 Key: FLINK-37693
>                 URL: https://issues.apache.org/jira/browse/FLINK-37693
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>    Affects Versions: 2.0.0
>            Reporter: Hangxiang Yu
>            Assignee: Hangxiang Yu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2025-04-17-11-42-46-972.png
>
>
> When ForSt restores from a reused checkpoint, the mapping table contains two 
> kinds of entries:
>  # remote UUID file -> state handle (the state handle also carries the remote UUID path)
>  # remote SST file -> remote UUID file
> {code:java}
> 2025-04-16 19:53:55,131 INFO  
> org.apache.flink.state.forst.datatransfer.DataTransferStrategy [] - Reuse 
> file from checkpoint: File State: 
> hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/1b89404f-478c-4534-a7c4-ba09e78f175d
>  [67294473 bytes], 
> hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/000514.sst
> 2025-04-16 19:53:55,131 INFO  
> org.apache.flink.state.forst.fs.filemapping.FileMappingManager [] - Add entry 
> to mapping table: 
> hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/86aae1d6-99ba-48f1-8afc-944c57742f95
>  -> MappingEntry{source=HandleBackedSource{stateHandle=File State: 
> hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/86aae1d6-99ba-48f1-8afc-944c57742f95
>  [67294450 bytes]}, fileOwnership=NOT_OWNED, isDirectory= false}
> 2025-04-16 19:53:55,131 INFO  
> org.apache.flink.state.forst.fs.filemapping.FileMappingManager [] - link: 
> hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/000769.sst
>  -> 
> hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/86aae1d6-99ba-48f1-8afc-944c57742f95
> 2025-04-16 19:53:55,131 INFO  
> org.apache.flink.state.forst.fs.filemapping.FileMappingManager [] - decide 
> restored file ownership based on dbFilePath: 
> hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/000514.sst
> 2025-04-16 19:53:55,131 INFO  
> org.apache.flink.state.forst.datatransfer.DataTransferStrategy [] - Reuse 
> file from checkpoint: File State: 
> hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/33569f77-0ee1-414b-a860-599c932d788c
>  [26437762 bytes], 
> hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/000517.sst
> 2025-04-16 19:53:55,131 INFO  
> org.apache.flink.state.forst.fs.filemapping.FileMappingManager [] - Add entry 
> to mapping table: 
> hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/1b89404f-478c-4534-a7c4-ba09e78f175d
>  -> MappingEntry{source=HandleBackedSource{stateHandle=File State: 
> hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/1b89404f-478c-4534-a7c4-ba09e78f175d
>  [67294473 bytes]}, fileOwnership=NOT_OWNED, isDirectory= false}
> 2025-04-16 19:53:55,131 INFO  
> org.apache.flink.state.forst.fs.filemapping.FileMappingManager [] - link: 
> hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/000514.sst
>  -> 
> hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/1b89404f-478c-4534-a7c4-ba09e78f175d{code}
> On restore the file exists, but the restore fails because ForSt always tries 
> to look up the SST file in the local file system:
> {code:java}
> 2025-04-16 19:53:55,216 ERROR 
> org.apache.flink.state.forst.sync.ForStSyncKeyedStateBackendBuilder [] - 
> Caught unexpected exception.
> java.io.FileNotFoundException: File 
> hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/1b89404f-478c-4534-a7c4-ba09e78f175d
>  does not exist or the user running Flink ('flink') has insufficient 
> permissions to access it.
>     at 
> org.apache.flink.core.fs.local.LocalFileSystem.getFileStatus(LocalFileSystem.java:106)
>  ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at 
> org.apache.flink.core.fs.SafetyNetWrapperFileSystem.getFileStatus(SafetyNetWrapperFileSystem.java:65)
>  ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at 
> org.apache.flink.state.forst.fs.ForStFlinkFileSystem.getFileStatus(ForStFlinkFileSystem.java:311)
>  ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at 
> org.apache.flink.state.forst.fs.ForStFlinkFileSystem.listStatus(ForStFlinkFileSystem.java:358)
>  ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at 
> org.apache.flink.state.forst.fs.StringifiedForStFileSystem.listStatus(StringifiedForStFileSystem.java:52)
>  ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at org.forstdb.RocksDB.open(Native Method) 
> ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at org.forstdb.RocksDB.open(RocksDB.java:318) 
> ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at 
> org.apache.flink.state.forst.ForStOperationUtils.openDB(ForStOperationUtils.java:85)
>  ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at 
> org.apache.flink.state.forst.restore.ForStHandle.loadDb(ForStHandle.java:115) 
> ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at 
> org.apache.flink.state.forst.restore.ForStHandle.openDB(ForStHandle.java:103) 
> ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at 
> org.apache.flink.state.forst.restore.ForStIncrementalRestoreOperation.restoreBaseDBFromMainHandle(ForStIncrementalRestoreOperation.java:363)
>  ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at 
> org.apache.flink.state.forst.restore.ForStIncrementalRestoreOperation.initBaseDBFromSingleStateHandle(ForStIncrementalRestoreOperation.java:285)
>  ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at 
> org.apache.flink.state.forst.restore.ForStIncrementalRestoreOperation.innerRestore(ForStIncrementalRestoreOperation.java:262)
>  ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at 
> org.apache.flink.state.forst.restore.ForStIncrementalRestoreOperation.lambda$restore$1(ForStIncrementalRestoreOperation.java:222)
>  ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at 
> org.apache.flink.state.forst.restore.ForStIncrementalRestoreOperation.runAndReportDuration(ForStIncrementalRestoreOperation.java:419)
>  ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at 
> org.apache.flink.state.forst.restore.ForStIncrementalRestoreOperation.restore(ForStIncrementalRestoreOperation.java:222)
>  ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] {code}
>  
> The file should not be treated as NOT_OWNED for local access.
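
The two kinds of mapping entries described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the class `MappingSketch`, its methods, and the placeholder paths are hypothetical and are not ForSt's FileMappingManager API. It only mirrors the "Add entry to mapping table" and "link" log lines, and shows why a path resolved through the table still carries a remote scheme that a local file system lookup cannot serve.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names are illustrative and are NOT ForSt's real classes.
final class MappingSketch {

    /** One table entry: where the bytes actually live, plus an ownership tag. */
    static final class Entry {
        final String sourcePath;
        final String ownership; // the log in this issue shows "NOT_OWNED"

        Entry(String sourcePath, String ownership) {
            this.sourcePath = sourcePath;
            this.ownership = ownership;
        }
    }

    private final Map<String, Entry> table = new HashMap<>();

    /** Mirrors the "Add entry to mapping table: key -> MappingEntry{...}" log line. */
    void addEntry(String key, Entry entry) {
        table.put(key, entry);
    }

    /** Mirrors the "link: src -> dst" log line: src resolves to dst's entry. */
    void link(String src, String dst) {
        Entry target = table.get(dst);
        if (target == null) {
            throw new IllegalStateException("No entry for " + dst);
        }
        table.put(src, target);
    }

    /** Returns the entry for a path, or null if the path is unknown. */
    Entry resolve(String path) {
        return table.get(path);
    }

    /** True if the path has a scheme other than file://, so it is not a local path. */
    static boolean isRemote(String path) {
        int idx = path.indexOf("://");
        return idx > 0 && !path.startsWith("file://");
    }

    public static void main(String[] args) {
        // Placeholder directory, not the path from the log above.
        String dir = "hdfs://example-host/checkpoint/shared/db/";
        MappingSketch mapping = new MappingSketch();

        // 1. remote UUID file -> state handle (represented here by its path)
        mapping.addEntry(dir + "uuid-1", new Entry(dir + "uuid-1", "NOT_OWNED"));
        // 2. remote SST file -> remote UUID file
        mapping.link(dir + "000514.sst", dir + "uuid-1");

        Entry entry = mapping.resolve(dir + "000514.sst");
        System.out.println("resolved source: " + entry.sourcePath);
        System.out.println("ownership:       " + entry.ownership);
        System.out.println("remote path:     " + isRemote(entry.sourcePath));
    }
}
```

In this sketch the SST name resolves to an `hdfs://` source path, so any code path that hands the resolved path to a local file system will report the file as missing even though it exists remotely. That matches the shape of the `FileNotFoundException` in the stack trace, where `LocalFileSystem.getFileStatus` is called with an `hdfs://` path.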



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
