[ https://issues.apache.org/jira/browse/FLINK-37693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hangxiang Yu resolved FLINK-37693.
----------------------------------
    Resolution: Fixed

Merged into master via d82cc3f

> ForSt fails to restore from reused checkpoint
> ---------------------------------------------
>
>                 Key: FLINK-37693
>                 URL: https://issues.apache.org/jira/browse/FLINK-37693
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>    Affects Versions: 2.0.0
>            Reporter: Hangxiang Yu
>            Assignee: Hangxiang Yu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2025-04-17-11-42-46-972.png
>
>
> When ForSt starts to restore from a reused checkpoint, the mapping entries include:
> 1. remote uuid -> state handle (the state handle also holds the remote uuid path)
> 2. remote sst -> remote uuid
> {code:java}
> 2025-04-16 19:53:55,131 INFO org.apache.flink.state.forst.datatransfer.DataTransferStrategy [] - Reuse file from checkpoint: File State: hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/1b89404f-478c-4534-a7c4-ba09e78f175d [67294473 bytes], hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/000514.sst
> 2025-04-16 19:53:55,131 INFO org.apache.flink.state.forst.fs.filemapping.FileMappingManager [] - Add entry to mapping table: hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/86aae1d6-99ba-48f1-8afc-944c57742f95 -> MappingEntry{source=HandleBackedSource{stateHandle=File State: hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/86aae1d6-99ba-48f1-8afc-944c57742f95 [67294450 bytes]}, fileOwnership=NOT_OWNED, isDirectory= false}
> 2025-04-16 19:53:55,131 INFO org.apache.flink.state.forst.fs.filemapping.FileMappingManager [] - link: hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/000769.sst -> hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/86aae1d6-99ba-48f1-8afc-944c57742f95
> 2025-04-16 19:53:55,131 INFO org.apache.flink.state.forst.fs.filemapping.FileMappingManager [] - decide restored file ownership based on dbFilePath: hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/000514.sst
> 2025-04-16 19:53:55,131 INFO org.apache.flink.state.forst.datatransfer.DataTransferStrategy [] - Reuse file from checkpoint: File State: hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/33569f77-0ee1-414b-a860-599c932d788c [26437762 bytes], hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/000517.sst
> 2025-04-16 19:53:55,131 INFO org.apache.flink.state.forst.fs.filemapping.FileMappingManager [] - Add entry to mapping table: hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/1b89404f-478c-4534-a7c4-ba09e78f175d -> MappingEntry{source=HandleBackedSource{stateHandle=File State: hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/1b89404f-478c-4534-a7c4-ba09e78f175d [67294473 bytes]}, fileOwnership=NOT_OWNED, isDirectory= false}
> 2025-04-16 19:53:55,131 INFO org.apache.flink.state.forst.fs.filemapping.FileMappingManager [] - link: hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/000514.sst -> hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/1b89404f-478c-4534-a7c4-ba09e78f175d
> {code}
> During restore the file exists, but the restore fails because it always tries to find the SST file in the local filesystem:
> {code:java}
> 2025-04-16 19:53:55,216 ERROR org.apache.flink.state.forst.sync.ForStSyncKeyedStateBackendBuilder [] - Caught unexpected exception.
> java.io.FileNotFoundException: File hdfs://k8s-flink-test/checkpoint/39779093/shared/op_KeyedProcessOperator_90bea66de1c231edf33913ecd54406c1__1_1__attempt_0/db/1b89404f-478c-4534-a7c4-ba09e78f175d does not exist or the user running Flink ('flink') has insufficient permissions to access it.
>     at org.apache.flink.core.fs.local.LocalFileSystem.getFileStatus(LocalFileSystem.java:106) ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.getFileStatus(SafetyNetWrapperFileSystem.java:65) ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at org.apache.flink.state.forst.fs.ForStFlinkFileSystem.getFileStatus(ForStFlinkFileSystem.java:311) ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at org.apache.flink.state.forst.fs.ForStFlinkFileSystem.listStatus(ForStFlinkFileSystem.java:358) ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at org.apache.flink.state.forst.fs.StringifiedForStFileSystem.listStatus(StringifiedForStFileSystem.java:52) ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at org.forstdb.RocksDB.open(Native Method) ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at org.forstdb.RocksDB.open(RocksDB.java:318) ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at org.apache.flink.state.forst.ForStOperationUtils.openDB(ForStOperationUtils.java:85) ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at org.apache.flink.state.forst.restore.ForStHandle.loadDb(ForStHandle.java:115) ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at org.apache.flink.state.forst.restore.ForStHandle.openDB(ForStHandle.java:103) ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at org.apache.flink.state.forst.restore.ForStIncrementalRestoreOperation.restoreBaseDBFromMainHandle(ForStIncrementalRestoreOperation.java:363) ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at org.apache.flink.state.forst.restore.ForStIncrementalRestoreOperation.initBaseDBFromSingleStateHandle(ForStIncrementalRestoreOperation.java:285) ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at org.apache.flink.state.forst.restore.ForStIncrementalRestoreOperation.innerRestore(ForStIncrementalRestoreOperation.java:262) ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at org.apache.flink.state.forst.restore.ForStIncrementalRestoreOperation.lambda$restore$1(ForStIncrementalRestoreOperation.java:222) ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at org.apache.flink.state.forst.restore.ForStIncrementalRestoreOperation.runAndReportDuration(ForStIncrementalRestoreOperation.java:419) ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
>     at org.apache.flink.state.forst.restore.ForStIncrementalRestoreOperation.restore(ForStIncrementalRestoreOperation.java:222) ~[flink-dist_2.12-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> {code}
>
> It should not be NOT_OWNED for local access.
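
For readers following the two kinds of mapping entries listed in the issue (remote uuid -> state handle, and remote sst -> remote uuid), here is a minimal sketch of how such a two-level lookup could resolve an SST path. It is an illustration only, not the ForSt FileMappingManager: the class MappingTableSketch, its methods, and the shortened paths are all hypothetical, and state handles are stood in for by plain strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical two-level mapping table; NOT the real FileMappingManager. */
class MappingTableSketch {
    // Entry type 1 from the issue: remote uuid path -> state handle.
    // A String stands in for the state handle here (assumption).
    private final Map<String, String> handleByUuidPath = new HashMap<>();

    // Entry type 2 from the issue: remote sst path -> remote uuid path.
    private final Map<String, String> uuidPathBySstPath = new HashMap<>();

    /** Mirrors the "Add entry to mapping table" log line. */
    public void addEntry(String uuidPath, String stateHandle) {
        handleByUuidPath.put(uuidPath, stateHandle);
    }

    /** Mirrors the "link: sst -> uuid" log line. */
    public void link(String sstPath, String uuidPath) {
        uuidPathBySstPath.put(sstPath, uuidPath);
    }

    /** Follows an sst link (if any), then looks up the backing handle. */
    public Optional<String> resolve(String path) {
        String target = uuidPathBySstPath.getOrDefault(path, path);
        return Optional.ofNullable(handleByUuidPath.get(target));
    }

    public static void main(String[] args) {
        MappingTableSketch table = new MappingTableSketch();
        // Shortened, made-up paths; the real ones are the hdfs:// paths in the log.
        table.addEntry("hdfs://example/db/uuid-a", "handle-a");
        table.link("hdfs://example/db/000514.sst", "hdfs://example/db/uuid-a");

        // The sst path resolves through the link to the uuid entry's handle.
        System.out.println(table.resolve("hdfs://example/db/000514.sst").isPresent());
        // An unmapped path resolves to nothing.
        System.out.println(table.resolve("hdfs://example/db/000999.sst").isPresent());
    }
}
```

The sketch only shows the lookup chain. It says nothing about how the merged fix decides file ownership or which filesystem is used to open the resolved file.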