Hi,

We are running Flink 1.20.1 and see a strange issue when reading a savepoint from MinIO/S3 into a HashMap state backend. At first glance it looks as if the file is missing, but when we check the S3 bucket the file is there. The failure is not systematic and only happens from time to time, so we suspect an environmental issue. Are there any options available to retry the read when this happens? This is the exception we see:
org.apache.flink.runtime.state.BackendBuildingException: Failed when trying to restore heap backend
    at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.restoreState(HeapKeyedStateBackendBuilder.java:174)
    at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:108)
    at org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:119)
    at org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:61)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$3(StreamTaskStateInitializerImpl.java:446)
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:173)
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:137)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:457)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:203)
    at org.apache.flink.state.api.input.StreamOperatorContextBuilder.build(StreamOperatorContextBuilder.java:129)
    at org.apache.flink.state.api.input.KeyedStateInputFormat.open(KeyedStateInputFormat.java:176)
    at org.apache.flink.state.api.input.KeyedStateInputFormat.open(KeyedStateInputFormat.java:66)
    at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:92)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:113)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:71)
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:338)
Caused by: com.facebook.presto.hive.s3.PrestoS3FileSystem$UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 1841CDA87F9BAC8F; S3 Extended Request ID: e5c2c7654856b7589c81653d762ab26f50a21aeaa0de520cb1263639b9f43328; Proxy: null), S3 Extended Request ID: e5c2c7654856b7589c81653d762ab26f50a21aeaa0de520cb1263639b9f43328 (Path: s3://aiops-ir-lifecycle/savepoints/savepoint-7a276c-8ba7a1a7741b/2bef5371-e008-4e36-a0fe-c7e6fe11c844)
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$openStream$2(PrestoS3FileSystem.java:1114)
    at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:139)
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.openStream(PrestoS3FileSystem.java:1099)
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.openStream(PrestoS3FileSystem.java:1084)
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.seekStream(PrestoS3FileSystem.java:1077)
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$read$1(PrestoS3FileSystem.java:1021)
    at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:139)
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.read(PrestoS3FileSystem.java:1020)
    at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
    at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:263)
    at java.base/java.io.FilterInputStream.read(FilterInputStream.java:82)
    at org.apache.flink.fs.s3presto.common.HadoopDataInputStream.read(HadoopDataInputStream.java:88)
    at java.base/java.io.DataInputStream.readInt(DataInputStream.java:381)
    at org.apache.flink.core.io.VersionedIOReadableWritable.read(VersionedIOReadableWritable.java:47)
    at org.apache.flink.runtime.state.KeyedBackendSerializationProxy.read(KeyedBackendSerializationProxy.java:143)
    at org.apache.flink.runtime.state.restore.FullSnapshotRestoreOperation.readMetaData(FullSnapshotRestoreOperation.java:194)
    at org.apache.flink.runtime.state.restore.FullSnapshotRestoreOperation.restoreKeyGroupsInStateHandle(FullSnapshotRestoreOperation.java:171)
    at org.apache.flink.runtime.state.restore.FullSnapshotRestoreOperation.access$100(FullSnapshotRestoreOperation.java:113)
    at org.apache.flink.runtime.state.restore.FullSnapshotRestoreOperation$1.next(FullSnapshotRestoreOperation.java:158)
    at org.apache.flink.runtime.state.restore.FullSnapshotRestoreOperation$1.next(FullSnapshotRestoreOperation.java:140)
    at org.apache.flink.runtime.state.heap.HeapSavepointRestoreOperation.restore(HeapSavepointRestoreOperation.java:116)
    at org.apache.flink.runtime.state.heap.HeapSavepointRestoreOperation.restore(HeapSavepointRestoreOperation.java:58)
    at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.restoreState(HeapKeyedStateBackendBuilder.java:171)
    ... 15 more
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 1841CDA87F9BAC8F; S3 Extended Request ID: e5c2c7654856b7589c81653d762ab26f50a21aeaa0de520cb1263639b9f43328; Proxy: null), S3 Extended Request ID: e5c2c7654856b7589c81653d762ab26f50a21aeaa0de520cb1263639b9f43328
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1912)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1450)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1419)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1183)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:838)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:805)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:779)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:735)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:717)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:581)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5593)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5540)
    at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1574)
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$openStream$2(PrestoS3FileSystem.java:1102)

Thanks
JM
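
P.S. To make concrete what we mean by "a retry": the trace shows the read going through RetryDriver and still failing with an exception named UnrecoverableS3OperationException, so it looks as if the 404 is not retried at that layer. Below is a minimal sketch of the behaviour we are after, written as a generic helper. RetryOnFailure and retryWithBackoff are hypothetical names of ours, not a Flink or Presto API, and the Callable only stands in for whatever operation opens the savepoint file.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of Flink: illustrates the retry we would like. */
public final class RetryOnFailure {

    private RetryOnFailure() {}

    /**
     * Calls the action until it succeeds or maxAttempts is reached,
     * doubling the sleep between attempts. Rethrows the last failure.
     */
    public static <T> T retryWithBackoff(
            Callable<T> action, int maxAttempts, Duration initialDelay) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        long delayMs = initialDelay.toMillis();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                    delayMs *= 2; // exponential backoff
                }
            }
        }
        throw last;
    }
}
```

With something like retryWithBackoff(openSavepoint, 5, Duration.ofSeconds(1)), where openSavepoint is a placeholder for the read that currently fails, an intermittent NoSuchKey would be retried a few times before the job fails. We would prefer an existing configuration option over maintaining code like this ourselves.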