Hello everyone, I'm trying to move savepoint to another s3 account but restore always failed with some weird 404 error.
We are using lyft k8s operator [1] and flink 1.13.6 (in stacktrace you can see version 1.13.6-396a8d44-szn which is just internal build from flink commit b2ca390d478aa855eb0f2028d0ed965803a98af1) What I'm trying to do: 1. create savepoint for pipeline via ./bin/flink savepoint <JOB_ID> 2. copy data under path configured in state.savepoints.dir from source s3 to new s3 3. change all configuration and restore pipeline Is this steps correct or I'm doing something wrong or unsupported? All options related to s3 have valid values for new s3 account but restore failed with exception bellow. Error message includes original path (s3://flink/savepoints/activity-searched-query) which doesn't exists on new account so that 404 is expected. But I still don't understand why flink tries that path because related config options contains new bucket info. high-availability.storageDir: 's3://<NEW_BUCKET>/ha/pipelines-runner-activity-searched-query' jobmanager.archive.fs.dir: 's3://<NEW_BUCKET>/history' state.checkpoints.dir: > 's3://<NEW_BUCKET>/checkpoints/activity-searched-query' state.savepoints.dir: > 's3://<NEW_BUCKET>/savepoints/activity-searched-query' + valid values for s3.access-key and s3.secret-key I found original path in _metadata file in savepoint data but changing that (search&replace) leads to some weird OOM, I hope this should not be needed and that values should be ignored. state.backend is hashmap if it is important. Restore back from source butcket works as expected. Thanks a lot! Regards, L. Stacktrace: 2022-03-08 15:39:25,838 [flink-akka.actor.default-dispatcher-4] INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph - > CombineToSearchedQuery -> (LateElementsCounter, TransformToStreamElement -> > Sink: SearchedQueryKafkaSink) (1/2) (0c0f108c393b9a5b58f861c1032671d0) > switched from INITIALIZING to FAILED on 10.67.158.155:45521-d8d19d @ > 10.67.158.155 (dataPort=36341). > org.apache.flink.util.SerializedThrowable: Exception while creating > StreamOperatorStateContext. > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:254) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:272) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at > org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:441) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:585) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at > org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at > org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:565) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at > org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:540) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:759) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at java.lang.Thread.run(Thread.java:832) ~[?:?] > Caused by: org.apache.flink.util.SerializedThrowable: Could not restore > keyed state backend for > WindowOperator_bd2a73c53230733509ca171c6476fcc5_(1/2) from any of the 1 > provided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:345) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:163) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > ... 10 more > Caused by: org.apache.flink.util.SerializedThrowable: Failed when trying > to restore heap backend > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.restoreState(HeapKeyedStateBackendBuilder.java:177) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:111) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at > org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:131) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at > org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:73) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at > org.apache.flink.runtime.state.StateBackend.createKeyedStateBackend(StateBackend.java:136) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:328) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:345) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:163) > ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn] > ... 10 more > Caused by: org.apache.flink.util.SerializedThrowable: > com.amazonaws.services.s3.model.AmazonS3Exception: null (Service: Amazon > S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: > tx0000000000000011aacd6-0062276a9d-85d05-default; S3 Extended Request ID: > 85d05-default-default; Proxy: null), S3 Extended Request ID: > 85d05-default-default (Path: > s3://flink/savepoints/activity-searched-query/savepoint-ff3caa-f4b6db96b68b/000e1fc6-0ed8-452a-a8f2-57650fa0594d) > at > com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$openStream$1(PrestoS3FileSystem.java:917) > ~[?:?] > at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:138) ~[?:?] > [1] https://github.com/lyft/flinkk8soperator