Hi Dawid,

I just tried the same steps on Flink built from the git branch release-1.13 and everything works as expected!
Thank you all!

L.

On Wed, Mar 9, 2022 at 8:49 AM Dawid Wysakowicz <dwysakow...@apache.org> wrote:

> Hi Lukas,
>
> I am afraid you're hitting this bug:
>
> https://issues.apache.org/jira/browse/FLINK-25952
>
> Best,
>
> Dawid
>
> On 08/03/2022 16:37, Lukáš Drbal wrote:
>
> Hello everyone,
>
> I'm trying to move a savepoint to another S3 account, but the restore always fails with a weird 404 error.
>
> We are using the Lyft k8s operator [1] and Flink 1.13.6 (in the stacktrace you can see version 1.13.6-396a8d44-szn, which is just an internal build from Flink commit b2ca390d478aa855eb0f2028d0ed965803a98af1).
>
> What I'm trying to do:
>
> 1. create a savepoint for the pipeline via ./bin/flink savepoint <JOB_ID>
> 2. copy the data under the path configured in state.savepoints.dir from the source S3 to the new S3
> 3. change all configuration and restore the pipeline
>
> Are these steps correct, or am I doing something wrong or unsupported?
>
> All options related to S3 have valid values for the new S3 account, but the restore fails with the exception below. The error message includes the original path (s3://flink/savepoints/activity-searched-query), which doesn't exist on the new account, so the 404 is expected. But I still don't understand why Flink tries that path, because the related config options contain the new bucket info:
>
>> high-availability.storageDir: 's3://<NEW_BUCKET>/ha/pipelines-runner-activity-searched-query'
>> jobmanager.archive.fs.dir: 's3://<NEW_BUCKET>/history'
>> state.checkpoints.dir: 's3://<NEW_BUCKET>/checkpoints/activity-searched-query'
>> state.savepoints.dir: 's3://<NEW_BUCKET>/savepoints/activity-searched-query'
>
> + valid values for s3.access-key and s3.secret-key
>
> I found the original path in the _metadata file in the savepoint data, but changing it (search & replace) leads to a weird OOM. I hope this should not be needed and that those values are ignored.
>
> state.backend is hashmap, if that matters.
>
> Restoring from the source bucket works as expected.
>
> Thanks a lot!
>
> Regards,
> L.
>
> Stacktrace:
>
>> 2022-03-08 15:39:25,838 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - CombineToSearchedQuery -> (LateElementsCounter, TransformToStreamElement -> Sink: SearchedQueryKafkaSink) (1/2) (0c0f108c393b9a5b58f861c1032671d0) switched from INITIALIZING to FAILED on 10.67.158.155:45521-d8d19d @ 10.67.158.155 (dataPort=36341).
>> org.apache.flink.util.SerializedThrowable: Exception while creating StreamOperatorStateContext.
>>     at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:254) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:272) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:441) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:585) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:565) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:540) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:759) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at java.lang.Thread.run(Thread.java:832) ~[?:?]
>> Caused by: org.apache.flink.util.SerializedThrowable: Could not restore keyed state backend for WindowOperator_bd2a73c53230733509ca171c6476fcc5_(1/2) from any of the 1 provided restore options.
>>     at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:345) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:163) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     ... 10 more
>> Caused by: org.apache.flink.util.SerializedThrowable: Failed when trying to restore heap backend
>>     at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.restoreState(HeapKeyedStateBackendBuilder.java:177) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:111) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:131) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:73) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.runtime.state.StateBackend.createKeyedStateBackend(StateBackend.java:136) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:328) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:345) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:163) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>>     ... 10 more
>> Caused by: org.apache.flink.util.SerializedThrowable: com.amazonaws.services.s3.model.AmazonS3Exception: null (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: tx0000000000000011aacd6-0062276a9d-85d05-default; S3 Extended Request ID: 85d05-default-default; Proxy: null), S3 Extended Request ID: 85d05-default-default (Path: s3://flink/savepoints/activity-searched-query/savepoint-ff3caa-f4b6db96b68b/000e1fc6-0ed8-452a-a8f2-57650fa0594d)
>>     at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$openStream$1(PrestoS3FileSystem.java:917) ~[?:?]
>>     at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:138) ~[?:?]
>
> [1] https://github.com/lyft/flinkk8soperator
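
For anyone hitting the same issue: a quick way to confirm that a copied savepoint still points at the old bucket is to scan its _metadata file for embedded S3 URIs. Below is a minimal Python sketch. It assumes the paths appear as plain ASCII strings inside the binary file, so it is a heuristic and not a parser for Flink's metadata format. The function name find_bucket_references is hypothetical. It only reads the bytes, since editing the file in place is what led to the OOM described above.

```python
import re
from collections import Counter

# Heuristic: an s3://, s3a://, s3n:// or s3p:// URI followed by printable,
# non-space ASCII bytes. This is an assumption about how paths show up in
# the file, not a description of Flink's serialization format.
_S3_URI = re.compile(rb"s3[a-z]?://[\x21-\x7e]+")


def find_bucket_references(metadata_bytes: bytes) -> Counter:
    """Count how often each bucket name occurs in S3 URIs found in the bytes."""
    buckets = Counter()
    for match in _S3_URI.finditer(metadata_bytes):
        uri = match.group().decode("ascii")
        # Bucket name is everything between "://" and the next "/".
        bucket = uri.split("://", 1)[1].split("/", 1)[0]
        buckets[bucket] += 1
    return buckets


# Synthetic sample bytes for illustration; a real check would pass the
# contents of the savepoint's _metadata file, read in binary mode.
sample = (
    b"\x00\x10s3://flink/savepoints/activity-searched-query/part-a\x00"
    b"\x00\x10s3://flink/savepoints/activity-searched-query/part-b\x00"
)
print(dict(find_bucket_references(sample)))
```

If the old bucket name (here `flink`) shows up in the result, the savepoint contains absolute paths and the restore will try to read from the original location regardless of state.savepoints.dir.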