Hi Dawid,

I just tried the same steps on flink builded from git branch release-1.13
and everything works as expected!

Thank you all!

L.

On Wed, Mar 9, 2022 at 8:49 AM Dawid Wysakowicz <dwysakow...@apache.org>
wrote:

> Hi Lukas,
>
> I am afraid you're hitting this bug:
> https://issues.apache.org/jira/browse/FLINK-25952
>
> Best,
>
> Dawid
> On 08/03/2022 16:37, Lukáš Drbal wrote:
>
> Hello everyone,
>
> I'm trying to move savepoint to another s3 account but restore always
> failed with some weird 404 error.
>
> We are using lyft k8s operator [1] and flink 1.13.6 (in stacktrace you can
> see version 1.13.6-396a8d44-szn which is just internal build from flink
> commit b2ca390d478aa855eb0f2028d0ed965803a98af1)
>
> What I'm trying to do:
>
>    1. create savepoint for pipeline via ./bin/flink savepoint <JOB_ID>
>    2. copy data under path configured in state.savepoints.dir from source
>    s3 to new s3
>    3. change all configuration and restore pipeline
>
> Is this steps correct or I'm doing something wrong or unsupported?
>
> All options related to s3 have valid values for new s3 account but restore
> failed with exception bellow. Error message includes original path
> (s3://flink/savepoints/activity-searched-query) which doesn't exists on new
> account so that 404 is expected. But I still don't understand why flink
> tries that path because related config options contains new bucket info.
>     high-availability.storageDir:
> 's3://<NEW_BUCKET>/ha/pipelines-runner-activity-searched-query'
>
> jobmanager.archive.fs.dir: 's3://<NEW_BUCKET>/history'
>
> state.checkpoints.dir:
>> 's3://<NEW_BUCKET>/checkpoints/activity-searched-query'
>
> state.savepoints.dir:
>> 's3://<NEW_BUCKET>/savepoints/activity-searched-query'
>
>
> + valid values for s3.access-key and s3.secret-key
>
> I found original path in _metadata file in savepoint data but changing
> that (search&replace) leads to some weird OOM, I hope this should not be
> needed and that values should be ignored.
>
> state.backend is hashmap if it is important.
>
> Restore back from source butcket works as expected.
>
> Thanks a lot!
>
> Regards,
> L.
>
> Stacktrace:
>
> 2022-03-08 15:39:25,838 [flink-akka.actor.default-dispatcher-4] INFO
>>  org.apache.flink.runtime.executiongraph.ExecutionGraph -
>> CombineToSearchedQuery -> (LateElementsCounter, TransformToStreamElement ->
>> Sink: SearchedQueryKafkaSink) (1/2) (0c0f108c393b9a5b58f861c1032671d0)
>> switched from INITIALIZING to FAILED on 10.67.158.155:45521-d8d19d @
>> 10.67.158.155 (dataPort=36341).
>> org.apache.flink.util.SerializedThrowable: Exception while creating
>> StreamOperatorStateContext.
>> at
>> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:254)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at
>> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:272)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:441)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:585)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:565)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:540)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:759)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at java.lang.Thread.run(Thread.java:832) ~[?:?]
>> Caused by: org.apache.flink.util.SerializedThrowable: Could not restore
>> keyed state backend for
>> WindowOperator_bd2a73c53230733509ca171c6476fcc5_(1/2) from any of the 1
>> provided restore options.
>> at
>> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at
>> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:345)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at
>> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:163)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> ... 10 more
>> Caused by: org.apache.flink.util.SerializedThrowable: Failed when trying
>> to restore heap backend
>> at
>> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.restoreState(HeapKeyedStateBackendBuilder.java:177)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at
>> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:111)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at
>> org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:131)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at
>> org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:73)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at
>> org.apache.flink.runtime.state.StateBackend.createKeyedStateBackend(StateBackend.java:136)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at
>> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:328)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at
>> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at
>> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at
>> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:345)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> at
>> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:163)
>> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
>> ... 10 more
>> Caused by: org.apache.flink.util.SerializedThrowable:
>> com.amazonaws.services.s3.model.AmazonS3Exception: null (Service: Amazon
>> S3; Status Code: 404; Error Code: NoSuchBucket; Request ID:
>> tx0000000000000011aacd6-0062276a9d-85d05-default; S3 Extended Request ID:
>> 85d05-default-default; Proxy: null), S3 Extended Request ID:
>> 85d05-default-default (Path:
>> s3://flink/savepoints/activity-searched-query/savepoint-ff3caa-f4b6db96b68b/000e1fc6-0ed8-452a-a8f2-57650fa0594d)
>> at
>> com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$openStream$1(PrestoS3FileSystem.java:917)
>> ~[?:?]
>> at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:138) ~[?:?]
>>
>
> [1] https://github.com/lyft/flinkk8soperator
>
>

Reply via email to