Hi Lukas,

I am afraid you're hitting this bug: https://issues.apache.org/jira/browse/FLINK-25952

Best,

Dawid

On 08/03/2022 16:37, Lukáš Drbal wrote:
Hello everyone,

I'm trying to move a savepoint to another S3 account, but the restore always fails with a weird 404 error.

We are using the Lyft k8s operator [1] and Flink 1.13.6 (in the stack trace you can see version 1.13.6-396a8d44-szn, which is just an internal build from Flink commit b2ca390d478aa855eb0f2028d0ed965803a98af1).

What I'm trying to do:

 1. Create a savepoint for the pipeline via ./bin/flink savepoint <JOB_ID>.
 2. Copy the data under the path configured in state.savepoints.dir from
    the source S3 to the new S3.
 3. Change all configuration and restore the pipeline.

Are these steps correct, or am I doing something wrong or unsupported?

All S3-related options have valid values for the new S3 account, but the restore fails with the exception below. The error message includes the original path (s3://flink/savepoints/activity-searched-query), which doesn't exist on the new account, so the 404 is expected. But I still don't understand why Flink tries that path, because the related config options contain the new bucket info:

    high-availability.storageDir: 's3://<NEW_BUCKET>/ha/pipelines-runner-activity-searched-query'
    jobmanager.archive.fs.dir: 's3://<NEW_BUCKET>/history'
    state.checkpoints.dir: 's3://<NEW_BUCKET>/checkpoints/activity-searched-query'
    state.savepoints.dir: 's3://<NEW_BUCKET>/savepoints/activity-searched-query'

plus valid values for s3.access-key and s3.secret-key.

I found the original path in the _metadata file in the savepoint data, but changing it (search and replace) leads to a weird OOM. I hope this should not be needed and that those values should be ignored.
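
To see which absolute paths a savepoint's _metadata file still carries, a rough byte scan may be enough. The following is a minimal sketch, not a parser for Flink's metadata format: the function name embedded_s3_paths is hypothetical, and the scan simply assumes that each path appears as a run of printable ASCII starting with an s3://, s3a:// or s3n:// scheme.

```python
import re
from typing import List

# Assumption: a stored path is a run of printable, non-space ASCII that
# starts with an S3 scheme. Any other byte ends the match. A printable
# byte that happens to follow a path would be captured too, so treat the
# output as a hint, not as an exact parse.
_S3_URI = re.compile(rb"s3[an]?://[\x21-\x7e]+")


def embedded_s3_paths(metadata: bytes) -> List[str]:
    """Return the distinct S3 URIs found in raw _metadata bytes, in first-seen order."""
    seen: List[str] = []
    for raw in _S3_URI.findall(metadata):
        uri = raw.decode("ascii")
        if uri not in seen:
            seen.append(uri)
    return seen
```

Reading a local copy of the file in binary mode and passing its bytes to embedded_s3_paths lists the buckets the savepoint still points at. As for the failed search and replace, one possible explanation (an assumption, not a diagnosis): if the paths are stored as length-prefixed strings, as Java's DataOutputStream.writeUTF does, then a replacement of a different length leaves the prefix stale and every later read misaligned.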

state.backend is hashmap, in case that matters.

Restoring from the source bucket works as expected.

Thanks a lot!

Regards,
L.

Stacktrace:

    2022-03-08 15:39:25,838 [flink-akka.actor.default-dispatcher-4] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph - CombineToSearchedQuery -> (LateElementsCounter, TransformToStreamElement -> Sink: SearchedQueryKafkaSink) (1/2) (0c0f108c393b9a5b58f861c1032671d0) switched from INITIALIZING to FAILED on 10.67.158.155:45521-d8d19d @ 10.67.158.155 (dataPort=36341).
    org.apache.flink.util.SerializedThrowable: Exception while creating StreamOperatorStateContext.
        at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:254) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:272) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:441) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:585) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:565) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:540) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:759) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at java.lang.Thread.run(Thread.java:832) ~[?:?]
    Caused by: org.apache.flink.util.SerializedThrowable: Could not restore keyed state backend for WindowOperator_bd2a73c53230733509ca171c6476fcc5_(1/2) from any of the 1 provided restore options.
        at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:345) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:163) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        ... 10 more
    Caused by: org.apache.flink.util.SerializedThrowable: Failed when trying to restore heap backend
        at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.restoreState(HeapKeyedStateBackendBuilder.java:177) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:111) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:131) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:73) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.runtime.state.StateBackend.createKeyedStateBackend(StateBackend.java:136) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:328) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:345) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:163) ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
        ... 10 more
    Caused by: org.apache.flink.util.SerializedThrowable: com.amazonaws.services.s3.model.AmazonS3Exception: null (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: tx0000000000000011aacd6-0062276a9d-85d05-default; S3 Extended Request ID: 85d05-default-default; Proxy: null), S3 Extended Request ID: 85d05-default-default (Path: s3://flink/savepoints/activity-searched-query/savepoint-ff3caa-f4b6db96b68b/000e1fc6-0ed8-452a-a8f2-57650fa0594d)
        at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$openStream$1(PrestoS3FileSystem.java:917) ~[?:?]
        at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:138) ~[?:?]
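
The trace above names the path that the restore actually tried to open. As a small sketch for confirming which bucket that is, the hypothetical helper failing_bucket below pulls the first "(Path: ...)" URI out of a log excerpt and returns its bucket. It assumes the path appears in that parenthesised form and tolerates a line wrap after "Path:".

```python
import re
from typing import Optional
from urllib.parse import urlparse

# "(Path:" followed by optional whitespace (including a wrapped line),
# then an S3 URI that runs up to the closing parenthesis.
_PATH_IN_LOG = re.compile(r"\(Path:\s*(s3[an]?://[^\s)]+)\)")


def failing_bucket(log_text: str) -> Optional[str]:
    """Return the bucket of the first '(Path: s3://...)' in the log, or None if absent."""
    match = _PATH_IN_LOG.search(log_text)
    if match is None:
        return None
    # For s3://bucket/key the bucket is the network-location part of the URI.
    return urlparse(match.group(1)).netloc
```

For the exception shown here the helper returns the old bucket name from the source account, which differs from the bucket in state.savepoints.dir. That matches the observation that the path comes from the savepoint data and not from the configuration.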


[1] https://github.com/lyft/flinkk8soperator
