mazhenzz created FLINK-34197:
--------------------------------

             Summary: How can I recover jobs from savepoints when running 
multiple jobs via executeAsync in application mode
                 Key: FLINK-34197
                 URL: https://issues.apache.org/jira/browse/FLINK-34197
             Project: Flink
          Issue Type: Technical Debt
          Components: API / Core
    Affects Versions: 1.18.1
            Reporter: mazhenzz


Hello guys, I'm working with Flink 1.18 in Java and want to use 
Application Mode to run 2 jobs in one pod (Kubernetes Docker deployment).

In the Java code, I use a _for_ loop to create 2 or more jobs with 
env.executeAsync, creating a new env in each iteration. This lets us run 
multiple jobs in parallel in one Docker pod, which reduces resource cost.

In Application Mode, I think I cannot rely on checkpoint-based recovery, 
because HA cannot be enabled for this setup (the docs state that HA is not 
supported for multi-execute() applications), so the previous job ID cannot be 
stored in ZooKeeper to recover from a checkpoint. Ref: 
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/overview/#application-mode

So I want to recover from a savepoint when the Docker pod goes down or needs 
to restart. My questions are:
 * How can I trigger a savepoint for each job (I currently run 2 jobs in one 
pod) every hour?
 * How can I restore each job from its savepoint when the Docker pod restarts?

A solution using either Java code or the REST API would work.
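
For the first question, one possible approach is to call the REST API's savepoint endpoint (POST /jobs/:jobid/savepoints) for each job on a timer. Below is a minimal sketch, assuming the JobManager REST endpoint is reachable at the given base URL and the job IDs are already known (for example from GET /jobs). The class name SavepointTrigger, its helper methods, and the command-line arguments are hypothetical, and the target directory is inserted into the JSON body without escaping:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: triggers a savepoint for each given job once per hour.
final class SavepointTrigger {

    // URI of the "trigger savepoint" REST endpoint for one job.
    public static URI triggerUri(String restBaseUrl, String jobId) {
        return URI.create(restBaseUrl + "/jobs/" + jobId + "/savepoints");
    }

    // JSON request body; "cancel-job": false keeps the job running.
    // Assumption: targetDirectory contains no characters that need JSON escaping.
    public static String requestBody(String targetDirectory) {
        return "{\"target-directory\": \"" + targetDirectory + "\", \"cancel-job\": false}";
    }

    public static HttpRequest buildRequest(String restBaseUrl, String jobId, String targetDirectory) {
        return HttpRequest.newBuilder(triggerUri(restBaseUrl, jobId))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody(targetDirectory)))
                .build();
    }

    public static void main(String[] args) {
        if (args.length < 3) {
            System.out.println("usage: SavepointTrigger <restBaseUrl> <targetDirectory> <jobId>...");
            return;
        }
        String restBaseUrl = args[0];
        String targetDirectory = args[1];
        List<String> jobIds = Arrays.asList(args).subList(2, args.length);

        HttpClient client = HttpClient.newHttpClient();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Run immediately, then once per hour. Exceptions are caught per job so
        // that one failed request does not cancel the following runs.
        scheduler.scheduleAtFixedRate(() -> {
            for (String jobId : jobIds) {
                try {
                    HttpResponse<String> response = client.send(
                            buildRequest(restBaseUrl, jobId, targetDirectory),
                            HttpResponse.BodyHandlers.ofString());
                    System.out.println(jobId + " -> " + response.statusCode() + " " + response.body());
                } catch (Exception e) {
                    System.err.println("savepoint trigger failed for " + jobId + ": " + e);
                }
            }
        }, 0, 1, TimeUnit.HOURS);
    }
}
```

The trigger call is asynchronous: the response carries a request-id, which can be polled at GET /jobs/:jobid/savepoints/:triggerid to check whether the savepoint completed and where it was written.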

