Re: Flink Job cluster in HA mode - recovery vs upgrade

Chesnay Schepler Thu, 20 Aug 2020 07:00:20 -0700

This is incorrect; we do store the JobGraph in ZooKeeper. If you just delete the deployment, the cluster will recover the previous JobGraph (assuming you aren't changing the ZooKeeper configuration).

If you wish to update the job, then you should cancel it (along with creating a savepoint), which will clear the ZooKeeper state, and then create a new deployment.
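
As a rough illustration of that sequence, here is a minimal Python sketch that only builds the commands in order and does not run anything. It assumes the `flink` CLI and `kubectl` are available; the function name `upgrade_plan`, the job ID, the savepoint directory, the deployment name, and the manifest file are all hypothetical placeholders, and the new deployment is assumed to be configured to start from the savepoint taken in the first step.

```python
from typing import List


def upgrade_plan(job_id: str, savepoint_dir: str,
                 deployment: str, new_manifest: str) -> List[List[str]]:
    """Return the commands for a cancel-with-savepoint upgrade, in order.

    All arguments are placeholders supplied by the caller; nothing is executed.
    """
    return [
        # 1. Cancel the job and take a savepoint. Per the reply above, cancelling
        #    is what clears the job's state in ZooKeeper.
        #    (Newer Flink CLIs flag `cancel -s` as deprecated in favour of `stop`.)
        ["flink", "cancel", "-s", savepoint_dir, job_id],
        # 2. Remove the old deployment.
        ["kubectl", "delete", "deployment", deployment],
        # 3. Create the new deployment. Assumption: this manifest points at the
        #    newer image and is set up to restore from the savepoint of step 1.
        ["kubectl", "apply", "-f", new_manifest],
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for cmd in upgrade_plan("<job-id>", "<savepoint-dir>",
                            "<jobmanager-deployment>", "<new-manifest.yaml>"):
        print(" ".join(cmd))
```

The ordering is the point: cancelling first means the new deployment does not find the old JobGraph in ZooKeeper and recover it.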


On 20/08/2020 15:43, Piotr Nowojski wrote:

Hi Alexey,
I might be wrong (I don't know this side of Flink very well), but as far as I know the JobGraph is never stored in ZK. It's always recreated from the job's JAR. So you should be able to upgrade the job by replacing the JAR with a newer version, as long as the operator UIDs are the same before and after the upgrade (for operator state to match before and after the upgrade).
Best, Piotrek
On Thu, 20 Aug 2020 at 06:34, Alexey Trenikhun <yen...@msn.com> wrote:
    Hello,

    Let's say I run a Flink Job cluster with persistent storage and
    ZooKeeper HA on k8s with a single JobManager and use externalized
    checkpoints. When the JM crashes, k8s will restart the JM pod, and
    the JM will read the JobId and JobGraph from ZK and restore from the
    latest checkpoint. Now let's say I want to upgrade the job binary: I
    delete the deployments and create new deployments referring to a
    newer image. Will the JM still read the JobGraph from ZK, or will it
    create a new one from the new job jar?

    Thanks,
    Alexey
