Re: Flink Job cluster in HA mode - recovery vs upgrade

Alexey Trenikhun Fri, 21 Aug 2020 08:52:04 -0700

Is it feasible to override ZooKeeperHaServices to recreate the JobGraph from 
the jar instead of reading it from ZK state? Any hints? I have a feeling that 
reading the JobGraph from the jar is a more resilient approach, with less 
chance of mistakes during an upgrade.

Thanks,
Alexey

________________________________
From: Piotr Nowojski <pnowoj...@apache.org>
Sent: Thursday, August 20, 2020 7:04 AM
To: Chesnay Schepler <ches...@apache.org>
Cc: Alexey Trenikhun <yen...@msn.com>; Flink User Mail List 
<user@flink.apache.org>
Subject: Re: Flink Job cluster in HA mode - recovery vs upgrade

Thank you for the clarification, Chesnay, and sorry for my incorrect previous 
answer.

Piotrek

On Thu, 20 Aug 2020 at 15:59, Chesnay Schepler <ches...@apache.org> wrote:
This is incorrect; we do store the JobGraph in ZooKeeper. If you just delete 
the deployment, the cluster will recover the previous JobGraph (assuming you 
aren't changing the ZooKeeper configuration).

If you wish to update the job, then you should cancel it (along with creating a 
savepoint), which will clear the ZooKeeper state, and then create a new 
deployment.
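
The recovery-versus-upgrade behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `HaStore`, `start_job_manager`, and `cancel_with_savepoint` are hypothetical names invented for illustration and are not part of Flink's API, and the "job graph" is just a string.

```python
# Toy model of the behaviour described in this message; NOT Flink's API.
# All names here (HaStore, start_job_manager, cancel_with_savepoint)
# are hypothetical and exist only for illustration.

class HaStore:
    """Stands in for the HA state kept in ZooKeeper."""

    def __init__(self):
        self.job_graph = None  # nothing stored until a job is first submitted


def start_job_manager(store, jar_version):
    """Return the job graph a freshly started JobManager would run."""
    if store.job_graph is None:
        # No stored graph: build one from the jar in the image and persist it.
        store.job_graph = f"graph-from-{jar_version}"
    # A stored graph always wins over whatever jar the image contains.
    return store.job_graph


def cancel_with_savepoint(store):
    """Cancelling the job clears the stored graph; return a savepoint label."""
    store.job_graph = None
    return "savepoint-1"  # placeholder label, not a real savepoint path


store = HaStore()
first = start_job_manager(store, "v1")

# Deleting the deployment and redeploying a newer image without cancelling:
stale = start_job_manager(store, "v2")

# Cancelling (with a savepoint) first, then creating the new deployment:
savepoint = cancel_with_savepoint(store)
fresh = start_job_manager(store, "v2")

print(first, stale, fresh)
```

In this model the redeploy without a cancel still runs the graph built from the old jar, while the cancel-then-redeploy path picks up the new jar.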

On 20/08/2020 15:43, Piotr Nowojski wrote:
Hi Alexey,

I might be wrong (I don't know this side of Flink very well), but as far as I 
know the JobGraph is never stored in ZK. It's always recreated from the job's 
JAR. So you should be able to upgrade the job by replacing the JAR with a newer 
version, as long as the operator UIDs are the same before and after the upgrade 
(so that operator state can be matched across the upgrade).
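
The point about operator UIDs can be sketched as a simple lookup: state taken before the upgrade is keyed by operator UID, and after the upgrade each operator can only pick up state stored under its own UID. This is a toy illustration, not Flink code; `restore_state` and the UIDs below are made up for the example.

```python
# Toy illustration of matching operator state by UID; NOT Flink's API.
# restore_state and all UIDs/values below are hypothetical.

def restore_state(saved_state, new_operator_uids):
    """Split saved state into what the new job can restore and what is orphaned."""
    restored = {
        uid: saved_state[uid]
        for uid in new_operator_uids
        if uid in saved_state
    }
    # State whose UID no longer exists in the new job has nowhere to go.
    orphaned = sorted(set(saved_state) - set(new_operator_uids))
    return restored, orphaned


saved = {"source-1": {"offset": 42}, "window-agg": {"count": 7}}

# Same UIDs before and after the upgrade: all state is matched.
restored_ok, orphaned_ok = restore_state(saved, ["source-1", "window-agg"])

# One UID changed during the upgrade: that operator's state is not matched.
restored_bad, orphaned_bad = restore_state(saved, ["source-1", "window-agg-v2"])
```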

Best, Piotrek

On Thu, 20 Aug 2020 at 06:34, Alexey Trenikhun <yen...@msn.com> wrote:
Hello,

Let's say I run a Flink Job cluster with persistent storage and ZooKeeper HA on 
k8s with a single JobManager and use externalized checkpoints. When the JM 
crashes, k8s will restart the JM pod, and the JM will read the JobId and 
JobGraph from ZK and restore from the latest checkpoint. Now let's say I want 
to upgrade the job binary: I delete the deployments and create new deployments 
referring to a newer image. Will the JM still read the JobGraph from ZK, or 
will it create a new one from the new job jar?

Thanks,
Alexey
