Is it feasible to override ZooKeeperHaServices to recreate the JobGraph from the jar instead of reading it from ZK state? Any hints? I have a feeling that recreating the JobGraph from the jar is a more resilient approach, with fewer chances of mistakes during an upgrade.

Thanks,
Alexey

________________________________
From: Piotr Nowojski <pnowoj...@apache.org>
Sent: Thursday, August 20, 2020 7:04 AM
To: Chesnay Schepler <ches...@apache.org>
Cc: Alexey Trenikhun <yen...@msn.com>; Flink User Mail List <user@flink.apache.org>
Subject: Re: Flink Job cluster in HA mode - recovery vs upgrade

Thank you for the clarification, Chesnay, and sorry for the incorrect previous answer.

Piotrek

On Thu, 20 Aug 2020 at 15:59, Chesnay Schepler <ches...@apache.org> wrote:

This is incorrect; we do store the JobGraph in ZooKeeper. If you just delete the deployment, the cluster will recover the previous JobGraph (assuming you aren't changing the ZooKeeper configuration). If you wish to update the job, then you should cancel it (along with creating a savepoint), which will clear the ZooKeeper state, and then create a new deployment.

On 20/08/2020 15:43, Piotr Nowojski wrote:

Hi Alexey,

I might be wrong (I don't know this side of Flink very well), but as far as I know the JobGraph is never stored in ZK. It's always recreated from the job's JAR. So you should be able to upgrade the job by replacing the JAR with a newer version, as long as the operator UIDs are the same before and after the upgrade (so that operator state matches).

Best,
Piotrek

On Thu, 20 Aug 2020 at 06:34, Alexey Trenikhun <yen...@msn.com> wrote:

Hello,

Let's say I run a Flink Job cluster with persistent storage and ZooKeeper HA on k8s with a single JobManager and use externalized checkpoints. When the JM crashes, k8s will restart the JM pod, and the JM will read the JobId and JobGraph from ZK and restore from the latest checkpoint. Now let's say I want to upgrade the job binary: I delete the deployments and create new deployments referring to a newer image. Will the JM still read the JobGraph from ZK, or will it create a new one from the new job jar?

Thanks,
Alexey
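
The behaviour described in Chesnay's reply can be illustrated with a small toy model. This is only a sketch in Python: `ToyHaStore`, `start_job_manager`, `cancel_job` and the job id are hypothetical names made up for illustration, not Flink's API, and the savepoint step is left out. It only encodes the rule stated in the thread: a JobGraph already stored in the HA state is recovered in preference to the jar, and cancelling the job clears that state.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyHaStore:
    """Stand-in for the ZooKeeper HA state (hypothetical, not Flink's API)."""
    job_graphs: Dict[str, str] = field(default_factory=dict)


def start_job_manager(store: ToyHaStore, job_id: str, jar_version: str) -> str:
    """Return the job graph version a freshly started JobManager would run."""
    stored = store.job_graphs.get(job_id)
    if stored is not None:
        # A graph persisted earlier is recovered; the jar in the new image is ignored.
        return stored
    # Nothing stored: build the graph from the jar and persist it for later recovery.
    store.job_graphs[job_id] = jar_version
    return jar_version


def cancel_job(store: ToyHaStore, job_id: str) -> None:
    """Cancelling the job clears its entry in the HA state."""
    store.job_graphs.pop(job_id, None)


store = ToyHaStore()
job_id = "example-job"  # hypothetical id, kept the same across deployments

first = start_job_manager(store, job_id, "v1")          # initial deployment
after_redeploy = start_job_manager(store, job_id, "v2")  # deployment deleted, new image
cancel_job(store, job_id)                                # cancel clears the HA state
after_cancel = start_job_manager(store, job_id, "v2")    # new deployment after cancel

print(first, after_redeploy, after_cancel)
```

In this model, redeploying with a newer image still runs the old graph, and only the deployment created after the cancel picks up the new jar.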