Following are the mandatory conditions to run in HA:

a) You should have a persistent, common external store for the JobManager and TaskManagers to use while writing state.
b) You should have a persistent external store for ZooKeeper to store the JobGraph.
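As a rough illustration of the two conditions, a flink-conf.yaml fragment might look like the one below. The option keys are the standard Flink HA and checkpointing options; the ZooKeeper hosts, the HDFS namenode address, the cluster id and the directory names are placeholders I am assuming, not values from this thread.

```yaml
# flink-conf.yaml (sketch only; hosts, cluster id and paths are assumed placeholders)

# Enable ZooKeeper-based high availability.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-0:2181,zk-1:2181,zk-2:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /my-cluster

# (b) Durable location for JobManager recovery metadata such as submitted JobGraphs.
# It must be reachable from every JobManager and must survive a pod redeploy.
high-availability.storageDir: hdfs://namenode:8020/flink/ha

# (a) Common store that the JobManager and all TaskManagers can read and write.
state.backend: filesystem
state.checkpoints.dir: hdfs://namenode:8020/flink/checkpoints
```

As far as I understand, ZooKeeper itself keeps only a handle to the JobGraph, while the serialized JobGraph file is written under high-availability.storageDir. If that directory is not durable, the file can disappear while the ZooKeeper entry remains, which would match the "state handle is broken" error.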
ZooKeeper is referring to the path /flink/checkpoints/submittedJobGraph480ddf9572ed to get the job graph, but the JobManager is unable to find it. It seems /flink/checkpoints is not on the external persistent store.

Regards
Bhaskar

On Thu, Nov 28, 2019 at 10:43 AM seuzxc <xcz200...@qq.com> wrote:

> hi, I have the same problem with flink 1.9.1. Any solution to fix it?
> When k8s redeploys the jobmanager, the error looks like this (it seems zk did
> not remove the submitted job info, but the jobmanager removed the file):
>
> Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted JobGraph from state handle under /147dd022ec91f7381ad4ca3d290387e9. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.
>     at org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:208)
>     at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:696)
>     at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobGraphs(Dispatcher.java:681)
>     at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobs(Dispatcher.java:662)
>     at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$26(Dispatcher.java:821)
>     at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:72)
>     ... 9 more
> Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1651346363-10.20.1.81-1525354906737:blk_1083182315_9441494 file=/flink/checkpoints/submittedJobGraph480ddf9572ed
>     at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1052)
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/