Is the store a plain filesystem or Hadoop? If it is NAS, then why does the exception say "Caused by: org.apache.hadoop.hdfs.BlockMissingException"? It seems you have configured a Hadoop state store while supplying a NAS mount.
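One quick way to see which filesystem a configured path would be read through is to look at its URI scheme. The sketch below is a hypothetical helper, not part of Flink or Hadoop, and it assumes that a path written without a scheme falls back to whatever default scheme the cluster is configured with. Under that assumption, a bare /flink/checkpoints could be served by an HDFS client even though a NAS directory is mounted at the same path.

```python
from urllib.parse import urlparse


def resolve_scheme(path: str, default_scheme: str) -> str:
    """Return the filesystem scheme a path would be resolved with.

    `default_scheme` is an assumption standing in for the cluster's
    configured default filesystem; it is used only when `path` carries
    no explicit scheme.
    """
    scheme = urlparse(path).scheme
    return scheme if scheme else default_scheme


# A bare path inherits the (assumed) default scheme.
bare = resolve_scheme("/flink/checkpoints", default_scheme="hdfs")

# An explicit file:// URI keeps its own scheme regardless of the default.
explicit = resolve_scheme("file:///flink/checkpoints", default_scheme="hdfs")

print(bare, explicit)
```

If the bare path resolves to hdfs here, writing the storage directory with an explicit file:// prefix is one thing to try so that the NAS mount is actually used.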
Regards
Bhaskar

On Thu, Nov 28, 2019 at 11:36 AM 曾祥才 <xcz200...@qq.com> wrote:

> /flink/checkpoints is an external persistent store (a NAS directory
> mounted on the job manager)
>
> ------------------ Original Message ------------------
> *From:* "Vijay Bhaskar"<bhaskar.eba...@gmail.com>;
> *Sent:* Thursday, November 28, 2019, 2:29 PM
> *To:* "曾祥才"<xcz200...@qq.com>;
> *Cc:* "user"<user@flink.apache.org>;
> *Subject:* Re: JobGraphs not cleaned up in HA mode
>
> The following are the mandatory conditions to run in HA:
>
> a) You should have a persistent common external store for the jobmanager
> and task managers to use while writing the state
> b) You should have a persistent external store for zookeeper to store the
> Jobgraph.
>
> Zookeeper is referring to the path
> /flink/checkpoints/submittedJobGraph480ddf9572ed to get the job graph, but
> the jobmanager is unable to find it.
> It seems /flink/checkpoints is not the external persistent store.
>
> Regards
> Bhaskar
>
> On Thu, Nov 28, 2019 at 10:43 AM seuzxc <xcz200...@qq.com> wrote:
>
>> Hi, I have the same problem with Flink 1.9.1. Is there any solution to
>> fix it? When k8s redeploys the jobmanager, the error looks like this (it
>> seems zk does not remove the submitted job info, but the jobmanager
>> removes the file):
>>
>> Caused by: org.apache.flink.util.FlinkException: Could not retrieve
>> submitted JobGraph from state handle under
>> /147dd022ec91f7381ad4ca3d290387e9. This indicates that the retrieved state
>> handle is broken. Try cleaning the state handle store.
>>         at org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:208)
>>         at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:696)
>>         at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobGraphs(Dispatcher.java:681)
>>         at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobs(Dispatcher.java:662)
>>         at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$26(Dispatcher.java:821)
>>         at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:72)
>>         ... 9 more
>> Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain
>> block: BP-1651346363-10.20.1.81-1525354906737:blk_1083182315_9441494
>> file=/flink/checkpoints/submittedJobGraph480ddf9572ed
>>         at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1052)
>>
>> --
>> Sent from:
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/