Hi all,

Sorry about this. We found out that it's a problem in our deployment: the directories referenced in ZooKeeper and the ones on HDFS are not the same.
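In case it helps someone else, here is a minimal sketch of the HA settings in flink-conf.yaml that have to agree with each other and with what is actually on HDFS. Only the storage directory matches the path from the stack trace below; the quorum hosts and the other values are illustrative placeholders, not our exact configuration:

    # ZooKeeper-based high availability (illustrative values)
    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-0:2181,zk-1:2181,zk-2:2181   # placeholder hosts
    high-availability.zookeeper.path.root: /flink
    # cluster-id plus storageDir determine where job blobs land on HDFS;
    # this must match the path the JM later tries to read during recovery
    high-availability.cluster-id: default
    high-availability.storageDir: hdfs:///projects/dev/flink-recovery

If ZooKeeper still holds pointers to blobs under an old storageDir, the JM will look for the user jar in a directory that no longer matches what is on HDFS.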
Thanks for the help
Farouk

On Thu, Aug 1, 2019 at 11:38, Zhu Zhu <reed...@gmail.com> wrote:

> Hi Farouk,
>
> This issue is not related to checkpoints. The JM launch fails because
> the job's user jar blob is missing on HDFS.
> Does this issue always happen? If it only occurs rarely, the file might
> have been unexpectedly deleted by someone else.
>
> Thanks,
> Zhu Zhu
>
> On Thu, Aug 1, 2019 at 5:22 PM, Farouk <farouk.za...@gmail.com> wrote:
>
>> Hi
>>
>> We have Flink running on Kubernetes with HDFS. The JM crashed for some
>> reason.
>>
>> Has anybody already encountered an error like the one in the attached
>> logfile?
>>
>> Caused by: java.lang.Exception: Cannot set up the user code libraries:
>> File does not exist:
>> /projects/dev/flink-recovery/default/blob/job_c9642e4287d7075b53922fba162665d0/blob_p-0debc7cbf567a71ea6c8fc3efb5855aa6617fdea-0e5d3178b26aef6112fed55559d41634
>> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
>> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
>>
>> Thanks
>> Farouk
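For anyone who hits the same "File does not exist" error: a quick first check is whether the blob directory from the stack trace actually exists on HDFS, for example (job ID taken from the trace quoted above):

    hdfs dfs -ls /projects/dev/flink-recovery/default/blob/job_c9642e4287d7075b53922fba162665d0

If that directory is missing or empty while ZooKeeper still references it, JM recovery fails exactly as above.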