Hi all,

Sorry about this. We found out that it's a problem in our deployment: the directories referenced in ZooKeeper and the ones on HDFS are not the same.
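In case it helps someone else, here is a minimal sketch of the HA settings in flink-conf.yaml that have to agree with each other and with what is actually on HDFS. Only the storage directory matches the path from the stack trace below; the quorum hosts and the other values are illustrative placeholders, not our exact configuration:

    # ZooKeeper-based high availability (illustrative values)
    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-0:2181,zk-1:2181,zk-2:2181   # placeholder hosts
    high-availability.zookeeper.path.root: /flink
    # cluster-id plus storageDir determine where job blobs land on HDFS;
    # this must match the path the JM later tries to read during recovery
    high-availability.cluster-id: default
    high-availability.storageDir: hdfs:///projects/dev/flink-recovery

If ZooKeeper still holds pointers to blobs under an old storageDir, the JM will look for the user jar in a directory that no longer matches what is on HDFS.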
Thanks for the help
Farouk

On Thu, Aug 1, 2019 at 11:38, Zhu Zhu <reed...@gmail.com> wrote:

> Hi Farouk,
>
> This issue is not related to checkpoints. The JM launch fails because
> the job's user jar blob is missing on HDFS.
> Does this issue always happen? If it only occurs rarely, the file might
> have been unexpectedly deleted by someone else.
>
> Thanks,
> Zhu Zhu
>
> On Thu, Aug 1, 2019 at 5:22 PM, Farouk <farouk.za...@gmail.com> wrote:
>
>> Hi
>>
>> We have Flink running on Kubernetes with HDFS. The JM crashed for some
>> reason.
>>
>> Has anybody already encountered an error like the one in the attached
>> logfile?
>>
>> Caused by: java.lang.Exception: Cannot set up the user code libraries:
>> File does not exist:
>> /projects/dev/flink-recovery/default/blob/job_c9642e4287d7075b53922fba162665d0/blob_p-0debc7cbf567a71ea6c8fc3efb5855aa6617fdea-0e5d3178b26aef6112fed55559d41634
>> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
>> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2025)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1996)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
>>
>> Thanks
>> Farouk
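For anyone who hits the same "File does not exist" error: a quick first check is whether the blob directory from the stack trace actually exists on HDFS, for example (job ID taken from the trace quoted above):

    hdfs dfs -ls /projects/dev/flink-recovery/default/blob/job_c9642e4287d7075b53922fba162665d0

If that directory is missing or empty while ZooKeeper still references it, JM recovery fails exactly as above.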