Hi Miki,

it looks as if you did not submit a job to the cluster whose logs you shared;
at least I could not find a job submission call in them.

Cheers,
Till
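
[Editor's note: a job is typically submitted to a standalone cluster with
`bin/flink run <jar>`, and the submission then shows up in the JobManager log.
The snippet below is a self-contained sketch of what to grep for; the sample
log line is fabricated (the job name "WordCount" is hypothetical, and the
exact wording can differ between Flink versions).]

```shell
# Write a fabricated sample JobManager log line, then grep for the
# "Submitting job" marker that a real submission would produce.
cat > /tmp/sample_jm.log <<'EOF'
2018-06-04 12:00:01,000 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher  - Submitting job 5c545fc3f43d69325fb9966b8dd4c8f3 (WordCount).
EOF
grep -c "Submitting job" /tmp/sample_jm.log
```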

On Mon, Jun 4, 2018 at 12:31 PM miki haiat <miko5...@gmail.com> wrote:

> Hi Till,
> I've managed to reproduce it.
> Full log: faild_jm.log
> <https://gist.githubusercontent.com/miko-code/e634164404354c4c590be84292fd8cb2/raw/baeee310cd50cfa79303b328e3334d960c8e98e6/faild_jm.log>
>
>
>
>
> On Mon, Jun 4, 2018 at 10:33 AM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
>> Hmmm, Flink should not delete the stored blobs on the HA storage. Could
>> you try to reproduce the problem and then send us the logs on DEBUG level?
>> Please also check before shutting the cluster down, that the files were
>> there.
>>
>> Cheers,
>> Till
>>
>> On Sun, Jun 3, 2018 at 1:10 PM miki haiat <miko5...@gmail.com> wrote:
>>
>>> Hi Till,
>>>
>>>    1. The files no longer exist in HDFS.
>>>    2. Yes, I stopped and started the cluster with the bin commands.
>>>    3. Unfortunately, I deleted the log... :(
>>>
>>>
>>> I wondered if this code, i.e. the way I am using checkpoints, could cause
>>> the issue:
>>>
>>> StateBackend sb = new FsStateBackend("hdfs://***/flink/my_city/checkpoints");
>>> env.setStateBackend(sb);
>>> env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
>>> env.getCheckpointConfig().setCheckpointInterval(60000);
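
[Editor's note: for context, the checkpoint directory configured above is
separate from the HA storage directory where the blobs from the error below
are kept; the latter comes from flink-conf.yaml. The fragment below is a
hypothetical sketch: the storageDir mirrors the path in the error message,
while the ZooKeeper quorum is a placeholder.]

```yaml
# Hypothetical flink-conf.yaml fragment for a standalone HA setup.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
high-availability.storageDir: hdfs:///flink1.5/ha/
```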
>>>
>>> On Fri, Jun 1, 2018 at 6:19 PM Till Rohrmann <trohrm...@apache.org>
>>> wrote:
>>>
>>>> Hi Miki,
>>>>
>>>> could you check whether the files are really no longer stored on HDFS?
>>>> How did you terminate the cluster? Simply calling `bin/stop-cluster.sh`? I
>>>> just tried it locally and it could recover the job after calling
>>>> `bin/start-cluster.sh` again.
>>>>
>>>> What would be helpful are the logs from the initial run of the job, so
>>>> if you can reproduce the problem, please include that log.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Thu, May 31, 2018 at 6:14 PM, miki haiat <miko5...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm having a weird issue with JM recovery.
>>>>> I'm using HDFS and ZooKeeper for an HA standalone cluster.
>>>>>
>>>>> I stopped the cluster and changed some parameters in the Flink conf (memory).
>>>>> But now when I start the cluster again I get an error that prevents the
>>>>> JM from starting: somehow the checkpoint file doesn't exist in Hadoop,
>>>>> and the JM won't start.
>>>>>
>>>>> Full log: JM log file
>>>>> <https://gist.github.com/miko-code/28d57b32cb9c4f1aa96fa9873e10e53c>
>>>>>
>>>>>
>>>>>> 2018-05-31 11:57:05,568 ERROR
>>>>>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error
>>>>>> occurred in the cluster entrypoint.
>>>>>
>>>>> Caused by: java.lang.Exception: Cannot set up the user code libraries:
>>>>> File does not exist:
>>>>> /flink1.5/ha/default/blob/job_5c545fc3f43d69325fb9966b8dd4c8f3/blob_p-5d9f3be555d3b05f90b5e148235d25730eb65b3d-ae486e221962f7b96e36da18fe1c57ca
>>>>> at
>>>>> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:72)
