Hi Vino,
What are the definitions of job cancellation and job failure, and how do
they differ?
Can I say that if the program is shut down manually, it is a job
cancellation, and if the program shuts down because of an error, it is a
job failure?
This is important because it is the prerequisite for the following
question:
The Flink 1.6 documentation says:
"ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION: Retain the
checkpoint when the job is cancelled. Note that you have to manually clean up
the checkpoint state after cancellation in this case.
ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION: Delete the
checkpoint when the job is cancelled. The checkpoint state will only be
available if the job fails."
But it does not say whether the checkpoint is retained when the job fails.
If a failure is handled the same way as a cancellation, then I have to use
RETAIN_ON_CANCELLATION, because otherwise the checkpoint will be deleted
when the job fails.
If the checkpoint is not deleted on failure, then I am safe on job failure
either way.
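
To make the two readings concrete, here is a minimal Java sketch of one
interpretation of the quoted documentation. It is not Flink code: the class
CheckpointRetentionSketch, the Termination enum, and the isRetained method
are hypothetical names made up for illustration. Only the two option names
come from the documentation. The FAILED branch assumes that "The checkpoint
state will only be available if the job fails" means the checkpoint is kept
on failure under both options, which is exactly the point in question.

```java
/** Hypothetical model of the quoted retention rules; not Flink's API. */
final class CheckpointRetentionSketch {

    /** How the job ended: manually cancelled, or failed due to an error. */
    enum Termination { CANCELLED, FAILED }

    /** Mirrors the two option names quoted from the Flink 1.6 documentation. */
    enum Cleanup { RETAIN_ON_CANCELLATION, DELETE_ON_CANCELLATION }

    /**
     * Returns true if the externalized checkpoint would still exist after the
     * job ends, under the assumed reading of the documentation.
     */
    static boolean isRetained(Cleanup cleanup, Termination termination) {
        if (termination == Termination.FAILED) {
            // Assumption: a failure never deletes the checkpoint.
            return true;
        }
        // On cancellation, only RETAIN_ON_CANCELLATION keeps the checkpoint.
        return cleanup == Cleanup.RETAIN_ON_CANCELLATION;
    }

    public static void main(String[] args) {
        // Print the outcome for every combination of option and termination.
        for (Cleanup cleanup : Cleanup.values()) {
            for (Termination termination : Termination.values()) {
                String outcome = isRetained(cleanup, termination) ? "retained" : "deleted";
                System.out.println(cleanup + " + " + termination + " -> " + outcome);
            }
        }
    }
}
```

If this reading is correct, the only combination that loses the checkpoint
is DELETE_ON_CANCELLATION with a manual cancel.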
Best
Henry
> On Sep 25, 2018, at 11:16 AM, vino yang <[email protected]> wrote:
>
> Hi Henry,
>
> Answers to your questions:
>
> What are the definitions of job cancellation and job failure, and how do
> they differ?
>
> > Both cancellation and failure put the job into a terminal state. The
> > difference is that cancellation is triggered manually and the job
> > terminates normally, while failure is usually a passive termination
> > caused by an exception.
>
> If I use the DELETE_ON_CANCELLATION option, will I still have a checkpoint
> to resume the program from in this case?
>
> > No. If you use externalized checkpoints with this option, you cannot
> > resume from them after the job has been cancelled.
>
> I mean, suppose I can guarantee that a savepoint is always taken before a
> manual cancellation. If I use the DELETE_ON_CANCELLATION option for
> checkpoints, is there any chance that I will have no checkpoint to recover
> from?
>
> > According to the latest source code, savepoints are not affected by
> > CheckpointRetentionPolicy; they need to be cleaned up manually.
>
> Thanks, vino.
>
> 徐涛 <[email protected]> wrote on Tue, Sep 25, 2018 at 11:06 AM:
> Hi All,
> I mean, suppose I can guarantee that a savepoint is always taken before a
> manual cancellation. If I use the DELETE_ON_CANCELLATION option for
> checkpoints, is there any chance that I will have no checkpoint to recover
> from?
> Thanks a lot.
>
> Best
> Henry
>
>
>
>> On Sep 25, 2018, at 10:41 AM, 徐涛 <[email protected]> wrote:
>>
>> Hi All,
>> The Flink documentation says:
>> DELETE_ON_CANCELLATION: “Delete the checkpoint when the job is
>> cancelled. The checkpoint state will only be available if the job fails.”
>> What are the definitions of job cancellation and job failure, and how do
>> they differ?
>> Suppose I run the program on YARN and, after a few days, the YARN
>> application fails for some reason.
>> If I use the DELETE_ON_CANCELLATION option, will I still have a checkpoint
>> to resume the program from in this case?
>>
>> If checkpoints are deleted only when I cancel the program, I can always
>> take a savepoint before cancelling. Then it seems I can simply always set
>> DELETE_ON_CANCELLATION.
>> I cannot find a case where RETAIN_ON_CANCELLATION should be used.
>>
>>
>> Best
>> Henry
>>
>