Hi,

Yarn won't resubmit the job. In case of a process failure where Yarn
restarts the Flink Master, the Master will recover the submitted jobs from
a persistent storage system.
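The persistent storage Till refers to is Flink's high-availability setup. As a
sketch (the ZooKeeper quorum host and storage path below are placeholders, not
taken from this thread), a ZooKeeper-based HA configuration in flink-conf.yaml
might look like:

```yaml
# flink-conf.yaml -- illustrative ZooKeeper-based HA setup (placeholder hosts/paths)
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-host:2181
high-availability.storageDir: hdfs:///flink/ha/
# On Yarn, also allow the ApplicationMaster to be restarted after a process failure:
yarn.application-attempts: 2
```

With this in place, a restarted Flink Master can recover submitted job graphs
and checkpoint metadata from the storageDir rather than requiring a resubmission.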
Cheers,
Till

On Thu, May 28, 2020 at 4:05 PM M Singh <mans2si...@yahoo.com> wrote:

> Hi Till/Zhu/Yang: Thanks for your replies.
>
> So just to clarify - the job id remains the same if the job restarts have
> not been exhausted. Does Yarn also resubmit the job in case of failures,
> and if so, is the job id different?
>
> Thanks
>
> On Wednesday, May 27, 2020, 10:05:40 AM EDT, Till Rohrmann
> <trohrm...@apache.org> wrote:
>
> Hi,
>
> If you submit the same job multiple times, it will get a different JobID
> each time. For Flink, different job submissions are considered to be
> different jobs. Once a job has been submitted, it will keep the same
> JobID, which is important in order to retrieve the checkpoints associated
> with this job.
>
> Cheers,
> Till
>
> On Tue, May 26, 2020 at 12:42 PM M Singh <mans2si...@yahoo.com> wrote:
>
> Hi Zhu Zhu:
>
> I have another clarification - it looks like if I run the same app
> multiple times, its job id changes. So even though the graph is the same,
> the job id does not depend on the job graph alone, since with different
> runs of the same app it is not the same.
>
> Please let me know if I've missed anything.
>
> Thanks
>
> On Monday, May 25, 2020, 05:32:39 PM EDT, M Singh <mans2si...@yahoo.com>
> wrote:
>
> Hi Zhu Zhu:
>
> Just to clarify - from what I understand, EMR also has a default number
> of restarts (I think it is 3). So if EMR restarts the job, the job id is
> the same since the job graph is the same.
>
> Thanks for the clarification.
>
> On Monday, May 25, 2020, 04:01:17 AM EDT, Yang Wang
> <danrtsey...@gmail.com> wrote:
>
> Just to share some additional information.
>
> When deploying a Flink application on Yarn, once it has exhausted its
> restart policy the whole application will fail. If you then start another
> instance (a new Yarn application), even with high availability
> configured, it cannot recover from the latest checkpoint because the
> clusterId (i.e. the applicationId) has changed.
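As Yang describes, a new Yarn application will not automatically pick up the
previous application's checkpoints. The usual workaround is to resume
explicitly from the last retained checkpoint when resubmitting. A sketch (the
checkpoint path, job id, and jar name below are placeholders):

```
# Resubmit as a new Yarn application, resuming from the last retained
# checkpoint of the failed job (all paths are placeholders):
flink run -m yarn-cluster \
  -s hdfs:///flink/checkpoints/<old-job-id>/chk-42 \
  my-streaming-job.jar
```

This relies on externalized (retained) checkpoints being enabled, so that the
checkpoint data survives the failed application and can be passed to `-s` the
same way a savepoint path would be.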
> Best,
> Yang
>
> On Mon, May 25, 2020 at 11:17 AM, Zhu Zhu <reed...@gmail.com> wrote:
>
> Hi M,
>
> Regarding your questions:
> 1. Yes. The id is fixed once the job graph is generated.
> 2. Yes.
>
> Regarding yarn mode:
> 1. The job id stays the same because the job graph is generated once at
> the client side and persisted in DFS for reuse.
> 2. Yes, if high availability is enabled.
>
> Thanks,
> Zhu Zhu
>
> On Sat, May 23, 2020 at 4:06 AM, M Singh <mans2si...@yahoo.com> wrote:
>
> Hi Flink Folks:
>
> If I have a Flink application with 10 restarts, and it fails and
> restarts, then:
>
> 1. Does the job have the same id?
> 2. Does the automatically restarting application pick up from the last
> checkpoint? I am assuming it does but just want to confirm.
>
> Also, if it is running on AWS EMR, I believe EMR/Yarn is configured to
> restart the job 3 times (after it has exhausted its restart policy). If
> that is the case:
>
> 1. Does the job get a new id? I believe it does, but just want to confirm.
> 2. Does the Yarn restart honor the last checkpoint? I believe it does
> not, but is there a way to make it restart from the last checkpoint of
> the failed job (after it has exhausted its restart policy)?
>
> Thanks
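The "10 restarts" in the original question corresponds to Flink's restart
strategy, which can be set in flink-conf.yaml (or per job via the execution
environment). A sketch of the fixed-delay variant, with illustrative values
rather than anything specified in this thread:

```yaml
# flink-conf.yaml -- illustrative values
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 10
restart-strategy.fixed-delay.delay: 10 s
```

These restarts happen inside the same Flink cluster, so the job keeps its
JobID and resumes from the latest checkpoint; only once the attempts are
exhausted does the job (and on Yarn, the whole application) fail.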