Re: Recovery from job manager crash using check points

Zili Chen Mon, 19 Aug 2019 19:17:26 -0700

Hi Min,

I guess you are using standalone high-availability, so when a TM fails,
the JM can recover the job from an in-memory checkpoint store.


However, when the JM fails, since you don't persist state to an HA backend
such as ZooKeeper, even if the JM is relaunched by the YARN RM or superseded
by a standby, the new one knows nothing about the previous jobs.

In short, you need to set up ZooKeeper, as you yourself mentioned.
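
For illustration, a minimal sketch of the relevant flink-conf.yaml entries,
assuming a Flink 1.9-era setup on YARN. The ZooKeeper hosts and HDFS paths
below are placeholders, not values from this thread; adjust them to your
environment.

```yaml
# Store JobManager metadata (job graphs, checkpoint pointers) via ZooKeeper
high-availability: zookeeper
# Placeholder quorum; replace with your own ZooKeeper hosts
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
# Placeholder durable directory for the HA metadata itself
high-availability.storageDir: hdfs:///flink/ha/

# Keep checkpoint data on durable storage rather than in JM memory
state.backend: filesystem
# Placeholder checkpoint directory
state.checkpoints.dir: hdfs:///flink/checkpoints/

# On YARN, allow the application master (JM) to be restarted
yarn.application-attempts: 10
```

With something like this in place, a restarted or standby JM should be able
to look up the latest completed checkpoint through ZooKeeper instead of
starting with no knowledge of the job.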

Best,
tison.


Biao Liu <mmyy1...@gmail.com> wrote on Mon, 19 Aug 2019 at 23:49:

> Hi Min,
>
> > Do I need to set up zookeepers to keep the states when a job manager
> crashes?
>
> I guess you need to set up HA [1] properly. Besides that, I would
> suggest you also check the state backend [2].
>
> 1.
> https://ci.apache.org/projects/flink/flink-docs-master/ops/jobmanager_high_availability.html
> 2.
> https://ci.apache.org/projects/flink/flink-docs-master/ops/state/state_backends.html
>
> Thanks,
> Biao /'bɪ.aʊ/
>
>
>
> On Mon, 19 Aug 2019 at 23:28, <min....@ubs.com> wrote:
>
>> Hi,
>>
>>
>>
>> I can use checkpoints to recover Flink state when a task manager crashes.
>>
>>
>>
>> I cannot use checkpoints to recover Flink state when a job manager
>> crashes.
>>
>>
>>
>> Do I need to set up zookeepers to keep the states when a job manager
>> crashes?
>>
>>
>>
>> Regards
>>
>>
>>
>> Min
>>
>>
>>
>
