Hi Min,

I guess you are using standalone high-availability mode, so when a TaskManager (TM) fails, the JobManager (JM) can recover the job from its in-memory checkpoint store.
However, when the JM itself fails, that in-memory state is lost because you don't persist it in an HA backend such as ZooKeeper. Even if the JM is relaunched by the YARN ResourceManager or superseded by a standby, the new JM knows nothing about the previous jobs. In short, you need to set up ZooKeeper, as you yourself mentioned.

Best,
tison.

Biao Liu <mmyy1...@gmail.com> wrote on Monday, 19 August 2019 at 11:49 PM:
> Hi Min,
>
> > Do I need to set up zookeepers to keep the state when a job manager
> > crashes?
>
> I guess you need to set up HA [1] properly. Besides that, I would
> suggest you also check the state backend [2].
>
> 1. https://ci.apache.org/projects/flink/flink-docs-master/ops/jobmanager_high_availability.html
> 2. https://ci.apache.org/projects/flink/flink-docs-master/ops/state/state_backends.html
>
> Thanks,
> Biao /'bɪ.aʊ/
>
> On Mon, 19 Aug 2019 at 23:28, <min....@ubs.com> wrote:
>
>> Hi,
>>
>> I can use checkpoints to recover Flink state when a task manager crashes.
>>
>> I cannot use checkpoints to recover Flink state when a job manager
>> crashes.
>>
>> Do I need to set up zookeepers to keep the state when a job manager
>> crashes?
>>
>> Regards,
>>
>> Min
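For reference, ZooKeeper-based JobManager HA is switched on through a handful of `flink-conf.yaml` keys: `high-availability`, `high-availability.zookeeper.quorum`, `high-availability.storageDir` (where the JM metadata is actually persisted; ZooKeeper only keeps a pointer to it), and, on YARN, `yarn.application-attempts` so the JM can be relaunched. The sketch below just renders those entries; the helper name `zookeeper_ha_conf`, the host names and the HDFS path are hypothetical placeholders, so please check the HA page [1] for your Flink version before relying on it.

```python
def zookeeper_ha_conf(quorum, storage_dir, cluster_id=None, yarn_attempts=None):
    """Render flink-conf.yaml lines for ZooKeeper-based JobManager HA.

    Hypothetical helper for illustration; the keys are Flink's HA options.
    """
    if not quorum:
        raise ValueError("at least one ZooKeeper host:port is required")
    entries = {
        "high-availability": "zookeeper",
        # Comma-separated host:port list of the ZooKeeper ensemble.
        "high-availability.zookeeper.quorum": ",".join(quorum),
        # Durable storage (e.g. HDFS) for JobGraphs and checkpoint metadata.
        "high-availability.storageDir": storage_dir,
    }
    if cluster_id is not None:
        # Assumption: only needed for standalone clusters; on YARN Flink
        # derives it from the application id, so leave it unset there.
        entries["high-availability.cluster-id"] = cluster_id
    if yarn_attempts is not None:
        # How many times YARN may restart the JobManager's ApplicationMaster.
        entries["yarn.application-attempts"] = str(yarn_attempts)
    return "\n".join(f"{key}: {value}" for key, value in entries.items())


if __name__ == "__main__":
    # Placeholder hosts and path; replace with your own ensemble and storage.
    print(zookeeper_ha_conf(["zk1:2181", "zk2:2181", "zk3:2181"],
                            "hdfs:///flink/ha/", yarn_attempts=10))
```

With these settings, a relaunched or standby JM can look up the submitted jobs and their latest completed checkpoints instead of starting from an empty in-memory store.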