As tison said, you can use a Deployment to restart the JobManager pod. However, if you want all jobs to be able to recover from their checkpoints, you also need ZooKeeper and HDFS/S3 to store the high-availability data.
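For reference, a minimal sketch of the ZooKeeper-based HA settings in `flink-conf.yaml`; the quorum hosts and storage path below are placeholders you would replace with your own:

```yaml
# Enable ZooKeeper-based high availability for the JobManager.
high-availability: zookeeper

# ZooKeeper quorum used for leader election (placeholder hosts).
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181

# Durable storage (HDFS or S3) for JobManager metadata, so jobs can
# recover from checkpoints after a JobManager restart (placeholder path).
high-availability.storageDir: hdfs:///flink/ha/
```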
Also, some Kubernetes-native HA support is planned [1]. After that, you will not depend on ZooKeeper.

[1] https://issues.apache.org/jira/browse/FLINK-12884

tison <wander4...@gmail.com> wrote on Mon, Feb 10, 2020 at 8:59 AM:

> Hi Krzysztof,
>
> Flink doesn't provide JM HA itself yet.
>
> For YARN deployment, you can rely on the yarn.application-attempts
> configuration [1];
> for Kubernetes deployment, Flink uses a Kubernetes Deployment to restart a
> failed JM.
>
> Though, such standalone mode doesn't tolerate JM failure, and the
> strategies above just restart the application, which means all tasks will
> be killed and restarted.
>
> Best,
> tison.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/jobmanager_high_availability.html#configuration-1
>
>
> KristoffSC <krzysiek.chmielew...@gmail.com> wrote on Fri, Feb 7, 2020 at 11:34 PM:
>
>> Hi,
>> In [1] we can find the setup for standalone and YARN clusters to achieve
>> Job Manager HA.
>>
>> Is Standalone Cluster High Availability with ZooKeeper the same approach
>> for Docker's Job Cluster approach with Kubernetes?
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/jobmanager_high_availability.html
>>
>> Thanks,
>> Krzysztof
>>
>>
>>
>> --
>> Sent from:
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
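For the YARN path tison mentions, the restart behavior is controlled by `yarn.application-attempts` in `flink-conf.yaml`; the value below is only an illustration:

```yaml
# Allow YARN to restart the Flink ApplicationMaster (which runs the
# JobManager) up to 10 times before failing the application
# (illustrative value; tune it to your environment).
yarn.application-attempts: 10
```

Note that this only restarts the whole application; without the HA storage described above, job state is not recovered across those restarts.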