Hi Yang Wang,

Thanks for your reply. I may have set up an HA cluster successfully. The reason I couldn't set it up before may be a bug related to S3 in Flink; after changing to HDFS, I can run it successfully. But after about one day of running, the JobManager crashes and can't recover automatically. I have to apply the JobManager deployment manually, and that fixes the problem: my jobs then start again on their own. So strange. Since I changed a lot compared with the YAML in Flink's docs, I really don't know where my configuration is wrong. But I have added Logback to Flink and let it send its logs to my Elasticsearch cluster; maybe the logs will tell more.
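In case it helps spot the misconfiguration, the HA-related part of my flink-conf.yaml looks roughly like the sketch below (the ZooKeeper quorum addresses, cluster id, and HDFS path here are placeholders, not my real values):

    # flink-conf.yaml, HA section (addresses and paths are placeholders)
    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-0.zk-headless:2181,zk-1.zk-headless:2181,zk-2.zk-headless:2181
    high-availability.cluster-id: /my-flink-cluster
    high-availability.storageDir: hdfs:///flink/ha/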
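And this is roughly what I run to bring the JobManager back (the file name is the one from the Flink k8s docs, mine differs slightly; I delete and re-create because a plain apply of an unchanged spec is a no-op):

    kubectl delete -f jobmanager-deployment.yaml
    kubectl apply -f jobmanager-deployment.yaml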
------------------ Original ------------------
From: "Yang Wang" <danrtsey...@gmail.com>
Date: Tue, Nov 19, 2019, 12:05 PM
To: "vino yang" <yanghua1...@gmail.com>
Cc: "Rock" <downsidem...@qq.com>; "user@flink.apache.org" <user@flink.apache.org>
Subject: Re: how to setup a ha flink cluster on k8s?

Hi Rock,

If you want to start an HA Flink cluster on k8s, the simplest way is to use ZK + HDFS/S3, just as with the HA configuration on YARN. The zookeeper-operator can help start a ZK cluster. [1]

Please share more information about why it could not work. If you are using a Kubernetes per-job cluster, the job can be recovered when the JM pod crashes and is restarted. [2] A savepoint can also be used to get better recovery.

[1] https://github.com/pravega/zookeeper-operator
[2] https://github.com/apache/flink/blob/release-1.9/flink-container/kubernetes/README.md#deploy-flink-job-cluster

vino yang <yanghua1...@gmail.com> wrote on Sat, Nov 16, 2019 at 5:00 PM:

Hi Rock,

I searched on Google and found a blog post [1] that talks about how to configure JobManager HA for Flink on k8s. I don't know whether it is suitable for you or not; please feel free to refer to it.

Best,
Vino

[1] http://shzhangji.com/blog/2019/08/24/deploy-flink-job-cluster-on-kubernetes/

Rock <downsidem...@qq.com> wrote on Sat, Nov 16, 2019 at 11:02 AM:

I'm trying to set up a Flink cluster on k8s for production use. But the setup here https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/deployment/kubernetes.html is not HA: when the JobManager goes down and is rescheduled, the metadata for the running jobs is lost.

I tried to use the HA setup with ZK https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/jobmanager_high_availability.html on k8s, but couldn't get it right.

Storing a job's metadata on k8s using a PVC or another external file system should be very easy. Is there a way to achieve it?