Thank you. So, in other words, to have TM HA on k8s I have to configure [1], correct?
[1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/#kubernetes-ha-services

On Sun, 17 Sep 2023 at 07:27, Chen Zhanghao <zhanghao.c...@outlook.com> wrote:

> Hi Krzysztof,
>
> TM HA is handled by the Flink cluster itself and is beyond the K8s
> operator's responsibility. Flink will try to recover a failed task as long
> as the restart limit is not reached; otherwise the job will transition into
> the terminal FAILED status. You may check the job restart strategy [1] for
> more details.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/ops/state/task_failure_recovery/#restart-strategies
>
> Best,
> Zhanghao Chen
> ------------------------------
> *From:* Krzysztof Chmielewski <krzysiek.chmielew...@gmail.com>
> *Sent:* 17 September 2023 7:58
> *To:* user <user@flink.apache.org>
> *Subject:* HA in k8s operator
>
> Hi community,
> I would like to test the Flink k8s operator's HA capabilities for TM and JM
> failover.
>
> The simple test I did for TM failover was as follows:
> - run a Flink session cluster in native mode
> - submit a FlinkSessionJob resource with SAVEPOINT upgrade mode
> - kill the task manager pod
>
> It turned out that after I killed the TM, the k8s operator did not create a
> new TM to replace the killed one. The job was canceled and landed
> in Job Status -> Failed.
>
> I had the impression that no extra configuration is needed for TM HA.
> I have found [1] and [2], but I'm not sure whether they apply to JM failover
> only or to both TM and JM. It is also not clear to me whether I still need
> to configure [1] when using the Flink k8s operator.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/#kubernetes-ha-services
> [2]
> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/configuration/#leader-election-and-high-availability
>
> Regards,
> Krzysztof Chmielewski
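
For reference, the restart strategy mentioned in the quoted reply is set through the Flink configuration. Below is a minimal sketch, assuming a fixed-delay strategy and the key names documented for Flink 1.17. The attempt count and delay are illustrative values, not settings taken from this thread.

```yaml
# Sketch only: the values below are assumptions chosen for illustration.
# Restart a failed job up to 3 times, waiting 10 seconds between attempts.
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
```

With the Flink k8s operator, entries like these would typically go under `spec.flinkConfiguration` of the FlinkDeployment resource that defines the session cluster.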