Thank you.
So, in other words, to have TM HA on k8s I have to configure [1], correct?

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/#kubernetes-ha-services

On Sun, 17 Sep 2023 at 07:27, Chen Zhanghao <zhanghao.c...@outlook.com>
wrote:

> Hi Krzysztof,
>
> TM HA is handled by the Flink cluster itself and is beyond the K8s operator's
> responsibility. Flink will try to recover a failed task as long as the
> restart limit has not been reached; otherwise the job transitions into the
> terminal FAILED status. You may check the job restart strategies [1] for more
> details.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/ops/state/task_failure_recovery/#restart-strategies
>
> Best,
> Zhanghao Chen
> ------------------------------
> *From:* Krzysztof Chmielewski <krzysiek.chmielew...@gmail.com>
> *Sent:* 17 September 2023, 7:58
> *To:* user <user@flink.apache.org>
> *Subject:* HA in k8s operator
>
> Hi community,
> I would like to test the Flink k8s operator's HA capabilities for TM and JM
> failover.
>
> The simple test I did for TM failover was as follows:
> - run a Flink session cluster in native mode
> - submit a FlinkSessionJob resource with SAVEPOINT upgrade mode
> - kill the task manager pod
>
> It turned out that after I killed the TM, the k8s operator did not create a
> new TM to replace the killed one. The job was canceled and ended up
> in Job Status -> Failed.
>
> I was under the impression that no extra configuration is needed for TM HA.
> I have found [1] and [2], but I'm not sure whether they apply to JM failover
> only or to both TM and JM. It is also not clear to me whether I still need to
> configure [1] when using the Flink k8s operator.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/#kubernetes-ha-services
> [2]
> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/configuration/#leader-election-and-high-availability
>
> Regards,
> Krzysztof Chmielewski
>
>
>
