Hi community,
I would like to test the Flink Kubernetes operator's HA capabilities for TM
and JM failover.

The simple test I did for TM failover was as follows:
- run a Flink session cluster in native mode
- submit a FlinkSessionJob resource with the SAVEPOINT upgrade mode
- kill the task manager pod
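
For illustration, the FlinkSessionJob from the second step can be sketched
roughly as below. This is only a minimal sketch, not my exact manifest: the
resource name, deployment name, jar URI and parallelism are hypothetical
placeholders.

```yaml
# Illustrative sketch only; all names and the jar URI are placeholders.
apiVersion: flink.apache.org/v1beta1
kind: FlinkSessionJob
metadata:
  name: example-session-job              # hypothetical name
spec:
  # Must match the name of the FlinkDeployment running the session cluster.
  deploymentName: example-session-cluster  # hypothetical name
  job:
    jarURI: https://example.com/path/to/job.jar  # placeholder URI
    parallelism: 2                       # assumed value
    upgradeMode: savepoint               # the SAVEPOINT upgrade mode from the test
    state: running
```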

It turns out that after I killed the TM, the k8s operator did not create a
new TM to replace the killed one. The job was canceled and ended up in
Job Status -> Failed.

I was under the impression that no extra configuration is needed for TM HA.
I have found [1] and [2], but I'm not sure whether they apply to JM failover
only or to both TM and JM. It is also not clear to me whether I still need
to configure [1] when using the Flink k8s operator.

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/#kubernetes-ha-services
[2]
https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/configuration/#leader-election-and-high-availability

Regards,
Krzysztof Chmielewski
