Hi Krzysztof,
TM HA is handled by the Flink cluster itself and is beyond the K8s operator's
responsibility. Flink will try to recover a failed Task as long as the restart
limit is not reached; otherwise the job will transition into the terminal FAILED
status. You may check the job restart strategy [1]
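For illustration, a fixed-delay restart strategy can be configured in flink-conf.yaml along these lines (a minimal sketch; the attempt count and delay are placeholder values, not a recommendation):

    # Restart a failed job up to 3 times, waiting 10 s between attempts.
    # Once the limit is exhausted, the job transitions to the terminal FAILED status.
    restart-strategy: fixed-delay
    restart-strategy.fixed-delay.attempts: 3
    restart-strategy.fixed-delay.delay: 10 s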
Hi community,
I would like to test the Flink K8s operator's HA capabilities for TM and JM
failover.
The simple test I did for TM failover was as follows:
- run a Flink session cluster in native mode
- submit a FlinkSessionJob resource with SAVEPOINT upgrade mode (a sketch of the manifest is below)
- kill a task manager pod
It turns out that
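For context, a FlinkSessionJob with SAVEPOINT upgrade mode looks roughly like the sketch below (the names and jarURI are placeholders, not the exact resource from my test):

    apiVersion: flink.apache.org/v1beta1
    kind: FlinkSessionJob
    metadata:
      name: example-session-job
    spec:
      # name of the FlinkDeployment running the session cluster
      deploymentName: example-session-cluster
      job:
        jarURI: https://example.com/path/to/job.jar
        parallelism: 2
        upgradeMode: savepoint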
Hi Karthick,
We’ve experienced a similar issue before. What we did at that time was to define
multiple topics, each with a different number of partitions, so that the topics
with more partitions could be consumed with higher parallelism.
And you can further divide the topic
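A rough sketch of how that can be wired up on the Flink side, assuming the KafkaSource connector; the broker address, topic names, group id, and parallelism values are invented for illustration:

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class MultiTopicExample {

        // Build a simple string KafkaSource for one topic.
        private static KafkaSource<String> source(String topic) {
            return KafkaSource.<String>builder()
                    .setBootstrapServers("kafka:9092")
                    .setTopics(topic)
                    .setGroupId("example-group")
                    .setStartingOffsets(OffsetsInitializer.earliest())
                    .setValueOnlyDeserializer(new SimpleStringSchema())
                    .build();
        }

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // "Hot" topic created with 32 partitions: read it with 32 source subtasks.
            env.fromSource(source("events-hot"), WatermarkStrategy.noWatermarks(), "events-hot")
                    .setParallelism(32)
                    .print();

            // "Cold" topic created with 4 partitions: 4 source subtasks are enough.
            env.fromSource(source("events-cold"), WatermarkStrategy.noWatermarks(), "events-cold")
                    .setParallelism(4)
                    .print();

            env.execute("multi-topic-parallelism-example");
        }
    }

The idea is simply that the source reading the larger topic gets a higher operator parallelism, so its partitions are spread over more subtasks.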
Can you provide some more context on what your Flink job will be doing?
There might be some things you can do to fix the data skew on the Flink
side, but first, you want to start with Kafka.
For starters, you need to better size and estimate the required number of
partitions you will need on the Kafka side.
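As a purely illustrative sizing exercise (the numbers are made up, and max(throughput/producer, throughput/consumer) is just a common rule of thumb): if the topic has to sustain about 100 MB/s, a producer can write roughly 10 MB/s to one partition, and a consumer can read roughly 20 MB/s from one partition, then you need at least max(100/10, 100/20) = max(10, 5) = 10 partitions, plus some headroom for future growth.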
Hi Gowtham, I agree with you.
I'm eager to resolve the issue or gain a better understanding. Your
assistance would be greatly appreciated.
If there are any additional details or context needed to address my query
effectively, please let me know, and I'll be happy to provide them.
Thank you in advance.