Hi,

Recently, we changed our deployment to a Kubernetes Standalone Application Cluster for reactive mode. Following [0], we use a Kubernetes Job with `--fromSavepoint` to upgrade our application without losing state. The Job config is identical to the one in the document.
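For context, this is roughly the relevant part of our Job spec, following the example in [0]; the job class name and savepoint path below are illustrative placeholders, not our real values:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: flink-jobmanager
    spec:
      template:
        spec:
          restartPolicy: OnFailure   # Kubernetes restarts the JobManager pod on failure
          containers:
            - name: jobmanager
              image: flink:1.13
              args:
                - standalone-job
                - --job-classname
                - com.example.MyJob                        # illustrative class name
                - --fromSavepoint
                - s3://my-bucket/savepoints/savepoint-xxxx # illustrative savepoint path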
However, we found that in this setup, if the JobManager fails, Kubernetes restarts it from the original savepoint specified in `--fromSavepoint` instead of from the latest checkpoint. This causes problems for long-running jobs. Any idea how to make Flink restore from the latest checkpoint after a JobManager failure in a Kubernetes Standalone Application Cluster?

[0] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/resource-providers/standalone/kubernetes/#deploy-application-cluster

--
ChangZhuo Chen (陳昌倬) czchen@{czchen,debian}.org http://czchen.info/
Key fingerprint = BA04 346D C2E1 FE63 C790 8793 CC65 B0CD EC27 5D5B