Hi there, we use the Flink Kubernetes Operator, and I am investigating how we can easily fail over FlinkDeployments from one Kubernetes cluster to another when an outage forces us to migrate a large number of them.
I understand one way to do this is to set `initialSavepointPath` in the job spec of each FlinkDeployment to the most recent appropriate snapshot, so that the jobs continue from where they left off. For a large number of jobs, though, this would be quite a bit of manual labor. Do others have an approach they are using? Any advice? Could this be something addressed in a future FLIP? Perhaps we could store some kind of metadata in object storage so that the Flink Kubernetes Operator can restore a FlinkDeployment from where it left off, even if the job is moved to another Kubernetes cluster. Looking forward to hearing folks' thoughts!
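
To make the manual approach concrete, here is a rough sketch of how the per-job patching could be scripted. It is only an illustration under assumptions: the savepoint field is written as `spec.job.initialSavepointPath` (the name in the operator's job spec as I understand it), the helper names `with_initial_savepoint` and `prepare_failover` are made up, and `latest_snapshots` is a hypothetical mapping from deployment name to snapshot path that you would have to build yourself, for example by listing your checkpoint or savepoint directory in object storage.

```python
import copy

# Server-populated metadata that should not be carried over to the new cluster.
_SERVER_FIELDS = ("resourceVersion", "uid", "creationTimestamp",
                  "generation", "managedFields")


def with_initial_savepoint(deployment: dict, savepoint_path: str) -> dict:
    """Return a copy of a FlinkDeployment manifest prepared for a fresh cluster.

    Assumption: the operator reads the restore path from
    spec.job.initialSavepointPath when the resource is first deployed.
    """
    patched = copy.deepcopy(deployment)
    metadata = patched.setdefault("metadata", {})
    for key in _SERVER_FIELDS:
        metadata.pop(key, None)
    # Status belongs to the old cluster's operator; drop it.
    patched.pop("status", None)
    job = patched.setdefault("spec", {}).setdefault("job", {})
    job["initialSavepointPath"] = savepoint_path
    return patched


def prepare_failover(deployments: list, latest_snapshots: dict):
    """Patch every deployment that has a known snapshot.

    Returns (ready, missing): the patched manifests, and the names of
    deployments with no entry in latest_snapshots, which need manual handling.
    """
    ready, missing = [], []
    for deployment in deployments:
        name = deployment["metadata"]["name"]
        path = latest_snapshots.get(name)
        if path is None:
            missing.append(name)
        else:
            ready.append(with_initial_savepoint(deployment, path))
    return ready, missing
```

The manifests in `ready` could then be serialized and applied to the target cluster with whatever tooling you already use. The hard part this sketch leaves out is exactly what I am asking about: reliably discovering the latest usable snapshot for each job when the source cluster is unreachable.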