Hello

I run Flink (v1.20) on k8s using the native integration and the k8s operator
(v1.30); we keep savepoints and checkpoints in S3.

We'd like to be able to continue running the same jobs (same config, same
image, same sinks and sources, connecting to Kafka with the same credentials
and consumer groups, restoring state from where the previous job left off)
from another k8s cluster in the event of maintenance or failure of the
cluster. Hence we need to restore state from a savepoint or checkpoint.
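
As far as I understand, the operator lets a new deployment restore from an
explicit path via job.initialSavepointPath, so in the second cluster I imagine
something like this sketch (name, image and bucket below are made up):

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: my-job                         # same name as in the original cluster
spec:
  image: my-registry/my-job:1.2.3      # same image, same config
  flinkVersion: v1_20
  job:
    jarURI: local:///opt/flink/usrlib/my-job.jar
    upgradeMode: savepoint
    # restore from the last savepoint/retained checkpoint written by the
    # old cluster; the jobId baked into this path is the problem below
    initialSavepointPath: s3://my-bucket/savepoints/savepoint-abc123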

However, the problem we face is that the jobID is part of the path where
checkpoints and savepoints are stored in S3, and it is generated dynamically
every time a job (kind: FlinkDeployment) is deployed into k8s.

So I cannot recreate the same job in another k8s cluster and have it pick up
where the previous one left off.
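
To illustrate, our flinkConfiguration is roughly the following (bucket made
up), and the generated jobId ends up in the resulting object keys:

flinkConfiguration:
  state.checkpoints.dir: s3://my-bucket/checkpoints
  state.savepoints.dir: s3://my-bucket/savepoints
  execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION

# which produces keys like
#   s3://my-bucket/checkpoints/<jobId>/chk-1234/_metadata
#   s3://my-bucket/savepoints/savepoint-<jobId-prefix>-<uuid>/_metadata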

I could copy files around in S3, but that feels racy and not really great.
How do others move stateful jobs from one k8s cluster to another?


cheers
