Got it. I'd say what you want is something like a "latest-savepoint" pointer (a symlink of sorts) that always references the most recently written savepoint, so every new deployment starts from that. Achieving this takes some manual work, though; IIRC, you cannot set a jobID via the operator.
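The manual piece could look roughly like the sketch below, assuming all savepoints land under one stable S3 prefix and a completed savepoint is recognized by its _metadata file. The bucket, prefix, and pointer key names are made up; adapt them to your layout.

# Sketch only: keep a small "latest savepoint" pointer object in S3 up to date
# by scanning the savepoint prefix for the newest completed savepoint (the one
# whose _metadata file was written last). Bucket, prefix and pointer key are
# made-up names.
import boto3

BUCKET = "my-flink-bucket"           # assumption
SAVEPOINT_PREFIX = "savepoints/"     # assumption: state.savepoints.dir points here
POINTER_KEY = "savepoints/LATEST"    # assumption: plain-text object holding the path

s3 = boto3.client("s3")

def latest_savepoint_dir():
    """Return the s3:// directory of the most recently completed savepoint."""
    newest_key, newest_ts = None, None
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=SAVEPOINT_PREFIX):
        for obj in page.get("Contents", []):
            # A savepoint is only usable once its _metadata file exists.
            if obj["Key"].endswith("/_metadata"):
                if newest_ts is None or obj["LastModified"] > newest_ts:
                    newest_key, newest_ts = obj["Key"], obj["LastModified"]
    if newest_key is None:
        return None
    return "s3://%s/%s" % (BUCKET, newest_key.rsplit("/", 1)[0])

def update_pointer():
    """Write the latest savepoint directory into the pointer object."""
    path = latest_savepoint_dir()
    if path:
        s3.put_object(Bucket=BUCKET, Key=POINTER_KEY, Body=path.encode())

if __name__ == "__main__":
    update_pointer()
    # On failover, read the pointer and put its value into
    # spec.job.initialSavepointPath of the FlinkDeployment on the new cluster.
    print(s3.get_object(Bucket=BUCKET, Key=POINTER_KEY)["Body"].read().decode())

You could run something like this periodically (e.g. from a CronJob) or right after triggering a savepoint, and have whatever deploys the FlinkDeployment on the standby cluster read the pointer and drop its value into job.initialSavepointPath.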
Att,
Pedro Mázala
Be awesome

On Thu, 12 Jun 2025 at 16:48, gustavo panizzo <g...@zumbi.com.ar> wrote:
> hello
>
> that would indeed work, but it requires knowing in advance the last jobID
> for that particular job and changing the spec submitted to the destination
> cluster. We aim to have zero-touch job failover from one k8s cluster to
> another.
>
> Our clusters are multi-node and multi-AZ, but they run critical business
> processes, hence we want to protect against region failure.
>
> On Thu, Jun 12, 2025, at 4:30 PM, Pedro Mázala wrote:
> > Using the Flink k8s operator, you may use the yaml property
> > job.initialSavepointPath to set the path that you want to start your
> > pipeline from. This would be the full path, including the jobid. And
> > then you'll have a new ID generated and such.
> >
> > To avoid maintenance issues like this one, a multi-node cluster may
> > help you. k8s will try to spread the deployments among the different
> > nodes. Even if one dies, it will make sure everything is there due to
> > the k8s desired-state mechanism.
> >
> > Att,
> > Pedro Mázala
> > Be awesome
> >
> > On Thu, 12 Jun 2025 at 15:52, gustavo panizzo <g...@zumbi.com.ar> wrote:
> > > Hello
> > >
> > > I run Flink (v1.20) on k8s using the native integration and the k8s
> > > operator (v1.30); we keep savepoints and checkpoints in S3.
> > >
> > > We'd like to be able to continue running the same jobs (with the same
> > > config, same image, using the same sinks and sources, connecting to
> > > Kafka using the same credentials and groups, and restoring the state
> > > from where the previous job left off) from another k8s cluster in the
> > > event of maintenance or simply failure of the k8s cluster, hence we
> > > need to restore the state from a savepoint or checkpoint.
> > >
> > > However, the problem we face is that the jobID is part of the path
> > > where checkpoints and savepoints are stored in S3, and it is generated
> > > dynamically every time a job (kind: flinkdeployments) is deployed into
> > > k8s.
> > >
> > > So I cannot re-create the same job in another k8s cluster to pick up
> > > where the previous job left off.
> > >
> > > I could copy files around in S3, but that feels racy and not really
> > > great. How do others move stateful jobs from one k8s cluster to
> > > another?
> > >
> > > cheers
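For completeness, plugging the resolved path into the FlinkDeployment via the job.initialSavepointPath property mentioned above would look roughly like this. It's only a sketch; the image, jar, bucket, and savepoint directory names are placeholders.

# Sketch only: a FlinkDeployment restoring from an explicit savepoint path.
# Names, image, jar and S3 paths are placeholders; in practice the path would
# come from the "latest" pointer resolved outside the operator.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: my-stateful-job
spec:
  image: registry.example.com/my-stateful-job:1.0.0
  flinkVersion: v1_20
  serviceAccount: flink
  flinkConfiguration:
    state.savepoints.dir: s3://my-flink-bucket/savepoints
    state.checkpoints.dir: s3://my-flink-bucket/checkpoints
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/usrlib/my-stateful-job.jar
    parallelism: 2
    upgradeMode: savepoint
    # Full path to the savepoint directory, including the old job's ID prefix:
    initialSavepointPath: s3://my-flink-bucket/savepoints/savepoint-abc123-0123456789ab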