I'm looking at the job cluster mode. It looks great, and I am considering migrating our jobs off our "legacy" session cluster and into Kubernetes.

I do need to ask some questions, because I haven't found many details in the documentation about how it works yet, and I gave up following the DI around in the code after a while.

Let's say I have a deployment for the job "leader" in HA with ZK, and another deployment for the taskmanagers.

I want to upgrade the code or configuration and start from a savepoint, in an automated way.

As best I can figure, I cannot just update the deployment resources in Kubernetes and allow the containers to restart in an arbitrary order.

Instead, I expect sequencing is important, something along the lines of this:

1. issue savepoint command on leader
2. wait for savepoint
3. destroy all leader and taskmanager containers
4. deploy new leader, with savepoint url
5. deploy new taskmanagers
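
To make the sequencing above concrete, here is a rough Python sketch of the orchestration I have in mind. Everything in it is an assumption on my part: the five callables are hypothetical placeholders that I would have to implement myself (for example against the Flink REST API and the Kubernetes API), and none of the names come from Flink or Kubernetes. It only encodes the ordering and the "wait for savepoint" step.

```python
import time
from typing import Callable, Optional


def upgrade_from_savepoint(
    trigger_savepoint: Callable[[], str],
    poll_savepoint: Callable[[str], Optional[str]],
    destroy_all: Callable[[], None],
    deploy_leader: Callable[[str], None],
    deploy_taskmanagers: Callable[[], None],
    timeout_s: float = 600.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run the five upgrade steps in strict order.

    All callables are hypothetical hooks supplied by the caller:
      trigger_savepoint()   -> request id of the savepoint operation
      poll_savepoint(id)    -> savepoint path once complete, else None
      destroy_all()         -> remove old leader and taskmanager containers
      deploy_leader(path)   -> start the new leader from the savepoint
      deploy_taskmanagers() -> start the new taskmanagers
    """
    # Step 1: issue the savepoint command on the current leader.
    request_id = trigger_savepoint()

    # Step 2: wait for the savepoint, giving up after timeout_s.
    deadline = time.monotonic() + timeout_s
    savepoint_path = poll_savepoint(request_id)
    while savepoint_path is None:
        if time.monotonic() >= deadline:
            # Nothing has been destroyed yet, so the old job keeps running.
            raise TimeoutError("savepoint did not complete in time")
        sleep(poll_interval_s)
        savepoint_path = poll_savepoint(request_id)

    # Step 3: tear down every old container before anything new starts,
    # so an old taskmanager cannot attach to the new leader.
    destroy_all()

    # Steps 4 and 5: new leader first (with the savepoint), then workers.
    deploy_leader(savepoint_path)
    deploy_taskmanagers()
    return savepoint_path
```

The one deliberate choice here is that nothing is torn down until the savepoint path is actually known, so a failed or slow savepoint leaves the old job untouched.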


For example, I imagine old taskmanagers (with an old version of my job) attaching to the new leader and causing a problem.

Does that sound right, or am I overthinking it?

If I'm not overthinking it, has anyone tried implementing any automation for this yet?
