I think your approach is fine. Generally you have two options. You can launch application clusters, which linger after the application finishes so that you can still read the job status, logs, and so on. That is what you are doing right now.
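For reference, an application-mode FlinkDeployment for a batch job looks roughly like the sketch below. The name, image and jarURI are placeholders, so adjust them to your setup:

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: daily-batch-job              # placeholder name
spec:
  image: flink:1.17                  # your job image
  flinkVersion: v1_17
  serviceAccount: flink
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    # hypothetical path to a job jar baked into the image
    jarURI: local:///opt/flink/usrlib/my-batch-job.jar
    parallelism: 2
    upgradeMode: stateless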
The other option is to run a session cluster, which executes jobs on demand. You will need an external mechanism to submit jobs to the cluster at the time of your choosing. If you choose native Kubernetes resource provisioning, TaskManagers are created and removed as your load fluctuates, while the JobManager stays up the whole time. (Rough sketches of the session-cluster setup and of a CronJob-based scheduler are pasted below the quoted message.)

Nix

From: Rainer Schamm <rai...@lsdopen.io>
Date: Friday, March 7, 2025 at 2:23 PM
To: user@flink.apache.org <user@flink.apache.org>
Subject: Kubernetes FlinkDeployments and Batch Jobs

Hi all

I am struggling a bit with the intended workflow around FlinkDeployments and batch jobs via the Flink Kubernetes Operator.

What I found is that when I create a new FlinkDeployment (from a YAML spec) for a batch-type job:

1. It starts both a JobManager pod and one or more TaskManager pods.
2. After the batch job finishes successfully, the TaskManager pods shut down and the JobManager stays alive.
3. This allows one to query the job status from the JobManager REST API.

But now I am not sure how best to go about rescheduling the job, for example to run once a day. The best I can come up with is to delete the actual FlinkDeployment once I have determined the job status, and then to recreate the FlinkDeployment in some scheduled way.

Does anyone have any suggestions?

Regards
Rainer
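To make the session-cluster option above concrete: you create a FlinkDeployment without a job section, and then submit each run as a FlinkSessionJob that references it. This is only a rough sketch; the names, image and jar URL are assumptions:

apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: batch-session                # session cluster, no job section
spec:
  image: flink:1.17
  flinkVersion: v1_17
  serviceAccount: flink
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
---
apiVersion: flink.apache.org/v1beta1
kind: FlinkSessionJob
metadata:
  name: daily-wordcount              # one resource per submitted run
spec:
  deploymentName: batch-session      # must match the session cluster above
  job:
    # for session jobs the operator fetches the jar, e.g. over HTTP or S3;
    # this URL is hypothetical
    jarURI: https://example.com/jars/my-batch-job.jar
    parallelism: 2
    upgradeMode: stateless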
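And as one example of the "external mechanism" for a once-a-day schedule, a plain Kubernetes CronJob that recreates the resource works. Everything here (image, service account, ConfigMap holding the manifest) is an assumption, and the service account needs RBAC permissions on FlinkDeployments:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: submit-daily-batch
spec:
  schedule: "0 3 * * *"                        # once a day at 03:00
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: flink-submitter  # assumed SA allowed to manage FlinkDeployments
          restartPolicy: Never
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest      # any image that ships kubectl
            command:
            - /bin/sh
            - -c
            - |
              # delete the previous run (if any) and recreate the deployment fresh
              kubectl delete flinkdeployment daily-batch-job --ignore-not-found
              kubectl apply -f /manifests/daily-batch-job.yaml
            volumeMounts:
            - name: manifests
              mountPath: /manifests
          volumes:
          - name: manifests
            configMap:
              name: daily-batch-job-manifest   # assumed ConfigMap containing the YAML spec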