Hi,

I'm running some jobs using native Kubernetes. Sometimes, because of an unrelated issue with our K8s cluster (e.g. a K8s node crashed), my Flink pods are gone. The JM pod, as it is deployed using a Deployment, is re-created automatically. However, all of my jobs are lost. What I have to do now is:

1. Re-upload the jars
2. Find the path to the last checkpoint of each job
3. Resubmit the jobs
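
To make the checkpoint lookup concrete, here is a rough sketch of picking the newest checkpoint directory from a list of object keys. It is only an illustration under assumptions: `latest_checkpoint` is a hypothetical helper, not a Flink API; the keys are assumed to have been listed already (e.g. from S3) and to belong to a single job's checkpoint directory; and a checkpoint is treated as usable only if its `chk-<n>` directory contains a `_metadata` file.

```python
import re
from typing import Iterable, Optional

# Matches "<anything>/chk-<n>/_metadata" and captures the chk-<n> directory.
# Assumption: only checkpoints that have a _metadata file are considered.
_CHK_METADATA = re.compile(r"^(?P<dir>.*/chk-(?P<n>\d+))/_metadata$")


def latest_checkpoint(keys: Iterable[str]) -> Optional[str]:
    """Return the chk-<n> directory with the highest n, or None if none match.

    Hypothetical helper: `keys` is a plain listing of object keys under one
    job's checkpoint directory; nothing here talks to S3 or to Flink.
    """
    best_n = -1
    best_dir = None
    for key in keys:
        match = _CHK_METADATA.match(key)
        if match is None:
            continue  # not a checkpoint metadata file, skip it
        n = int(match.group("n"))  # compare numerically, so chk-10 > chk-9
        if n > best_n:
            best_n = n
            best_dir = match.group("dir")
    return best_dir
```

The returned directory would then be the path to restore from when resubmitting the job. Comparing the checkpoint numbers as integers matters, since a plain string sort would rank `chk-9` above `chk-10`.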
Is there any existing option to automate those steps? E.g.

1. Can I use a jar file stored in the JM's file system or on S3 instead of uploading the jar file via the REST interface?
2. When restoring the job, I need to provide the full path of the last checkpoint (s3://<base_path>/<prev_job_id>/chk-2345/). Is there any option to just provide the base_path?
3. Can I store the info needed to restore the jobs in the K8s deployment config?

Thanks a lot.

Regards,
Averell