There are some issues I see and would like to get some feedback on.

1. On cancellation with savepoint with a target directory, the k8s job does not exit (it is not a deployment). I would assume that on cancellation the JVM should exit after cleanup etc., and thus the pod should too. That does not happen, and the job pod remains live. Is that expected?
2. To resume from a savepoint, it seems that I have to delete the job id (0000000000....) from ZooKeeper (this is HA); otherwise it defaults to the latest checkpoint no matter what. I am curious what, in 1.7.2, the tested process is for cancelling with a savepoint and resuming, and what the coherent story is around the job id (which defaults to 000000000000..). Note that --job-id does not work with 1.7.2, so even though that may not make sense, I still cannot provide a new job id.

Regards,
Vishal.