Hi Vishal,

Savepoint with cancellation internally uses the /cancel REST API, which is not a stable API. It always exits with 404. The best way to do this is:

a) First, issue the savepoint REST API.
b) Then issue the /yarn-cancel REST API (as described in http://mail-archives.apache.org/mod_mbox/flink-user/201804.mbox/%3c0ffa63f4-e6ed-42d8-1928-37a7adaaa...@apache.org%3E).
c) Then, when resuming your job, pass the savepoint path returned by (a) as an argument to the run jar REST API.

The above is the smoother way.

Regards,
Bhaskar

On Tue, Mar 12, 2019 at 2:46 AM Vishal Santoshi <vishal.santo...@gmail.com> wrote:

> There are some issues I see and would want to get some feedback
>
> 1. On Cancellation With SavePoint with a Target Directory, the k8s job
> does not exit (it is not a deployment). I would assume that on
> cancellation the JVM should exit, after cleanup etc., and thus the pod
> should too. That does not happen and thus the job pod remains live. Is that
> expected?
>
> 2. To resume from a savepoint it seems that I have to delete the job id
> (0000000000....) from ZooKeeper (this is HA), else it defaults to the
> latest checkpoint no matter what.
>
> I am kind of curious as to what in 1.7.2 is the tested process of
> cancelling with a savepoint and resuming, and what is the cogent story
> around job id (defaults to 000000000000..). Note that --job-id does not
> work with 1.7.2, so even though that does not make sense, I still cannot
> provide a new job id.
>
> Regards,
>
> Vishal.
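
For reference, steps (a) to (c) of the reply could be sketched roughly as below. This is only a sketch under assumptions: the endpoint paths and JSON field names (`target-directory`, `cancel-job`, `savepointPath`) are as I recall them from the Flink 1.7 REST documentation and should be checked against your version, and the helper names and base URL are made up for illustration.

```python
import json
from urllib import request


def savepoint_request(base_url, job_id, target_dir):
    """Step (a): trigger a savepoint WITHOUT cancelling the job."""
    # Assumed endpoint and body fields; verify against your Flink version.
    url = f"{base_url}/jobs/{job_id}/savepoints"
    body = {"target-directory": target_dir, "cancel-job": False}
    return "POST", url, body


def savepoint_status_request(base_url, job_id, trigger_id):
    """Poll the asynchronous savepoint operation started in step (a)."""
    return "GET", f"{base_url}/jobs/{job_id}/savepoints/{trigger_id}", None


def yarn_cancel_request(base_url, job_id):
    """Step (b): cancel through /yarn-cancel, which is a plain GET."""
    return "GET", f"{base_url}/jobs/{job_id}/yarn-cancel", None


def run_jar_request(base_url, jar_id, savepoint_path):
    """Step (c): resubmit the jar, restoring from the savepoint of step (a)."""
    return "POST", f"{base_url}/jars/{jar_id}/run", {"savepointPath": savepoint_path}


def send(method, url, body=None):
    """Hypothetical helper: send one request and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8") or "{}")
```

In use, one would call `send(*savepoint_request(...))`, poll `savepoint_status_request` until the operation reports completion, read the savepoint location from that response, then call `yarn_cancel_request` and finally `run_jar_request` with that location. Keeping the request builders separate from `send` makes the sequence easy to inspect without touching a live cluster.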