K8s job cluster and cancel and resume from a save point ?

Vishal Santoshi Mon, 11 Mar 2019 14:16:43 -0700

I am seeing some issues and would like some feedback.

1. On cancellation with a savepoint to a target directory, the k8s job
does not exit (it is not a deployment). I would assume that on
cancellation the JVM should exit after cleanup, and thus the pod
should too. That does not happen, so the job pod remains live. Is that
expected?


2. To resume from a savepoint, it seems that I have to delete the job id
(0000000000....) from ZooKeeper (this is an HA setup); otherwise the job
defaults to the latest checkpoint no matter what.


I am curious what the tested process is in 1.7.2 for cancelling with a
savepoint and resuming, and what the coherent story is around the job id
(defaults to 000000000000..). Note that --job-id does not work with 1.7.2,
so even though that does not make sense, I still cannot provide a new
job id.

Regards,

Vishal.
