Re: Checkpoint dir is not cleaned up after cancel the job with monitoring API

2020-10-02 Thread Eleanore Jin
Thanks a lot for the confirmation.

Eleanore

On Fri, Oct 2, 2020 at 2:42 AM Chesnay Schepler wrote:
> Yes, the patch call only triggers the cancellation.
> You can check whether it is complete by polling the job status via
> jobs/ and checking whether state is CANCELED.
>
> On 9/27/2020 7:02 PM,

Re: Checkpoint dir is not cleaned up after cancel the job with monitoring API

2020-10-02 Thread Chesnay Schepler
Yes, the PATCH call only triggers the cancellation; it does not wait for it to finish. You can check whether it has completed by polling the job status via jobs/ and checking whether state is CANCELED.

On 9/27/2020 7:02 PM, Eleanore Jin wrote:
> I have noticed this: if I have Thread.sleep(1500); after the patch call returned 202, t
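The polling approach described in the reply above could be sketched roughly as follows. This is only an illustration, not code from the thread: the class name `CancelPoller`, its method names, the retry parameters, and the regex-based extraction of a top-level `"state"` field from the response body are all assumptions, and `fetchState` needs Java 11+ for `java.net.http`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: polls a job's state until it reaches a target value.
final class CancelPoller {
    // Assumption: the job status response is JSON containing "state": "<STATE>".
    private static final Pattern STATE =
            Pattern.compile("\"state\"\\s*:\\s*\"([A-Z_]+)\"");

    // Returns the first "state" value found in the JSON text, or null if none.
    static String extractState(String json) {
        Matcher m = STATE.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    // Issues GET <baseUrl>/jobs/<jobId> and returns the parsed state,
    // or null when the request fails (state unknown).
    static String fetchState(HttpClient client, String baseUrl, String jobId) {
        try {
            HttpRequest request = HttpRequest
                    .newBuilder(URI.create(baseUrl + "/jobs/" + jobId))
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return extractState(response.body());
        } catch (IOException e) {
            return null; // endpoint unreachable; caller decides how to treat this
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Calls the supplier up to maxAttempts times, pausing between attempts.
    // Returns true as soon as the supplied state equals the target.
    static boolean waitForState(Supplier<String> states, String target,
                                int maxAttempts, long pauseMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (target.equals(states.get())) {
                return true;
            }
            if (attempt < maxAttempts - 1) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

After the PATCH call returns 202, a caller might replace the fixed Thread.sleep(1500) with something like `CancelPoller.waitForState(() -> CancelPoller.fetchState(HttpClient.newHttpClient(), "http://localhost:8081", jobId), "CANCELED", 20, 500L)`, where the attempt count and pause are arbitrary example values. Taking the state as a `Supplier` keeps the retry loop independent of the HTTP call, so it can be exercised without a running cluster.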

Re: Checkpoint dir is not cleaned up after cancel the job with monitoring API

2020-09-27 Thread Eleanore Jin
I have noticed this: if I add Thread.sleep(1500); after the PATCH call returns 202, then the directory gets cleaned up. Meanwhile, the job-manager pod shows as completed before being terminated; see screenshot: https://ibb.co/3F8HsvG So the patch call is async to terminate t

Re: Checkpoint dir is not cleaned up after cancel the job with monitoring API

2020-09-27 Thread Eleanore Jin
Hi Congxian, I am making a REST call to get the checkpoint config: curl -X GET \ http://localhost:8081/jobs/d2c91a44f23efa2b6a0a89b9f1ca5a3d/checkpoints/config and here is the response: { "mode": "at_least_once", "interval": 3000, "timeout": 1, "min_pause": 1000, "max_concur

Re: Checkpoint dir is not cleaned up after cancel the job with monitoring API

2020-09-26 Thread Congxian Qiu
Hi Eleanore, which `CheckpointRetentionPolicy`[1] did you set for your job? If `ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION` is set, then the checkpoint will be kept when the job is canceled. PS: the image did not show. [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/stat

Checkpoint dir is not cleaned up after cancel the job with monitoring API

2020-09-26 Thread Eleanore Jin
Hi experts, I am running Flink 1.10.2 on Kubernetes as a per-job cluster. Checkpointing is enabled, with interval 3s, minimumPause 1s, timeout 10s. I'm using FsStateBackend, and snapshots are persisted to Azure Blob Storage (Microsoft's cloud storage service). Checkpointed state is just source kafka topic o