[ https://issues.apache.org/jira/browse/FLINK-18263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266725#comment-17266725 ]
Congxian Qiu commented on FLINK-18263:
--------------------------------------

There seems to be a related [mailing list thread|http://apache-flink.147419.n8.nabble.com/Flink-checkpoint-td10186.html] for this issue.

> Allow external checkpoints to be persisted even when the job is in "Finished" state.
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-18263
>                 URL: https://issues.apache.org/jira/browse/FLINK-18263
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>            Reporter: Mark Cho
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, the `execution.checkpointing.externalized-checkpoint-retention`
> configuration supports two options:
> - `DELETE_ON_CANCELLATION`, which keeps the externalized checkpoints in the
> FAILED and SUSPENDED states.
> - `RETAIN_ON_CANCELLATION`, which keeps the externalized checkpoints in the
> FAILED, SUSPENDED, and CANCELED states.
> This gives us control over the retention of externalized checkpoints in all
> terminal states of a job, except for the FINISHED state.
> If the job ends up in the "FINISHED" state, externalized checkpoints are
> automatically cleaned up, and there is currently no config that ensures
> these externalized checkpoints are persisted.
> I found an old Jira ticket, FLINK-4512, where this was discussed. I think it
> would be helpful to have a config that can control the retention policy for
> the FINISHED state as well.
> - This can be useful for cases where we want to rewind a job (that reached
> the FINISHED state) to a previous checkpoint.
> - When we use externalized checkpoints, we want to fully delegate the
> checkpoint clean-up to an external process in all job states (without
> cherry-picking the FINISHED state to be cleaned up by Flink).
> We have a quick fix working in our fork where we've changed the
> `ExternalizedCheckpointCleanup` enum:
> {code:java}
> enum ExternalizedCheckpointCleanup {
>     RETAIN_ON_FAILURE,      // renamed from DELETE_ON_CANCELLATION; retains on FAILED
>     RETAIN_ON_CANCELLATION, // kept the same; retains on FAILED, CANCELED
>     RETAIN_ON_SUCCESS       // added; retains on FAILED, CANCELED, FINISHED
> }
> {code}
> Since this change requires changes to multiple components (e.g. config
> values, REST API, Web UI), I wanted to get the community's thoughts
> before I invest more time in my quick fix PR (which currently contains only
> the minimal change needed to get this working).
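
For readers following the proposal, the retention matrix in the quoted enum listing can be modelled as a small, self-contained sketch. It is not Flink code: `TerminalState`, `ProposedCleanup`, `retains` and `RetentionDemo` are hypothetical names used only for illustration, and the retained sets are copied literally from the listing (SUSPENDED is left out because the listing does not mention it).

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for the terminal job states named in the issue.
// This is NOT Flink's own job status type.
enum TerminalState { FAILED, CANCELED, FINISHED }

// Hypothetical model of the proposed enum; the sets mirror the quoted listing.
enum ProposedCleanup {
    RETAIN_ON_FAILURE(EnumSet.of(TerminalState.FAILED)),
    RETAIN_ON_CANCELLATION(EnumSet.of(TerminalState.FAILED, TerminalState.CANCELED)),
    RETAIN_ON_SUCCESS(EnumSet.allOf(TerminalState.class));

    private final Set<TerminalState> retained;

    ProposedCleanup(Set<TerminalState> retained) {
        this.retained = retained;
    }

    // True if externalized checkpoints would be kept when the job ends in this state.
    boolean retains(TerminalState state) {
        return retained.contains(state);
    }
}

class RetentionDemo {
    public static void main(String[] args) {
        // Print one line per (policy, terminal state) pair.
        for (ProposedCleanup policy : ProposedCleanup.values()) {
            for (TerminalState state : TerminalState.values()) {
                System.out.println(policy + " / " + state + " -> retained=" + policy.retains(state));
            }
        }
    }
}
```

Each option in this model retains a superset of the states retained by the one before it, so only `RETAIN_ON_SUCCESS` covers FINISHED, which is the gap the issue describes.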