Matthias Pohl created FLINK-26606:
-------------------------------------
Summary: CompletedCheckpoints that failed to be discarded are not
stored in the CompletedCheckpointStore
Key: FLINK-26606
URL: https://issues.apache.org/jira/browse/FLINK-26606
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.15.0
Reporter: Matthias Pohl

We introduced a repeatable per-job cleanup that runs after the job reaches a
globally-terminated state. It also tries to clean up the
{{CompletedCheckpointStore}}. But we missed one code path: the one where the
{{CheckpointsCleaner}} tries to discard {{CompletedCheckpoints}}. If that
discard fails, the {{CompletedCheckpointStore}} no longer holds any references
to these {{CompletedCheckpoints}}, so the shutdown at the end of the job is
not able to clean them up.

We should not remove the {{CompletedCheckpoints}} from the
{{CompletedCheckpointStore}} if their deletion fails. This would enable us to
retry deleting these artifacts at the end of the job and to include them in
the retryable cleanup as well.
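
To illustrate the proposal, here is a minimal, self-contained sketch of a store that drops its reference to a checkpoint only after the discard succeeded. It is not Flink code: {{RetainingCheckpointStore}}, {{Checkpoint}}, {{FlakyCheckpoint}} and {{cleanup()}} are hypothetical names made up for this example, and the real {{CompletedCheckpointStore}} and {{CheckpointsCleaner}} interfaces look different.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Hypothetical, simplified stand-in; NOT Flink's CompletedCheckpointStore. */
public class RetainingCheckpointStore {

    /** Hypothetical minimal checkpoint abstraction (assumption for this sketch). */
    public interface Checkpoint {
        long id();
        void discard() throws Exception;
    }

    /** Demo helper: a checkpoint whose discard fails a fixed number of times. */
    public static final class FlakyCheckpoint implements Checkpoint {
        private final long id;
        private int failuresLeft;

        public FlakyCheckpoint(long id, int failuresLeft) {
            this.id = id;
            this.failuresLeft = failuresLeft;
        }

        @Override
        public long id() {
            return id;
        }

        @Override
        public void discard() throws Exception {
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new Exception("Simulated discard failure for checkpoint " + id);
            }
        }
    }

    private final Deque<Checkpoint> checkpoints = new ArrayDeque<>();
    private final int maxRetained;

    public RetainingCheckpointStore(int maxRetained) {
        this.maxRetained = maxRetained;
    }

    /** Adds a checkpoint and tries to discard the oldest ones beyond the limit. */
    public void add(Checkpoint checkpoint) {
        checkpoints.addLast(checkpoint);
        int excess = checkpoints.size() - maxRetained;
        discardOldest(excess);
    }

    /**
     * Retryable cleanup: tries to discard everything that is still referenced.
     * Returns true once nothing is left, so the caller can retry on false.
     */
    public boolean cleanup() {
        discardOldest(checkpoints.size());
        return checkpoints.isEmpty();
    }

    public int size() {
        return checkpoints.size();
    }

    /** Tries the 'count' oldest entries; removes an entry only if discard succeeded. */
    private void discardOldest(int count) {
        Iterator<Checkpoint> it = checkpoints.iterator();
        while (count > 0 && it.hasNext()) {
            Checkpoint checkpoint = it.next();
            try {
                checkpoint.discard();
                it.remove();
            } catch (Exception e) {
                // Keep the reference so that a later cleanup() can retry.
            }
            count--;
        }
    }
}
```

With this shape, a checkpoint whose discard failed during subsumption stays in the store, and calling {{cleanup()}} repeatedly at the end of the job retries it until the store is empty.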