linguoxuan created FLINK-38842:
----------------------------------
Summary: FileSink may leave orphaned temporary files in COS after
restoring from a checkpoint, due to missing cleanup logic for temporary files
in the checkpointed state.
Key: FLINK-38842
URL: https://issues.apache.org/jira/browse/FLINK-38842
Project: Flink
Issue Type: Bug
Components: Connectors / FileSystem
Affects Versions: 2.2.0, 2.1.0, 1.20.0, 1.19.0, 1.18.0, 1.17.0, 1.16.0,
1.15.0, 2.0.0, 1.14.0, 1.13.0
Environment: * Flink Version: 1.16.1
* Storage: Tencent Cloud Object Storage (COS). The issue is storage-agnostic
and should affect all filesystems used with {{FileSink}}.
Reporter: linguoxuan
When a FileSink job is restored from a checkpoint, the temporary files
previously written to COS are re-read and processed correctly. However, unlike
the legacy StreamingFileSink, the current FileSink implementation does not mark
and delete the temporary files recorded in the checkpointed state. As a result,
two temporary files exist in COS after the restore, and one of them may never
be cleaned up, leaving an orphaned file in the storage system permanently.
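
To illustrate the kind of cleanup step that appears to be missing, here is a minimal sketch. It is not Flink code: the class RestoredTempFileCleanup and the method deleteSuperseded are hypothetical names, java.nio.file stands in for the COS filesystem, and it assumes the restore leaves one temporary file in use while the other temporary files recorded in state are no longer needed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Flink's API. */
final class RestoredTempFileCleanup {

    private RestoredTempFileCleanup() {}

    /**
     * Deletes every temporary file recorded in the restored state except the
     * one the writer keeps using, and returns the paths actually removed.
     */
    static List<Path> deleteSuperseded(Collection<Path> recordedInState, Path stillInUse)
            throws IOException {
        List<Path> deleted = new ArrayList<>();
        for (Path candidate : recordedInState) {
            if (candidate.equals(stillInUse)) {
                continue; // the writer is still appending to this file
            }
            // deleteIfExists tolerates files that an earlier attempt already removed
            if (Files.deleteIfExists(candidate)) {
                deleted.add(candidate);
            }
        }
        return deleted;
    }
}
```

In a real fix the deletion would presumably go through the sink's own filesystem abstraction and run only once the superseded file is known to be unneeded, for example after the next checkpoint completes.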
--
This message was sent by Atlassian Jira
(v8.20.10#820010)