zoltar9264 commented on code in PR #21822:
URL: https://github.com/apache/flink/pull/21822#discussion_r1170713429


##########
flink-dstl/flink-dstl-dfs/src/main/java/org/apache/flink/changelog/fs/DuplicatingStateChangeFsUploader.java:
##########
@@ -51,14 +52,15 @@
  *       <li>Store the meta of files into {@link ChangelogTaskLocalStateStore} 
by
  *           AsyncCheckpointRunnable#reportCompletedSnapshotStates().
  *       <li>Pass control of the file to {@link 
LocalChangelogRegistry#register} when
- *           ChangelogKeyedStateBackend#notifyCheckpointComplete() , files of 
the previous
- *           checkpoint will be deleted by {@link 
LocalChangelogRegistry#discardUpToCheckpoint} at
- *           the same time.
+ *           FsStateChangelogWriter#persist , files of the previous checkpoint 
will be deleted by
+ *           {@link LocalChangelogRegistry#discardUpToCheckpoint} when the 
previous checkpoint is
+ *           confirmed.

Review Comment:
   The problem of not cleaning up useless local files after truncate was not 
actively addressed before this PR. The previous cleaning method of 
_LocalChangelogRegistryImpl#prune()_ can indeed avoid the accumulation of local 
changelog files, but this cleaning method itself is problematic because it does 
not consider that these files will continue to be used by subsequent 
checkpoints. Before this PR, the remote changelog file did not consider the 
cleanup problem when the checkpoint cannot be made successfully. The local 
changelog file just happens to avoid this problem due to the wrong way to clean 
up the file. So I don't think it can simply be considered that this PR has 
brought new problems.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to