[ https://issues.apache.org/jira/browse/FLINK-32070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834609#comment-17834609 ]
xiaogang zhou commented on FLINK-32070:
---------------------------------------

Is there a branch I can look at to do a POC? Also, if you are busy with the Flink 2.0 state work, I can help with some of the work on this issue. [~Zakelly]

> FLIP-306 Unified File Merging Mechanism for Checkpoints
> -------------------------------------------------------
>
>                 Key: FLINK-32070
>                 URL: https://issues.apache.org/jira/browse/FLINK-32070
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Checkpointing, Runtime / State Backends
>            Reporter: Zakelly Lan
>            Assignee: Zakelly Lan
>            Priority: Major
>             Fix For: 1.20.0
>
>
> The FLIP:
> [https://cwiki.apache.org/confluence/display/FLINK/FLIP-306%3A+Unified+File+Merging+Mechanism+for+Checkpoints]
>
> The creation of multiple checkpoint files can lead to a 'file flood' problem,
> in which a large number of files are written to the checkpoint storage in a
> short amount of time. This can cause issues in large clusters with high
> workloads: creating and deleting many files increases the volume of file
> metadata modifications on the DFS, leading to single-machine hotspot issues
> for metadata maintainers (e.g. NameNode in HDFS). Additionally, the
> performance of object storage (e.g. Amazon S3 and Alibaba OSS) can decrease
> significantly when listing objects, which is necessary for object name
> de-duplication before creating an object. This further degrades the
> performance of directory manipulation from the file system's point of view
> (See [hadoop-aws module
> documentation|https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#:~:text=an%20intermediate%20state.-,Warning%20%232%3A%20Directories%20are%20mimicked,-The%20S3A%20clients],
> section 'Warning #2: Directories are mimicked').
> While many solutions have been proposed for individual types of state files
> (e.g. FLINK-11937 for keyed state (RocksDB) and FLINK-26803 for channel
> state), the file flood problems from each type of checkpoint file are similar
> and lack a systematic view and solution. Therefore, the goal of this FLIP is
> to establish a unified file merging mechanism to address the file flood
> problem during checkpoint creation for all types of state files, including
> keyed, non-keyed, channel, and changelog state. This will significantly
> improve the system stability and availability of fault tolerance in Flink.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)