Zakelly commented on code in PR #24766:
URL: https://github.com/apache/flink/pull/24766#discussion_r1602564828


##########
docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md:
##########
@@ -292,4 +292,26 @@ The final checkpoint would be triggered immediately after all operators have rea
 without waiting for periodic triggering, but the job will need to wait for this final checkpoint
 to be completed.
 
+## Unified file merging mechanism for checkpoints (Experimental)
+
+The unified file merging mechanism for checkpointing is introduced in Flink 1.20 as an MVP ("minimum viable product") feature,
+which allows scattered small checkpoint files to be written into larger files, reducing the number of file creations
+and file deletions and alleviating the file system metadata management pressure caused by the file flooding problem during checkpoints.
+The mechanism can be enabled by setting `state.checkpoints.file-merging.enabled` to `true`.
+**Note** that, as a trade-off, enabling this mechanism may lead to space amplification, that is, the actual space occupied on the file system
+will be larger than the actual state size. `state.checkpoints.file-merging.max-space-amplification`
+can be used to limit the upper bound of space amplification.
+
+This mechanism is applicable to keyed state, operator state and channel state in Flink. Merging at the subtask level is
+provided for state in the shared scope; merging at the TaskManager level is provided for state in the private scope. The maximum number of subtasks
+allowed to write to a single file can be configured through the `state.checkpoints.file-merging.max-subtasks-per-file` option.
+
+This feature also supports merging files across checkpoints. To enable this, set
+`state.checkpoints.file-merging.across-checkpoint-boundary` to `true`.
+
+This mechanism introduces a file pool to handle concurrent writing scenarios. There are two modes: the non-blocking mode
+always provides a usable physical file without blocking when it receives a file request, and it may create many physical files if files are polled
+frequently; the blocking mode blocks until there are returned files available in the file pool. This can be configured by
+`state.checkpoints.file-merging.pool-blocking`.

Review Comment:
   ```suggestion
   frequently; the blocking mode blocks until there are returned files available in the file pool. This can be configured by
   setting `state.checkpoints.file-merging.pool-blocking` to `true` for blocking or `false` for non-blocking.
   ```
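
   As a minimal sketch of the configuration surface touched by this hunk: the snippet below sets the documented keys programmatically on a Flink 1.20 job, assuming they are passed through a `Configuration` object; the values are illustrative only, and in practice these keys are usually placed in the Flink configuration file rather than set per job.

   ```java
   import org.apache.flink.configuration.Configuration;
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

   public class FileMergingCheckpointExample {
       public static void main(String[] args) throws Exception {
           Configuration config = new Configuration();

           // Enable the unified file merging mechanism (experimental in Flink 1.20).
           config.setString("state.checkpoints.file-merging.enabled", "true");
           // Also merge files across checkpoint boundaries.
           config.setString("state.checkpoints.file-merging.across-checkpoint-boundary", "true");
           // Illustrative limits: space amplification upper bound and subtasks per physical file.
           config.setString("state.checkpoints.file-merging.max-space-amplification", "2");
           config.setString("state.checkpoints.file-merging.max-subtasks-per-file", "4");
           // Blocking file pool: wait for a returned file instead of creating new physical files.
           config.setString("state.checkpoints.file-merging.pool-blocking", "true");

           StreamExecutionEnvironment env =
                   StreamExecutionEnvironment.getExecutionEnvironment(config);
           env.enableCheckpointing(60_000); // checkpoint every 60 seconds

           // ... define sources, transformations and sinks, then env.execute(...)
       }
   }
   ```

   Whether per-job settings of these keys take effect can depend on the deployment mode, so treat this as a sketch of the options rather than a verified recipe.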



##########
docs/content.zh/docs/dev/datastream/fault-tolerance/checkpointing.md:
##########
@@ -250,5 +250,20 @@ StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironm
 Note that this behavior may prolong the running time of the job; if the checkpoint interval is large, this delay will be very noticeable.
 In extreme cases, if the checkpoint interval is set to `Long.MAX_VALUE`, the job will never finish, because the next checkpoint will never be performed.
 
+## Unified checkpoint file merging mechanism (Experimental)
+
+Flink 1.20 introduces an MVP version of the unified checkpoint file merging mechanism, which allows scattered small checkpoint files to be merged into larger files, reducing the number of checkpoint file creations and deletions
+and helping relieve the file system metadata management pressure caused by the file flooding problem. The mechanism can be enabled by setting `state.checkpoints.file-merging.enabled` to `true`.
+**Note** that, as a trade-off, enabling this mechanism leads to space amplification, that is, the actual space occupied on the file system will be larger than the state size. Setting `state.checkpoints.file-merging.max-space-amplification`
+controls the upper bound of the file amplification.
+
+This mechanism is applicable to keyed state, operator state and channel state in Flink. For state in the shared scope,
+merging is provided at the subtask level; for state in the private scope, merging is provided at the TaskManager level. The
+`state.checkpoints.file-merging.max-subtasks-per-file` option configures the maximum number of subtasks allowed to write to a single file.
+
+The unified file merging mechanism also supports merging files across checkpoints, which can be enabled by setting `state.checkpoints.file-merging.across-checkpoint-boundary` to `true`.
+This mechanism introduces a file pool to handle concurrent writes. The file pool has two modes: the non-blocking mode immediately returns a physical file for each file request and may create many physical files when requests are frequent; while

Review Comment:
   Add an empty line above?


