[ https://issues.apache.org/jira/browse/FLINK-25470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525926#comment-17525926 ]
Roman Khachatryan commented on FLINK-25470:
-------------------------------------------

Thanks for the analysis [~masteryhx] and sorry for the late reply.

> According to these metrics, we could roughly infer:
> 1. restore time by full size of materialization part and non-materialization part

For that, we need to collect metrics for the whole checkpoint, right? (which is non-trivial) Or do you propose to expose them on the subtask level, gather them via reporters, and then correlate metrics from different tasks by time?

> 2. when a checkpoint includes a new Materialization by incremental/full size of materialization part.

This shouldn't change much after FLINK-26306.

> 3. the cleanup efficiency of non-materialization part by comparing the full size of non-materialization part, which is the real size, and the actual size in the dfs.

I think it's better to explicitly expose cleanup-related metrics.

> Add/Expose/Differentiate metrics of checkpoint size between changelog size vs materialization size
> --------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-25470
>                 URL: https://issues.apache.org/jira/browse/FLINK-25470
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Metrics, Runtime / State Backends
>            Reporter: Yuan Mei
>            Assignee: Hangxiang Yu
>            Priority: Major
>             Fix For: 1.16.0
>
>         Attachments: Screen Shot 2021-12-29 at 1.09.48 PM.png
>
> FLINK-25557 only resolves part of the problem.
> Eventually, we should answer these questions:
> * How much does Data Size increase/explode?
> * When does a checkpoint include a new Materialization?
> * Materialization size
> * Changelog sizes since the last complete checkpoint (from which restore time can be roughly inferred)

-- This message was sent by Atlassian Jira (v8.20.7#820007)
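As a rough illustration of the subtask-level option discussed in the comment (expose per-subtask sizes, gather them via reporters, then correlate them externally), the sketch below aggregates hypothetical per-subtask samples into whole-checkpoint totals. All names here are assumptions for illustration, not Flink's actual metric or reporter API, and correlation is done by a checkpoint id carried with each sample rather than by timestamp, for simplicity:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CheckpointSizeCorrelation {

    // Hypothetical per-subtask sample, as it might arrive from a metrics reporter.
    record SubtaskSample(long checkpointId, int subtaskIndex,
                         long materializedBytes, long nonMaterializedBytes) {}

    // Aggregated sizes for one checkpoint across all subtasks.
    record CheckpointSizes(long materializedBytes, long nonMaterializedBytes) {
        long totalBytes() { return materializedBytes + nonMaterializedBytes; }
    }

    // Correlate per-subtask samples into whole-checkpoint sizes, keyed by checkpoint id.
    static Map<Long, CheckpointSizes> aggregate(List<SubtaskSample> samples) {
        Map<Long, CheckpointSizes> result = new LinkedHashMap<>();
        for (SubtaskSample s : samples) {
            result.merge(s.checkpointId(),
                    new CheckpointSizes(s.materializedBytes(), s.nonMaterializedBytes()),
                    (a, b) -> new CheckpointSizes(
                            a.materializedBytes() + b.materializedBytes(),
                            a.nonMaterializedBytes() + b.nonMaterializedBytes()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<SubtaskSample> samples = new ArrayList<>();
        samples.add(new SubtaskSample(7, 0, 100, 40));
        samples.add(new SubtaskSample(7, 1, 120, 35));
        samples.add(new SubtaskSample(8, 0, 100, 60));

        Map<Long, CheckpointSizes> byCheckpoint = aggregate(samples);
        System.out.println(byCheckpoint.get(7L).totalBytes());  // 295
        System.out.println(byCheckpoint.get(8L).totalBytes());  // 160
    }
}
```

The point of the sketch is the non-trivial part the comment mentions: whole-checkpoint figures only exist after combining samples from every subtask, so some external correlation key (here a checkpoint id, in practice perhaps a time window) is required.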