[ https://issues.apache.org/jira/browse/FLINK-25470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525926#comment-17525926 ]
Roman Khachatryan commented on FLINK-25470:
-------------------------------------------

Thanks for the analysis [~masteryhx] and sorry for the late reply.

> According to these metrics, we could roughly infer:
> 1. restore time by full size of materialization part and non-materialization part

For that, we need to collect metrics for the whole checkpoint, right? (which is non-trivial) Or do you propose to expose them on the subtask level, gather them via reporters, and then correlate metrics from different tasks by time?

> 2. when a checkpoint includes a new Materialization by incremental/full size of materialization part.

This shouldn't change much after FLINK-26306.

> 3. the cleanup efficiency of non-materialization part by comparing the full size of non-materialization part, which is the real size, and the actual size in the dfs.

I think it's better to explicitly expose cleanup-related metrics.

> Add/Expose/Differentiate metrics of checkpoint size between changelog size vs materialization size
> --------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-25470
>                 URL: https://issues.apache.org/jira/browse/FLINK-25470
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Metrics, Runtime / State Backends
>            Reporter: Yuan Mei
>            Assignee: Hangxiang Yu
>            Priority: Major
>             Fix For: 1.16.0
>
>         Attachments: Screen Shot 2021-12-29 at 1.09.48 PM.png
>
> FLINK-25557 only resolves part of the problem.
> Eventually, we should answer these questions:
> * How much does Data Size increase/explode?
> * When does a checkpoint include a new Materialization?
> * Materialization size
> * Changelog sizes since the last complete checkpoint (from which restore time can be roughly inferred)

-- This message was sent by Atlassian Jira (v8.20.7#820007)
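As a rough illustration of the subtask-level option discussed in the comment (expose per-subtask sizes, gather them via reporters, then correlate them externally), the sketch below aggregates hypothetical per-subtask samples into whole-checkpoint totals. All names here are assumptions for illustration, not Flink's actual metric or reporter API, and correlation is done by a checkpoint id carried with each sample rather than by timestamp, for simplicity:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CheckpointSizeCorrelation {

    // Hypothetical per-subtask sample, as it might arrive from a metrics reporter.
    record SubtaskSample(long checkpointId, int subtaskIndex,
                         long materializedBytes, long nonMaterializedBytes) {}

    // Aggregated sizes for one checkpoint across all subtasks.
    record CheckpointSizes(long materializedBytes, long nonMaterializedBytes) {
        long totalBytes() { return materializedBytes + nonMaterializedBytes; }
    }

    // Correlate per-subtask samples into whole-checkpoint sizes, keyed by checkpoint id.
    static Map<Long, CheckpointSizes> aggregate(List<SubtaskSample> samples) {
        Map<Long, CheckpointSizes> result = new LinkedHashMap<>();
        for (SubtaskSample s : samples) {
            result.merge(s.checkpointId(),
                    new CheckpointSizes(s.materializedBytes(), s.nonMaterializedBytes()),
                    (a, b) -> new CheckpointSizes(
                            a.materializedBytes() + b.materializedBytes(),
                            a.nonMaterializedBytes() + b.nonMaterializedBytes()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<SubtaskSample> samples = new ArrayList<>();
        samples.add(new SubtaskSample(7, 0, 100, 40));
        samples.add(new SubtaskSample(7, 1, 120, 35));
        samples.add(new SubtaskSample(8, 0, 100, 60));

        Map<Long, CheckpointSizes> byCheckpoint = aggregate(samples);
        System.out.println(byCheckpoint.get(7L).totalBytes());  // 295
        System.out.println(byCheckpoint.get(8L).totalBytes());  // 160
    }
}
```

The point of the sketch is the non-trivial part the comment mentions: whole-checkpoint figures only exist after combining samples from every subtask, so some external correlation key (here a checkpoint id, in practice perhaps a time window) is required.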