[ https://issues.apache.org/jira/browse/FLINK-33856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800683#comment-17800683 ]
Hangxiang Yu commented on FLINK-33856:
--------------------------------------

[~Weijie Guo] Thanks for pinging me here.

[~hejufang001] Thanks for the proposal. I think these metrics sound reasonable. IIUC, they are checkpoint-related, task-level metrics. I think we could use the TraceReporter provided by FLINK-33695 instead of the current MetricReporter, for the reason mentioned in FLINK-33695.

cc [~pnowojski]

> Add metrics to monitor the interaction performance between task and external
> storage system in the process of checkpoint making
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-33856
> URL: https://issues.apache.org/jira/browse/FLINK-33856
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing
> Affects Versions: 1.18.0
> Reporter: Jufang He
> Assignee: Jufang He
> Priority: Major
> Labels: pull-request-available
>
> When Flink takes a checkpoint, the performance of its interaction with the
> external file system has a large impact on the overall checkpoint duration.
> Adding performance metrics at the point where the task interacts with the
> external file storage system therefore makes bottlenecks easier to spot.
> The proposed metrics are: the file write rate, the file write latency, and
> the file close latency.
> Adding these metrics on the Flink side has the following advantages: it is
> convenient to compare the E2E time consumption of different tasks, and there
> is no need to distinguish the type of external storage system, because the
> metrics can be unified in FsCheckpointStreamFactory.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
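
For illustration only, the three proposed measurements (write rate, write latency, close latency) could be collected by wrapping the output stream that talks to the external storage. The sketch below is plain Java with no Flink dependency; the class name MeteredOutputStream and all of its methods are hypothetical and are not part of Flink or of the linked pull request. How the values would then be handed to a MetricReporter or TraceReporter is deliberately left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical wrapper (not Flink code): records bytes written, time spent
 * in write calls, and time spent in close for a delegate stream.
 */
class MeteredOutputStream extends OutputStream {
    private final OutputStream delegate;
    private long bytesWritten; // total bytes passed to the delegate
    private long writeNanos;   // accumulated time spent inside write calls
    private long closeNanos;   // time spent inside the last close call

    MeteredOutputStream(OutputStream delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(int b) throws IOException {
        long start = System.nanoTime();
        delegate.write(b);
        writeNanos += System.nanoTime() - start;
        bytesWritten++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        long start = System.nanoTime();
        delegate.write(b, off, len);
        writeNanos += System.nanoTime() - start;
        bytesWritten += len;
    }

    @Override
    public void flush() throws IOException {
        delegate.flush();
    }

    @Override
    public void close() throws IOException {
        long start = System.nanoTime();
        try {
            delegate.close();
        } finally {
            // Record the close latency even if the delegate throws.
            closeNanos = System.nanoTime() - start;
        }
    }

    long getBytesWritten() {
        return bytesWritten;
    }

    long getWriteNanos() {
        return writeNanos;
    }

    long getCloseNanos() {
        return closeNanos;
    }

    /** Write rate in bytes per second; 0.0 if no write time was measured. */
    double getWriteBytesPerSecond() {
        return writeNanos == 0 ? 0.0 : bytesWritten * 1e9 / writeNanos;
    }
}
```

Because the wrapper only sees the OutputStream interface, the same measurement applies regardless of which storage system sits behind the delegate, which is the storage-agnostic property the issue description attributes to instrumenting FsCheckpointStreamFactory.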