[ https://issues.apache.org/jira/browse/FLINK-33856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803392#comment-17803392 ]
Rui Fan commented on FLINK-33856:
---------------------------------

Thanks [~pnowojski] for the ping. :)

{quote}
[~fanrui] if I remember correctly you wanted to follow up on this?
{quote}

As I said on the mailing list[1], I propose adding a series of TraceSpans for job startup, such as:

* From the JobManager process being started to the JobGraph being created
* From the JobGraph being created to the JobMaster being created
* From the JobMaster being created to the job running
* From the first request for TMs from YARN or Kubernetes to all TMs being ready
* etc.

My colleague [~easonqin] and I have just created FLIP-412: Add the time-consuming span of each stage when starting the Flink job to TraceReporter[2][3] to follow up on it.

IIUC, it's not related to this JIRA, right? The two add Spans for different stages.

[1] https://lists.apache.org/thread/7lql5f5q1np68fw1wc9trq3d9l2ox8f4
[2] https://cwiki.apache.org/confluence/x/8435E
[3] https://issues.apache.org/jira/browse/FLINK-33999

> Add metrics to monitor the interaction performance between task and external storage system in the process of checkpoint making
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-33856
> URL: https://issues.apache.org/jira/browse/FLINK-33856
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing
> Affects Versions: 1.18.0
> Reporter: Jufang He
> Assignee: Jufang He
> Priority: Major
> Labels: pull-request-available
>
> When Flink makes a checkpoint, the performance of its interaction with the external file system has a great impact on the overall checkpoint duration. Adding performance metrics where the task interacts with the external file storage system therefore makes it easy to spot the bottleneck. These metrics include: the file write rate, the latency of writing the file, and the latency of closing the file.
> Adding these metrics on the Flink side has the following advantages: it is convenient for collecting statistics on the end-to-end time of different tasks, and there is no need to distinguish between types of external storage system, because the metrics can be unified in FsCheckpointStreamFactory.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
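
For illustration only, the three metrics named in the issue description (write rate, write latency, close latency) could be collected by wrapping the checkpoint output stream. The sketch below is an assumption, not the change proposed in the pull request: `TimedOutputStream` and its getters are hypothetical names, it uses only the JDK, and it does not touch Flink's metric system or FsCheckpointStreamFactory.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical wrapper (not a Flink class) that times write and close calls on any OutputStream. */
public class TimedOutputStream extends OutputStream {
    private final OutputStream delegate;
    private long bytesWritten; // total bytes passed to the delegate
    private long writeNanos;   // accumulated time spent inside delegate.write(...)
    private long closeNanos;   // time spent inside delegate.close()

    public TimedOutputStream(OutputStream delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(int b) throws IOException {
        long start = System.nanoTime();
        delegate.write(b);
        writeNanos += System.nanoTime() - start;
        bytesWritten++;
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        long start = System.nanoTime();
        delegate.write(buf, off, len);
        writeNanos += System.nanoTime() - start;
        bytesWritten += len;
    }

    @Override
    public void flush() throws IOException {
        delegate.flush();
    }

    @Override
    public void close() throws IOException {
        long start = System.nanoTime();
        try {
            delegate.close();
        } finally {
            // Record the close latency even if the delegate throws.
            closeNanos = System.nanoTime() - start;
        }
    }

    public long getBytesWritten() {
        return bytesWritten;
    }

    public long getWriteNanos() {
        return writeNanos;
    }

    public long getCloseNanos() {
        return closeNanos;
    }

    /** Write rate in bytes per second; 0.0 if no write time has been recorded yet. */
    public double getWriteBytesPerSecond() {
        return writeNanos == 0 ? 0.0 : bytesWritten * 1e9 / writeNanos;
    }
}
```

Because the wrapper only depends on `OutputStream`, the same measurement applies regardless of which external storage system sits behind the stream, which is the advantage the description points out.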