[ 
https://issues.apache.org/jira/browse/FLINK-33856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803392#comment-17803392
 ] 

Rui Fan commented on FLINK-33856:
---------------------------------

Thanks [~pnowojski] for the ping. :)
{quote} [~fanrui] if I remember correctly you wanted to follow up on this?
{quote}
As I said on the mailing list[1], I propose adding a series of TraceSpans for 
job start, such as:
 * From the JobManager process starting to the JobGraph being created
 * From the JobGraph being created to the JobMaster being created
 * From the JobMaster being created to the job running
 * From requesting TMs from YARN or Kubernetes to all TMs being ready
 * etc.
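
To make the stage list above concrete, here is a minimal sketch of how per-stage spans could be collected. It is plain Java and does not use any Flink class: `JobStartTimeline`, `StageSpan`, `stageFinished` and `addSpan` are hypothetical names chosen for illustration, and the actual design is whatever the FLIP ends up specifying.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper, not a Flink API: collects one span per job-start stage. */
final class JobStartTimeline {

    /** One completed stage with its start and end timestamps in milliseconds. */
    static final class StageSpan {
        final String name;
        final long startMs;
        final long endMs;

        StageSpan(String name, long startMs, long endMs) {
            if (endMs < startMs) {
                throw new IllegalArgumentException("span ends before it starts: " + name);
            }
            this.name = name;
            this.startMs = startMs;
            this.endMs = endMs;
        }

        long durationMs() {
            return endMs - startMs;
        }
    }

    private final List<StageSpan> spans = new ArrayList<>();
    private long lastEventMs;

    /** processStartMs is the timestamp at which the first stage begins. */
    JobStartTimeline(long processStartMs) {
        this.lastEventMs = processStartMs;
    }

    /** Records a stage that began at the previous event and ends at nowMs. */
    void stageFinished(String name, long nowMs) {
        spans.add(new StageSpan(name, lastEventMs, nowMs));
        lastEventMs = nowMs;
    }

    /** Records a stage that overlaps the others, e.g. waiting for all TMs. */
    void addSpan(String name, long startMs, long endMs) {
        spans.add(new StageSpan(name, startMs, endMs));
    }

    List<StageSpan> spans() {
        return Collections.unmodifiableList(spans);
    }
}
```

The consecutive stages (process start, JobGraph created, JobMaster created, job running) chain through `stageFinished`, while the TM acquisition stage runs in parallel with them and is recorded with explicit timestamps through `addSpan`.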

My colleague [~easonqin] and I have just created FLIP-412: Add the 
time-consuming span of each stage when starting the Flink job to 
TraceReporter[2][3] to follow up on it. IIUC, it's not related to this 
JIRA, right? They add Spans for different stages.

 

[1] https://lists.apache.org/thread/7lql5f5q1np68fw1wc9trq3d9l2ox8f4

[2] https://cwiki.apache.org/confluence/x/8435E

[3] https://issues.apache.org/jira/browse/FLINK-33999

> Add metrics to monitor the interaction performance between task and external 
> storage system in the process of checkpoint making
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-33856
>                 URL: https://issues.apache.org/jira/browse/FLINK-33856
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.18.0
>            Reporter: Jufang He
>            Assignee: Jufang He
>            Priority: Major
>              Labels: pull-request-available
>
> When Flink takes a checkpoint, the performance of its interaction with the 
> external file system has a large impact on the overall checkpoint duration. 
> Adding performance metrics where the task interacts with the external file 
> storage system would therefore make bottlenecks easy to observe. These 
> include: the file write rate, the latency of writing the file, and the 
> latency of closing the file.
> Adding these metrics on the Flink side has the following advantages: it is 
> convenient to collect the E2E time of different tasks, and there is no need 
> to distinguish the type of external storage system, because the metrics can 
> be implemented uniformly in FsCheckpointStreamFactory.
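
As an illustration of the three metrics named in the issue description (write rate, write latency, close latency), the following is a minimal sketch of a stream wrapper that measures them independently of the storage backend. It is plain Java and an assumption about how such a measurement could look, not the code of the linked pull request: `MeteredOutputStream` and its accessors are hypothetical names, and in Flink the wrapped stream would presumably be the one created by FsCheckpointStreamFactory.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical wrapper, not Flink code: times writes and close on any OutputStream. */
final class MeteredOutputStream extends OutputStream {
    private final OutputStream delegate;
    private long bytesWritten;
    private long writeNanos;
    private long closeNanos;

    MeteredOutputStream(OutputStream delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(int b) throws IOException {
        long start = System.nanoTime();
        delegate.write(b);
        writeNanos += System.nanoTime() - start;
        bytesWritten++;
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        long start = System.nanoTime();
        delegate.write(buf, off, len);
        writeNanos += System.nanoTime() - start;
        bytesWritten += len;
    }

    @Override
    public void flush() throws IOException {
        delegate.flush();
    }

    @Override
    public void close() throws IOException {
        long start = System.nanoTime();
        delegate.close();
        closeNanos = System.nanoTime() - start;
    }

    /** Total number of bytes passed to the delegate. */
    long bytesWritten() {
        return bytesWritten;
    }

    /** Cumulative time spent inside the delegate's write calls. */
    long writeNanos() {
        return writeNanos;
    }

    /** Time spent inside the delegate's close call. */
    long closeNanos() {
        return closeNanos;
    }

    /** Write rate in bytes per second, or 0 if no write time was measured. */
    double bytesPerSecond() {
        return writeNanos == 0 ? 0.0 : bytesWritten * 1e9 / writeNanos;
    }
}
```

Because the wrapper only depends on `java.io.OutputStream`, the same measurement applies to any external storage system, which is the advantage the description attributes to doing this on the Flink side.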



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
