[ https://issues.apache.org/jira/browse/FLINK-33856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803478#comment-17803478 ]
Piotr Nowojski commented on FLINK-33856:
----------------------------------------

{quote}
Maybe a new FLIP that supports a task-level trace reporter could be created? I'm willing to participate in the development.
{quote}

Please check the FLIP-384 discussion again. I highlighted a couple of difficulties there:

{quote}
However, if we would like to create true distributed traces, with spans reported from many different components, potentially both on the JM and TMs, the problem is a bit deeper. The issue in that case is how to actually fill out `parent_id` and `trace_id`. Passing some context entity around as a Java object would be unfeasible; that would require too many changes in too many places. I think the only realistic way to do it would be to have a deterministic generator of `parent_id` and `trace_id` values. For example, we could create the parent trace/span of the checkpoint on the JM and set those ids to something like `jobId#attemptId#checkpointId`. Each subtask could then re-generate those ids, and a subtask's checkpoint span would have an id of `jobId#attemptId#checkpointId#subTaskId`. Note that this is just an example, as distributed spans for checkpointing most likely do not make sense, since we can generate them much more easily on the JM anyway.
{quote}

https://lists.apache.org/thread/7lql5f5q1np68fw1wc9trq3d9l2ox8f4

At the same time:

{quote}
I am worried that a large amount of data aggregation to the JM may cause performance problems.
{quote}

I wouldn't worry about that too much. Those data are already aggregated on the JM from all of the TMs via {{CheckpointMetricsBuilder}} and {{CheckpointMetrics}}. Besides, it's just a single RPC from subtask -> JM per checkpoint. If that became a problem, we would have problems in many other areas as well (for example, {{notifyCheckpointCompleted}} is a very similar call in the other direction).
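To make the deterministic-id idea from the quote concrete, here is a minimal sketch. The class name and method signatures are hypothetical illustrations (not Flink API); the only thing taken from the discussion is the `jobId#attemptId#checkpointId` and `jobId#attemptId#checkpointId#subTaskId` id scheme, which lets the JM and each subtask derive the same ids independently without passing a context object around:

```java
// Hypothetical sketch, not actual Flink code: both the JM and each subtask can
// re-derive the same trace/span ids from checkpoint coordinates they already know,
// so no tracing context has to be shipped between components.
public final class CheckpointSpanIds {

    private CheckpointSpanIds() {}

    /** Parent trace/span id of the checkpoint, created on the JM. */
    public static String checkpointTraceId(String jobId, int attemptId, long checkpointId) {
        return jobId + "#" + attemptId + "#" + checkpointId;
    }

    /** Child span id that a subtask re-generates locally from the same coordinates. */
    public static String subtaskSpanId(String jobId, int attemptId, long checkpointId, int subTaskId) {
        return checkpointTraceId(jobId, attemptId, checkpointId) + "#" + subTaskId;
    }
}
```

Because both sides compute the ids from shared, deterministic inputs, the `parent_id` of a subtask span is exactly the JM's trace id, with no RPC needed to correlate them.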
Also, AFAIR there are/were different ideas on how to solve this potential bottleneck in a more generic way (having multiple job coordinators in the cluster to spread the load).

> Add metrics to monitor the interaction performance between task and external storage system in the process of checkpoint making
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-33856
>                 URL: https://issues.apache.org/jira/browse/FLINK-33856
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.18.0
>            Reporter: Jufang He
>            Assignee: Jufang He
>            Priority: Major
>              Labels: pull-request-available
>
> When Flink makes a checkpoint, the interaction performance with the external file system has a great impact on the overall time spent. Therefore, it is easy to observe bottlenecks by adding performance indicators where the task interacts with the external file storage system. These include: the rate of file writes, the latency to write the file, and the latency to close the file.
> Adding the above metrics on the Flink side has the following advantages: it is convenient to collect per-task E2E timing statistics, and there is no need to distinguish the type of external storage system, since the metrics can be added uniformly in the FsCheckpointStreamFactory.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
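The issue description proposes measuring write rate plus write and close latency at the checkpoint stream level. A minimal sketch of that kind of instrumentation, assuming a plain `OutputStream` wrapper rather than the actual `FsCheckpointStreamFactory` internals (the class and getter names here are hypothetical):

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch, not actual Flink code: a delegating stream that records
// bytes written and the time spent inside write() and close(), which is the kind
// of storage-agnostic instrumentation the issue proposes to add once in the
// checkpoint stream factory instead of per storage system.
public class TimedCheckpointOutputStream extends OutputStream {
    private final OutputStream delegate;
    private long bytesWritten;
    private long writeNanos;
    private long closeNanos;

    public TimedCheckpointOutputStream(OutputStream delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(int b) throws IOException {
        long start = System.nanoTime();
        delegate.write(b);
        writeNanos += System.nanoTime() - start;
        bytesWritten++;
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        long start = System.nanoTime();
        delegate.write(buf, off, len);
        writeNanos += System.nanoTime() - start;
        bytesWritten += len;
    }

    @Override
    public void close() throws IOException {
        long start = System.nanoTime();
        delegate.close();
        closeNanos = System.nanoTime() - start;
    }

    // Accumulated values that a metrics reporter could export as
    // write-rate, write-latency, and close-latency gauges.
    public long getBytesWritten() { return bytesWritten; }
    public long getWriteNanos()   { return writeNanos; }
    public long getCloseNanos()   { return closeNanos; }
}
```

Because the wrapper sits above the file-system abstraction, the same measurements apply regardless of which external storage backs the checkpoint stream.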