Yunfeng Zhou created FLINK-38291:
------------------------------------

             Summary: Reduce thread lock overhead for Flink UI REST handlers
                 Key: FLINK-38291
                 URL: https://issues.apache.org/jira/browse/FLINK-38291
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / REST
    Affects Versions: 2.1
            Reporter: Yunfeng Zhou


In some Flink jobs at our company, we found that when a job has complex logic
and a parallelism (number of subtasks) of about 512 or 1024, it can take more
than one minute for the Flink UI to display the job's DAG.

While debugging the corresponding REST handlers, we found that the latency is
caused by repeated calls to synchronized methods such as
MetricStore#getSubtaskMetricStore. On each call, the thread may have to wait
for other synchronized methods to release the lock before it can enter the
method, and this overhead accumulates as the call is repeated.
We therefore propose reducing the number of calls to these synchronized methods
to lower the latency of displaying the DAG.
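
To illustrate the proposal, here is a minimal Java sketch of the general
pattern (batching many per-subtask lookups into a single synchronized call). It
is not Flink's actual MetricStore code: the classes SketchMetricStore and
BatchedLookupDemo, the methods getSubtaskMetrics, getAllSubtaskMetrics and
countLockAcquisitions, and the "taskId/subtaskIndex" keys are all hypothetical.
The sketch only counts monitor acquisitions; it does not measure contention,
which is where the real cost appears when writers hold the same lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified metric store whose accessors are guarded by the
 * object's monitor. Not Flink's MetricStore API.
 */
class SketchMetricStore {
    // Hypothetical key "taskId/subtaskIndex" -> metrics of that subtask.
    private final Map<String, Map<String, String>> subtaskMetrics = new HashMap<>();
    // Number of monitor acquisitions made through the two read accessors.
    private int lockAcquisitions = 0;

    private static String key(String taskId, int subtaskIndex) {
        return taskId + "/" + subtaskIndex;
    }

    synchronized void put(String taskId, int subtaskIndex, String metric, String value) {
        subtaskMetrics.computeIfAbsent(key(taskId, subtaskIndex), k -> new HashMap<>())
                .put(metric, value);
    }

    /** Per-subtask accessor: every call acquires the monitor once. */
    synchronized Map<String, String> getSubtaskMetrics(String taskId, int subtaskIndex) {
        lockAcquisitions++;
        // Return a copy so the caller can read it without holding the lock.
        return new HashMap<>(subtaskMetrics.getOrDefault(key(taskId, subtaskIndex), Map.of()));
    }

    /** Batched accessor: one monitor acquisition for all subtasks of a task. */
    synchronized List<Map<String, String>> getAllSubtaskMetrics(String taskId, int parallelism) {
        lockAcquisitions++;
        List<Map<String, String>> result = new ArrayList<>(parallelism);
        for (int i = 0; i < parallelism; i++) {
            result.add(new HashMap<>(subtaskMetrics.getOrDefault(key(taskId, i), Map.of())));
        }
        return result;
    }

    synchronized int lockAcquisitions() {
        return lockAcquisitions;
    }
}

class BatchedLookupDemo {
    /** Returns {per-subtask acquisitions, batched acquisitions} for one vertex. */
    static int[] countLockAcquisitions(int parallelism) {
        SketchMetricStore store = new SketchMetricStore();
        for (int i = 0; i < parallelism; i++) {
            store.put("vertex-1", i, "numRecordsIn", Integer.toString(i));
        }
        // Per-subtask pattern: one synchronized call per subtask.
        int start = store.lockAcquisitions();
        for (int i = 0; i < parallelism; i++) {
            store.getSubtaskMetrics("vertex-1", i);
        }
        int perSubtask = store.lockAcquisitions() - start;

        // Batched pattern: a single synchronized call for all subtasks.
        int beforeBatch = store.lockAcquisitions();
        store.getAllSubtaskMetrics("vertex-1", parallelism);
        int batched = store.lockAcquisitions() - beforeBatch;
        return new int[] {perSubtask, batched};
    }

    public static void main(String[] args) {
        int[] counts = countLockAcquisitions(1024);
        System.out.println("per-subtask: " + counts[0] + ", batched: " + counts[1]);
    }
}
```

With a parallelism of 1024, the per-subtask pattern enters the monitor 1024
times per vertex while the batched pattern enters it once. The trade-off is
that a single batched call holds the lock longer, which can briefly delay
writers that update the store.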



--
This message was sent by Atlassian Jira
(v8.20.10#820010)