Yunfeng Zhou created FLINK-38291:
------------------------------------

             Summary: Reduce thread lock overhead for Flink UI REST handlers
                 Key: FLINK-38291
                 URL: https://issues.apache.org/jira/browse/FLINK-38291
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / REST
    Affects Versions: 2.1
            Reporter: Yunfeng Zhou

In some of the Flink jobs at our company, we found that when a job has complex logic and a parallelism (number of subtasks) of about 512 or 1024, the Flink UI may take more than one minute to display the job's DAG.

Debugging the corresponding REST handlers, we found that the latency is caused by repeated calls to synchronized methods such as MetricStore#getSubtaskMetricStore. Each time such a method is invoked, the calling thread may have to wait for other synchronized methods to release the lock before it can enter, and this overhead accumulates as the invocation is repeated.

We therefore propose reducing the number of calls to these synchronized methods in order to lower the latency of displaying the DAG.
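
The general idea can be illustrated with a minimal, self-contained sketch. It does not use Flink's classes and is not the proposed patch: ToyMetricStore, getSubtaskMetrics, snapshot and the "numRecordsIn" metric name are hypothetical stand-ins chosen for illustration. The sketch only shows how a handler that enters a synchronized method once per subtask could instead enter the lock once and aggregate outside it.

```java
import java.util.HashMap;
import java.util.Map;

public class LockVisitSketch {

    /** Hypothetical store guarded by a single monitor lock (NOT Flink's MetricStore). */
    static class ToyMetricStore {
        private final Map<Integer, Map<String, Long>> subtasks = new HashMap<>();
        private int lockedReads; // number of synchronized read calls, guarded by "this"

        synchronized void put(int subtask, String metric, long value) {
            subtasks.computeIfAbsent(subtask, k -> new HashMap<>()).put(metric, value);
        }

        /** One lock acquisition per call, like a synchronized per-subtask getter. */
        synchronized Map<String, Long> getSubtaskMetrics(int subtask) {
            lockedReads++;
            Map<String, Long> metrics = subtasks.get(subtask);
            if (metrics == null) {
                return new HashMap<>();
            }
            return new HashMap<>(metrics);
        }

        /** One lock acquisition that copies the metrics of all subtasks. */
        synchronized Map<Integer, Map<String, Long>> snapshot() {
            lockedReads++;
            Map<Integer, Map<String, Long>> copy = new HashMap<>();
            subtasks.forEach((subtask, metrics) -> copy.put(subtask, new HashMap<>(metrics)));
            return copy;
        }

        synchronized int lockedReads() {
            return lockedReads;
        }
    }

    /** Builds a store with one hypothetical metric per subtask. */
    static ToyMetricStore fill(int parallelism) {
        ToyMetricStore store = new ToyMetricStore();
        for (int i = 0; i < parallelism; i++) {
            store.put(i, "numRecordsIn", i);
        }
        return store;
    }

    /** Before: the handler enters the lock once per subtask. */
    static long sumPerSubtask(ToyMetricStore store, int parallelism) {
        long sum = 0;
        for (int i = 0; i < parallelism; i++) {
            sum += store.getSubtaskMetrics(i).getOrDefault("numRecordsIn", 0L);
        }
        return sum;
    }

    /** After: the handler enters the lock once, then aggregates without holding it. */
    static long sumFromSnapshot(ToyMetricStore store) {
        long sum = 0;
        for (Map<String, Long> metrics : store.snapshot().values()) {
            sum += metrics.getOrDefault("numRecordsIn", 0L);
        }
        return sum;
    }

    public static void main(String[] args) {
        int parallelism = 1024;
        ToyMetricStore store = fill(parallelism);

        long before = sumPerSubtask(store, parallelism);
        int readsBefore = store.lockedReads();

        long after = sumFromSnapshot(store);
        int readsAfter = store.lockedReads() - readsBefore;

        System.out.println("sums: " + before + " and " + after);
        System.out.println("synchronized reads: " + readsBefore + " vs " + readsAfter);
    }
}
```

In this toy setup both variants compute the same sum, but the first performs one synchronized read per subtask while the second performs a single one. The trade-off is that the single call holds the lock longer and copies more data, so whether it helps in practice depends on how contended the lock is.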