Jun Gong created YARN-11714:
-------------------------------
Summary: Add cache for createAndGetApplicationReport to improve
performance
Key: YARN-11714
URL: https://issues.apache.org/jira/browse/YARN-11714
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 3.3.6
Reporter: Jun Gong
Our cluster has 2000+ nodes, 2000-8000 running applications, and 10,000
completed applications. Obtaining the application list with
YarnClient.getApplications() takes approximately 1 to 10 seconds, and the
ResourceManager (RM) event queue size often exceeds 100,000.
Upon further investigation, I found that the createAndGetApplicationReport
method consumes a significant amount of time because it must acquire
several critical locks, such as the RMApp lock, the RMAppAttempt lock, and
the scheduler lock. Contention on these locks in turn reduces scheduler
performance and slows down event handling.
To improve performance, I propose caching the app reports of applications
that have reached a final status (SUCCEEDED/FAILED/KILLED). Since the status
of these applications will not change, their reports can be built once and
then served from the cache without taking the locks again.
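
The idea can be illustrated with a minimal, self-contained sketch. This is not
the actual patch and it does not use any Hadoop classes: FinishedReportCache,
its builder function, and its isFinal predicate are hypothetical names chosen
for illustration. The builder stands in for the expensive, lock-taking report
construction, and the predicate stands in for a check that the application has
reached a final status. The sketch also ignores any per-caller access checks
and assumes a report can be shared between callers, which a real
implementation would need to verify.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: caches reports only for applications whose status is
 * final, so reports of running applications are always rebuilt.
 */
final class FinishedReportCache<K, R> {
  private final Map<K, R> cache = new ConcurrentHashMap<>();
  // Stands in for the expensive report construction (assumption).
  private final Function<K, R> builder;
  // Returns true when the report describes an application in a final status.
  private final Predicate<R> isFinal;

  FinishedReportCache(Function<K, R> builder, Predicate<R> isFinal) {
    this.builder = builder;
    this.isFinal = isFinal;
  }

  /** Returns the cached report if present, otherwise builds a fresh one. */
  R get(K appId) {
    R cached = cache.get(appId);
    if (cached != null) {
      return cached; // fast path: no call into the builder
    }
    R report = builder.apply(appId);
    if (report != null && isFinal.test(report)) {
      // Only final reports are stored, since they will not change.
      cache.putIfAbsent(appId, report);
    }
    return report;
  }

  /** Drops an entry, e.g. when the completed application is removed. */
  void remove(K appId) {
    cache.remove(appId);
  }

  int size() {
    return cache.size();
  }
}
```

In this sketch, entries have to be removed explicitly when a completed
application is dropped, otherwise the cache would grow beyond the number of
retained completed applications.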