[ https://issues.apache.org/jira/browse/SPARK-49783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun resolved SPARK-49783.
-----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 48238
[https://github.com/apache/spark/pull/48238]

> Resource leak of spark yarn allocator
> -------------------------------------
>
>                 Key: SPARK-49783
>                 URL: https://issues.apache.org/jira/browse/SPARK-49783
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 3.5.0, 4.0.0
>            Reporter: Junfan Zhang
>            Assignee: Junfan Zhang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> When the target is lower than the number of running containers, the
> containers assigned by the ResourceManager are skipped, but they are not
> released by invoking amClient.releaseAssignedContainer. As a result, these
> containers stay reserved in the YARN ResourceManager for at least 10 minutes,
> and a large share of the cluster's resources is wasted.
> This shows up as the vcore * seconds statistics on the YARN side being
> greater than the figure derived from the Spark event logs.
> By my measurements, the cluster resource waste ratio is ~25% when Spark jobs
> are the only workload in the cluster.
>
> More details can be found in this blog post:
> [https://zuston.vercel.app/publish/resource-leak-of-spark-yarn-allocator/]
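The leak described in the issue can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Spark's actual YarnAllocator code: `FakeAMClient`, `handle_allocated`, and the `release_surplus` flag are hypothetical names invented for illustration, and the "fixed" branch simply assumes that the remedy is to call the release method on every skipped container, as the description suggests.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAMClient:
    """Hypothetical stand-in for the YARN AM client; records released container ids."""
    released: list = field(default_factory=list)

    def release_assigned_container(self, container_id: str) -> None:
        self.released.append(container_id)


def handle_allocated(am_client, allocated, running, target, release_surplus=True):
    """Launch allocated containers until `target` is reached.

    Containers beyond the target are skipped. With release_surplus=False they
    are skipped silently (the leaky behaviour described in the issue); with
    release_surplus=True they are handed back to the resource manager.
    Returns (launched, skipped).
    """
    launched, skipped = [], []
    for container_id in allocated:
        if running + len(launched) < target:
            launched.append(container_id)
        else:
            skipped.append(container_id)
            if release_surplus:
                am_client.release_assigned_container(container_id)
    return launched, skipped


allocated = ["c1", "c2", "c3"]

# Leaky variant: target (4) < running (5), so everything is skipped and nothing is released.
leaky_client = FakeAMClient()
_, leaky_skipped = handle_allocated(
    leaky_client, allocated, running=5, target=4, release_surplus=False
)
held_leaky = [c for c in leaky_skipped if c not in leaky_client.released]

# Assumed fix: skipped containers are released immediately.
fixed_client = FakeAMClient()
_, fixed_skipped = handle_allocated(fixed_client, allocated, running=5, target=4)
held_fixed = [c for c in fixed_skipped if c not in fixed_client.released]

print("still held (leaky):", held_leaky)
print("still held (fixed):", held_fixed)
```

In the leaky variant every skipped container remains held by the resource manager until it expires on its own, which is the source of the gap between YARN's vcore * seconds figure and the one derived from the Spark event logs.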