[ https://issues.apache.org/jira/browse/SPARK-49783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-49783:
----------------------------------
        Parent: SPARK-44111
    Issue Type: Sub-task  (was: Bug)

> Resource leak of spark yarn allocator
> -------------------------------------
>
>                 Key: SPARK-49783
>                 URL: https://issues.apache.org/jira/browse/SPARK-49783
>             Project: Spark
>          Issue Type: Sub-task
>          Components: YARN
>    Affects Versions: 3.5.0, 4.0.0
>            Reporter: Junfan Zhang
>            Assignee: Junfan Zhang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> When the target number of containers is lower than the number of running
> containers, the containers assigned by the ResourceManager are skipped, but
> they are not released by invoking amClient.releaseAssignedContainer. As a
> result, these containers stay reserved in the YARN ResourceManager for at
> least 10 minutes, and a large share of the cluster's resources is wasted.
> This also means that the vcore * seconds statistics reported by YARN will be
> greater than the figures derived from the Spark event logs.
> From my statistics, the cluster resource waste ratio is ~25% when Spark jobs
> are the only workload on the cluster.
>
> More details can be found in this blog:
> [https://zuston.vercel.app/publish/resource-leak-of-spark-yarn-allocator/]
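
The behaviour described in the report can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the Spark allocator code or the actual patch: `FakeAmClient`, `handle_allocated`, and the `release_surplus` flag are hypothetical names invented for illustration, and `release_assigned_container` merely stands in for the `amClient.releaseAssignedContainer` call mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAmClient:
    """Hypothetical stand-in for the YARN AM client; records released container ids."""
    released: list = field(default_factory=list)

    def release_assigned_container(self, container_id: str) -> None:
        # Models handing a container back to the ResourceManager.
        self.released.append(container_id)


def handle_allocated(allocated, running, target, am_client, release_surplus=True):
    """Use allocated containers until `running` reaches `target`.

    Returns (launched, surplus). With release_surplus=False the surplus
    containers are only skipped and never released, which models the leak
    described in the report.
    """
    launched, surplus = [], []
    for container_id in allocated:
        if running + len(launched) < target:
            launched.append(container_id)
        else:
            surplus.append(container_id)
            if release_surplus:
                am_client.release_assigned_container(container_id)
    return launched, surplus


# Example: 4 executors running, target is 5, and three containers arrive.
client = FakeAmClient()
launched, surplus = handle_allocated(["c1", "c2", "c3"], running=4, target=5, am_client=client)
```

In this model, calling the function with `release_surplus=False` leaves `client.released` empty even though containers were skipped, which corresponds to the containers staying reserved on the ResourceManager side.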