[ https://issues.apache.org/jira/browse/YARN-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Szucs resolved YARN-11733.
--------------------------------
Fix Version/s: 3.5.0
Resolution: Fixed
> Fix the order of updating CPU controls with cgroup v1
> -----------------------------------------------------
>
> Key: YARN-11733
> URL: https://issues.apache.org/jira/browse/YARN-11733
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Peter Szucs
> Assignee: Peter Szucs
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
>
>
> After YARN-11674 (Update CpuResourceHandler implementation for cgroup v2
> support), the order in which the cpu.cfs_period_us and cpu.cfs_quota_us
> controls are updated has changed. This can cause the following error when
> launching containers with CPU limits on cgroup v1:
> {code:java}
> PrintWriter unable to write to
> /var/cgroupv1/cpu/hadoop-yarn/container_e02_1727079571170_0040_02_000001/cpu.cfs_quota_us
> with value: 112500{code}
>
> *Reproduction:*
> I set the following cgroup CPU limits in yarn-site.xml:
> {code:java}
> yarn.nodemanager.resource.percentage-physical-cpu-limit: 90
> yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage: true{code}
> After that, the limits were applied to the hadoop-yarn root hierarchy:
> {code:java}
> [root@pszucs-test-2 hadoop-yarn]# cat cpu.cfs_period_us
> 1000000
> [root@pszucs-test-2 hadoop-yarn]# cat cpu.cfs_quota_us
> 900000
> {code}
> When I tried to launch a container, it failed with the following error:
> {code:java}
> PrintWriter unable to write to
> /var/cgroupv1/cpu/hadoop-yarn/container_e02_1727079571170_0040_02_000001/cpu.cfs_quota_us
> with value: 112500{code}
> This happens because the container's cfs_quota_us value of 112 500 exceeds
> the limit defined at the parent level. Manually creating a test cgroup
> confirms this: its cpu.cfs_quota_us control can only be set up to 90 000:
> {code:java}
> [root@pszucs-test-2 hadoop-yarn]# cat test/cpu.cfs_period_us
> 100000
> [root@pszucs-test-2 hadoop-yarn]# echo "90001" > test/cpu.cfs_quota_us
> -bash: echo: write error: Invalid argument
> [root@pszucs-test-2 hadoop-yarn]# echo "90000" > test/cpu.cfs_quota_us{code}
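The 90 000 ceiling follows from a simple proportion. The sketch below is a simplified model, assuming the cgroup v1 rule that a child's cfs_quota_us/cfs_period_us ratio may not exceed its parent's (the kernel compares fixed-point ratios rather than exact fractions). The class CfsBandwidthCheck and its methods are hypothetical and are not part of YARN or the kernel.

```java
// Simplified model (assumption): under cgroup v1 the kernel rejects a child's
// cpu.cfs_quota_us when quota/period would exceed the parent's quota/period.
public class CfsBandwidthCheck {

    /** Largest child quota (us) allowed for the given child period under the parent's ratio. */
    static long maxChildQuota(long parentPeriodUs, long parentQuotaUs, long childPeriodUs) {
        // childPeriod * (parentQuota / parentPeriod), in integer arithmetic (rounded down)
        return childPeriodUs * parentQuotaUs / parentPeriodUs;
    }

    /** True if the child's quota/period stays within the parent's quota/period. */
    static boolean isAllowed(long parentPeriodUs, long parentQuotaUs,
                             long childPeriodUs, long childQuotaUs) {
        return childQuotaUs <= maxChildQuota(parentPeriodUs, parentQuotaUs, childPeriodUs);
    }

    public static void main(String[] args) {
        long parentPeriod = 1_000_000; // hadoop-yarn/cpu.cfs_period_us
        long parentQuota = 900_000;    // hadoop-yarn/cpu.cfs_quota_us (0.9 CPU)

        // A freshly created child cgroup starts with the default 100 000 us period.
        System.out.println(maxChildQuota(parentPeriod, parentQuota, 100_000));        // 90000
        System.out.println(isAllowed(parentPeriod, parentQuota, 100_000, 90_001));    // false
        // YARN's 112 500 quota read against the default period: 1.125 CPU
        System.out.println(isAllowed(parentPeriod, parentQuota, 100_000, 112_500));   // false
        // The same quota against the 1 000 000 us period YARN assumed: 0.1125 CPU
        System.out.println(isAllowed(parentPeriod, parentQuota, 1_000_000, 112_500)); // true
    }
}
```

Under this model, the parent's 0.9 CPU caps a child on the default 100 000 us period at 90 000, which matches the manual test above. The 112 500 quota only fits once the child's period is 1 000 000 us.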
>
> *Solution:*
> The cause of this issue is that cpu.cfs_period_us gets the default value of
> 100 000 when a new cgroup is created, while YARN uses 1 000 000 when it
> calculates the limit. Therefore cpu.cfs_period_us must be updated before
> cpu.cfs_quota_us, so that the ratio between the two values is preserved and
> never exceeds the limit defined at the parent level.
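The following sketch shows why the write order matters. It simulates a child cgroup that validates every individual write against the parent's ratio, the same way the kernel rejected the manual echo. This is a hypothetical model under the simplified ratio assumption, not the actual YARN change. CpuControlOrder, SimulatedChildCgroup and updateCpuControls are illustrative names, and a quota of -1 (unlimited) is simply not checked.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation, not YARN code: a cgroup v1 child CPU group that
// validates every single write against the parent's quota/period ratio.
public class CpuControlOrder {

    static final String PERIOD = "cpu.cfs_period_us";
    static final String QUOTA = "cpu.cfs_quota_us";

    static final class SimulatedChildCgroup {
        private final long parentPeriodUs;
        private final long parentQuotaUs;
        private final Map<String, Long> controls = new HashMap<>();

        SimulatedChildCgroup(long parentPeriodUs, long parentQuotaUs) {
            this.parentPeriodUs = parentPeriodUs;
            this.parentQuotaUs = parentQuotaUs;
            // A freshly created cgroup starts with a 100 000 us period and no quota (-1).
            controls.put(PERIOD, 100_000L);
            controls.put(QUOTA, -1L);
        }

        /** Writes one control; fails like "echo: write error: Invalid argument". */
        void write(String control, long value) {
            long period = control.equals(PERIOD) ? value : controls.get(PERIOD);
            long quota = control.equals(QUOTA) ? value : controls.get(QUOTA);
            // quota/period must not exceed parentQuota/parentPeriod (cross-multiplied);
            // a quota of -1 means "unlimited" and is not checked in this model.
            if (quota >= 0 && quota * parentPeriodUs > parentQuotaUs * period) {
                throw new IllegalArgumentException(
                        "Invalid argument: " + control + "=" + value);
            }
            controls.put(control, value);
        }

        long get(String control) {
            return controls.get(control);
        }
    }

    /** Hypothetical helper: period first, then quota, so every intermediate state is valid. */
    static void updateCpuControls(SimulatedChildCgroup cgroup, long periodUs, long quotaUs) {
        cgroup.write(PERIOD, periodUs);
        cgroup.write(QUOTA, quotaUs);
    }

    public static void main(String[] args) {
        // Parent hadoop-yarn limits from the report: 900 000 / 1 000 000 (0.9 CPU).
        SimulatedChildCgroup quotaFirst = new SimulatedChildCgroup(1_000_000, 900_000);
        try {
            quotaFirst.write(QUOTA, 112_500);    // checked against the default 100 000 period
            quotaFirst.write(PERIOD, 1_000_000); // never reached
        } catch (IllegalArgumentException e) {
            System.out.println("quota first: " + e.getMessage());
        }

        SimulatedChildCgroup periodFirst = new SimulatedChildCgroup(1_000_000, 900_000);
        updateCpuControls(periodFirst, 1_000_000, 112_500);
        System.out.println("period first: quota=" + periodFirst.get(QUOTA));
    }
}
```

Running main prints the rejection for the quota-first order and `period first: quota=112500` for the period-first order. Once the period is 1 000 000 us, the 112 500 quota corresponds to 0.1125 CPU, which stays within the parent's 0.9 CPU.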
--
This message was sent by Atlassian Jira
(v8.20.10#820010)