[ https://issues.apache.org/jira/browse/YARN-11572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Baptiste Guet resolved YARN-11572.
---------------------------------------
Resolution: Fixed
In fact, the problem was caused by a third-party service.
> hadoop-yarn cgroup directory is deleted after each "systemctl daemon-reload" command
> ------------------------------------------------------------------------------------
>
> Key: YARN-11572
> URL: https://issues.apache.org/jira/browse/YARN-11572
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 3.3.4
> Environment:
>
> Reporter: Jean-Baptiste Guet
> Priority: Major
>
> I have a Hadoop cluster and I need to enable cgroups in order to use GPUs
> in a Docker environment. I followed the documentation for the setup.
>
> {*}To summarize{*}: I manage the cgroup creation myself (cpu, cpuacct and
> devices), which, as expected, creates three directories under
> {_}{{/sys/fs/cgroup/}}{_}. However, upon each {_}systemctl daemon-reload{_},
> the _/sys/fs/cgroup/devices/hadoop-yarn_ directory is systematically deleted,
> which prevents YARN's NodeManager from working.
>
> {*}In detail{*}:
> As described in the documentation, I kept the parameter
> _yarn.nodemanager.linux-container-executor.cgroups.mount_ set to _false_ in
> order to manage the cgroups myself (for security reasons).
> As I'm on CentOS 8, I use cgroup v1. I set the following parameters:
> * _yarn.nodemanager.linux-container-executor.cgroups.hierarchy_ to
> _/hadoop-yarn_
> * _yarn.nodemanager.linux-container-executor.cgroups.mount-path_ to
> _/sys/fs/cgroup_
> YARN needs three cgroup controllers: cpu, cpuacct and devices.
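As the directory listings in this report show, the settings above translate into one `<mount-path>/<controller>/<hierarchy>` directory per controller. The Python sketch below spells out that mapping; `expected_cgroup_dirs` is a hypothetical helper, not a YARN API, and it assumes each cgroup v1 controller is mounted directly under the mount path.

```python
import posixpath
from typing import List, Sequence


def expected_cgroup_dirs(mount_path: str, hierarchy: str,
                         controllers: Sequence[str]) -> List[str]:
    """Return <mount_path>/<controller>/<hierarchy> for each controller.

    Hypothetical helper: assumes the cgroup v1 layout seen in this report,
    where every controller is mounted directly under mount_path.
    """
    # The hierarchy is configured as "/hadoop-yarn"; strip the leading slash,
    # otherwise posixpath.join would discard everything before it.
    relative = hierarchy.strip("/")
    return [posixpath.join(mount_path, controller, relative)
            for controller in controllers]


if __name__ == "__main__":
    for path in expected_cgroup_dirs("/sys/fs/cgroup", "/hadoop-yarn",
                                     ["cpu", "cpuacct", "devices"]):
        print(path)
```

Stripping the slash is the one subtle point: joining an absolute component would silently drop the mount path.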
> To make the /hadoop-yarn hierarchy persistent, I installed the libcgroup RPM
> and then updated /etc/cgconfig.conf with:
> {code:java}
> group hadoop-yarn {
>     perm {
>         admin {
>             uid = yarn;
>             gid = hadoop;
>         }
>         task {
>             uid = yarn;
>             gid = hadoop;
>         }
>     }
>     cpu {}
>     cpuacct {}
>     devices {}
> }
> {code}
> and started the cgconfig service. The three directories are created:
> {code:java}
> $ ll /sys/fs/cgroup/{cpu,cpuacct,devices}/hadoop-yarn/ -d
> drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpuacct/hadoop-yarn/
> drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpu/hadoop-yarn/
> drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:27 /sys/fs/cgroup/devices/hadoop-yarn/
> {code}
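A quick way to confirm this state before starting the NodeManager is to check that each directory exists and belongs to `yarn:hadoop`, as in the listing above. A minimal Python sketch, assuming the user and group names from the cgconfig.conf shown earlier; `check_cgroup_dirs` is a hypothetical helper:

```python
import grp
import os
import pwd
from typing import Callable, Dict, Iterable, Optional


def _name_or_id(lookup: Callable, ident: int) -> str:
    """Resolve a uid/gid to a name, falling back to the numeric id."""
    try:
        return lookup(ident)[0]
    except KeyError:
        return str(ident)


def check_cgroup_dirs(paths: Iterable[str], user: Optional[str] = "yarn",
                      group: Optional[str] = "hadoop") -> Dict[str, str]:
    """Map each problematic path to a short description; {} means all good.

    Pass user=None or group=None to skip that ownership check.
    """
    problems: Dict[str, str] = {}
    for path in paths:
        if not os.path.isdir(path):
            problems[path] = "missing"
            continue
        st = os.stat(path)
        owner = _name_or_id(pwd.getpwuid, st.st_uid)
        owner_group = _name_or_id(grp.getgrgid, st.st_gid)
        if user is not None and owner != user:
            problems[path] = f"owned by user {owner}, expected {user}"
        elif group is not None and owner_group != group:
            problems[path] = f"owned by group {owner_group}, expected {group}"
    return problems


if __name__ == "__main__":
    dirs = [f"/sys/fs/cgroup/{c}/hadoop-yarn"
            for c in ("cpu", "cpuacct", "devices")]
    for path, problem in check_cgroup_dirs(dirs).items():
        print(f"{path}: {problem}")
```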
>
> At this point, I can restart the Yarn NodeManager.
> However, each time someone runs {{systemctl daemon-reload}}, the devices
> directory is deleted:
> {code:java}
> $ ll /sys/fs/cgroup/{cpu,cpuacct,devices}/hadoop-yarn/ -d
> ls: cannot access '/sys/fs/cgroup/devices/hadoop-yarn/': No such file or directory
> drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpuacct/hadoop-yarn/
> drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpu/hadoop-yarn/
> {code}
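The observation above can be turned into a repeatable check: record which `hadoop-yarn` directories exist, run `systemctl daemon-reload`, and list the ones that vanished. A Python sketch; `removed_by` and `daemon_reload` are hypothetical helpers, and the reload itself needs root:

```python
import os
import subprocess
from typing import Callable, Iterable, List


def removed_by(action: Callable[[], object], paths: Iterable[str]) -> List[str]:
    """Run `action`; return the directories that existed before but not after."""
    watched = list(paths)
    before = {p for p in watched if os.path.isdir(p)}
    action()
    return [p for p in watched if p in before and not os.path.isdir(p)]


def daemon_reload() -> None:
    """Run `systemctl daemon-reload`; raises CalledProcessError on failure."""
    subprocess.run(["systemctl", "daemon-reload"], check=True)
```

On an affected node, `removed_by(daemon_reload, ["/sys/fs/cgroup/devices/hadoop-yarn"])` is expected to return that path; directories that were already missing beforehand are deliberately not reported.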
>
> I see nothing in the logs and have no idea why this directory is deleted.
> Since the YARN NodeManager needs this directory, it stops working and has to
> be restarted (once the directory has been re-created, of course).
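Because nothing shows up in the logs, it can help to record exactly when the directory disappears and then read the journal around that time. A hypothetical polling watcher in Python (`watch_for_removal` is not an existing tool); the detection time is only accurate to within one polling interval, and a path that is already missing is reported immediately:

```python
import os
import time
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


def watch_for_removal(paths: Iterable[str], interval: float = 1.0,
                      timeout: Optional[float] = None
                      ) -> List[Tuple[str, datetime]]:
    """Poll until at least one path is gone, or until `timeout` seconds pass.

    Returns (path, detection time) for every path found missing in the same
    poll, or an empty list if the timeout expired first.
    """
    watched = list(paths)
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        gone = [p for p in watched if not os.path.exists(p)]
        if gone:
            detected_at = datetime.now()
            return [(p, detected_at) for p in gone]
        if deadline is not None and time.monotonic() >= deadline:
            return []
        time.sleep(interval)
```

For instance, run it on the three `hadoop-yarn` paths in one shell, trigger `systemctl daemon-reload` from another, then inspect `journalctl --since` output starting a little before the returned timestamp.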
> As an alternative to the cgconfig service, I tried creating my own systemd
> service that creates these directories:
> {code:java}
> # /etc/systemd/system/hadoop-yarn-cgroup.service
> [Unit]
> Description=Custom cgroup for Hadoop YARN
>
> [Service]
> ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/cpu/hadoop-yarn
> ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/cpuacct/hadoop-yarn
> ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/devices/hadoop-yarn
> ExecStartPre=/usr/bin/chown -R yarn:hadoop /sys/fs/cgroup/cpu/hadoop-yarn/
> ExecStartPre=/usr/bin/chown -R yarn:hadoop /sys/fs/cgroup/cpuacct/hadoop-yarn/
> ExecStartPre=/usr/bin/chown -R yarn:hadoop /sys/fs/cgroup/devices/hadoop-yarn/
> ExecStart=/bin/true
> Slice=hadoop-yarn.slice
> MemoryAccounting=yes
> MemoryLimit=1G
>
> [Install]
> WantedBy=multi-user.target
> {code}
>
> The behaviour is the same:
> * directories are created
> * systemctl daemon-reload
> * devices/hadoop-yarn directory is deleted
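As a stopgap, the `ExecStartPre` steps of the unit above (`mkdir -p` followed by `chown -R yarn:hadoop`) can be re-run by hand after a reload, before restarting the NodeManager. A Python sketch of the same two steps; `recreate_missing` and `chown_recursive` are hypothetical helper names, only directories that are actually missing are touched, and chowning to `yarn:hadoop` requires root:

```python
import os
import shutil
import tempfile
from typing import Iterable, List, Optional


def chown_recursive(path: str, user: Optional[str], group: Optional[str]) -> None:
    """Equivalent of `chown -R user:group path`."""
    for dirpath, _dirnames, filenames in os.walk(path):
        shutil.chown(dirpath, user=user, group=group)
        for name in filenames:
            shutil.chown(os.path.join(dirpath, name), user=user, group=group)


def recreate_missing(paths: Iterable[str], user: Optional[str] = "yarn",
                     group: Optional[str] = "hadoop") -> List[str]:
    """Re-create missing directories (like `mkdir -p`) and chown them.

    Returns the directories that had to be created. Pass user=None and
    group=None to skip the chown step, e.g. when not running as root.
    """
    created: List[str] = []
    for path in paths:
        if os.path.isdir(path):
            continue
        # Inside a cgroup v1 controller mount, mkdir creates a new cgroup
        # and the kernel populates it with control files.
        os.makedirs(path, exist_ok=True)
        if user is not None or group is not None:
            chown_recursive(path, user, group)
        created.append(path)
    return created


if __name__ == "__main__":
    # Dry run in a scratch directory; on a node you would pass the
    # /sys/fs/cgroup/<controller>/hadoop-yarn paths and run as root.
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "devices", "hadoop-yarn")
        print(recreate_missing([target], user=None, group=None))
```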
--
This message was sent by Atlassian Jira
(v8.20.10#820010)