[ 
https://issues.apache.org/jira/browse/YARN-11572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Guet resolved YARN-11572.
---------------------------------------
    Resolution: Fixed

This was in fact caused by a third-party service.

> hadoop-yarn cgroup directory is deleted after each "systemctl daemon-reload" 
> command
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-11572
>                 URL: https://issues.apache.org/jira/browse/YARN-11572
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.3.4
>         Environment:  
>  
>            Reporter: Jean-Baptiste Guet
>            Priority: Major
>
> I have a Hadoop cluster and I need to activate cgroups in order to use GPUs 
> in a Docker environment. I followed the documentation for the setup.
>  
> {*}To summarize{*}: I manage the cgroup creation myself (cpu, cpuacct and 
> devices), which results, as expected, in the creation of 3 directories under 
> {_}{{/sys/fs/cgroup/}}{_}. However, upon each {_}systemctl daemon-reload{_}, 
> the _/sys/fs/cgroup/devices/hadoop-yarn_ directory is systematically deleted, 
> which prevents the YARN NodeManager from working.
>  
> {*}In detail{*}:
> As described in the documentation, I kept the parameter 
> _yarn.nodemanager.linux-container-executor.cgroups.mount_ set to _false_ in 
> order to manage the cgroups myself (for security reasons).
> As I'm on CentOS 8, I use cgroup v1. I defined the parameters:
>  * _yarn.nodemanager.linux-container-executor.cgroups.hierarchy_ to 
> _/hadoop-yarn_
>  * _yarn.nodemanager.linux-container-executor.cgroups.mount-path_ to 
> _/sys/fs/cgroup_
> YARN needs 3 cgroups: cpu, cpuacct and devices.
> To make /hadoop-yarn persistent, I installed the libcgroup rpm and then 
> updated /etc/cgconfig.conf with
> {code:java}
> group hadoop-yarn {
>      perm {
>          admin {
>              uid = yarn;
>              gid = hadoop;
>          }
>          task {
>              uid = yarn;
>              gid = hadoop;
>          }
>      }
>      cpu {}
>      cpuacct {}
>      devices {}
>  }
> {code}
> and started the cgconfig service. The 3 directories are created:
> {code:java}
> $ ll /sys/fs/cgroup/{cpu,cpuacct,devices}/hadoop-yarn/ -d
> drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpuacct/hadoop-yarn/
> drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpu/hadoop-yarn/
> drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:27 /sys/fs/cgroup/devices/hadoop-yarn/
> {code}
>  
> At this point, I can restart the Yarn NodeManager.
> However, each time someone executes {{systemctl daemon-reload}}, the 
> devices directory is deleted:
> {code:java}
> $ ll /sys/fs/cgroup/{cpu,cpuacct,devices}/hadoop-yarn/ -d
> ls: cannot access '/sys/fs/cgroup/devices/hadoop-yarn/': No such file or directory
> drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpuacct/hadoop-yarn/
> drwxr-xr-x 2 yarn hadoop 0 Sep 8 13:15 /sys/fs/cgroup/cpu/hadoop-yarn/
> {code}
>  
> I see nothing in the logs and have no idea why this directory is deleted. 
> The YARN NodeManager needs this directory, so it stops working and has to 
> be restarted (once the directory has been re-created).
> As an alternative to the cgconfig service, I tried creating my own 
> service to create these directories.
> {code:java}
> # /etc/systemd/system/hadoop-yarn-cgroup.service
> [Unit]
> Description=Custom cgroup for Hadoop YARN
> [Service]
> ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/cpu/hadoop-yarn
> ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/cpuacct/hadoop-yarn
> ExecStartPre=/bin/mkdir -p /sys/fs/cgroup/devices/hadoop-yarn
> ExecStartPre=/usr/bin/chown -R yarn:hadoop /sys/fs/cgroup/cpu/hadoop-yarn/
> ExecStartPre=/usr/bin/chown -R yarn:hadoop /sys/fs/cgroup/cpuacct/hadoop-yarn/
> ExecStartPre=/usr/bin/chown -R yarn:hadoop /sys/fs/cgroup/devices/hadoop-yarn/
> ExecStart=/bin/true
> Slice=hadoop-yarn.slice
> MemoryAccounting=yes
> MemoryLimit=1G
> [Install]
> WantedBy=multi-user.target{code}
>  
> The behaviour is the same:
>  * directories are created
>  * systemctl daemon-reload
>  * devices/hadoop-yarn directory is deleted
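
The reproduction steps above can be scripted. The following is a minimal sketch, not part of the original report: check_cgroup_dirs is a hypothetical helper name, and the mount path, hierarchy and controller names passed to it are assumed to match the report's configuration. It only lists which per-controller directories exist, so that the output can be compared before and after a systemctl daemon-reload.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): print which per-controller
# hierarchy directories exist under a cgroup v1 mount path.
# Usage: check_cgroup_dirs MOUNT_PATH HIERARCHY CONTROLLER...
check_cgroup_dirs() {
    mount_path=$1
    hierarchy=$2
    shift 2
    missing=0
    for controller in "$@"; do
        dir="$mount_path/$controller/$hierarchy"
        if [ -d "$dir" ]; then
            echo "present: $dir"
        else
            echo "MISSING: $dir"
            missing=$((missing + 1))
        fi
    done
    # Return a non-zero status when at least one directory is gone.
    [ "$missing" -eq 0 ]
}
```

For example, running check_cgroup_dirs /sys/fs/cgroup hadoop-yarn cpu cpuacct devices once before and once after the reload should show whether only the devices directory disappears, as the report describes.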



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

