Hi Dietmar,
what do you find in the output file of this job?
sbatch --time 5 --cpus-per-task=1 --wrap 'grep Cpus /proc/$$/status'
On our 64-core machines with hyperthreading enabled I see e.g.
Cpus_allowed: 04000000,00000000,04000000,00000000
Cpus_allowed_list: 58,122
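From inside the job the same mask is also visible via Python's standard library, without parsing /proc (a small sketch, nothing Slurm-specific assumed):

```python
import os

# CPUs the current process may run on; inside a correctly constrained job
# this matches the Cpus_allowed_list shown by the grep above.
allowed = os.sched_getaffinity(0)
print(sorted(allowed), "->", len(allowed), "usable CPUs")
```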
Greetings
Hermann
On 2/28/24 14:28, Dietmar Rieder via slurm-users wrote:
Hi,
I'm new to Slurm, but maybe someone can help me:
I'm trying to restrict the CPU usage to the actually requested/allocated
resources using cgroup v2.
For this I made the following settings in slurm.conf:
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup,task/affinity
And in cgroup.conf
CgroupPlugin=cgroup/v2
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes
AllowedRAMSpace=98
cgroup v2 seems to be active on the compute node:
# mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2
(rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
# cat /sys/fs/cgroup/cgroup.subtree_control
cpuset cpu io memory pids
# cat /sys/fs/cgroup/system.slice/cgroup.subtree_control
cpuset cpu io memory pids
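To check from inside a running job which cpuset it actually ended up with, the step's own cgroup can be inspected (a sketch; on cgroup v2 the "0::" line in /proc/self/cgroup names the path relative to /sys/fs/cgroup):

```python
# Resolve this process's cgroup v2 path and print the effective cpuset it
# was confined to. Outside a constrained job, or where the cpuset
# controller is not enabled for that cgroup, the file is absent.
rel = ""
with open("/proc/self/cgroup") as f:
    for line in f:
        if line.startswith("0::"):
            rel = line.strip()[3:]
            break

cpuset_file = "/sys/fs/cgroup" + rel + "/cpuset.cpus.effective"
try:
    with open(cpuset_file) as f:
        print("effective cpuset:", f.read().strip())
except OSError:
    print("no cpuset file at", cpuset_file)
```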
Now, when I use sbatch to submit the following test script, the python
script which is started from the batch script utilizes all 96 CPUs at
100% on the allocated node, although I only ask for 4 CPUs
(--cpus-per-task=4). I'd expect that the task cannot use more than
these 4.
#!/bin/bash
#SBATCH --output=/local/users/appadmin/test-%j.log
#SBATCH --job-name=test
#SBATCH --chdir=/local/users/appadmin
#SBATCH --cpus-per-task=4
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --mem=64gb
#SBATCH --time=4:00:00
#SBATCH --partition=standard
#SBATCH --gpus=0
#SBATCH --export
#SBATCH --get-user-env=L
export PATH=/usr/local/bioinf/jupyterhub/bin:/usr/local/bioinf/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bioinf/miniforge/condabin
source .bashrc
conda activate test
python test.py
The python code in test.py is the following, using the cpu_load_generator
package from [1]:
#!/usr/bin/env python
import sys
from cpu_load_generator import load_single_core, load_all_cores, from_profile
load_all_cores(duration_s=120, target_load=1)  # generates load on all cores
Interestingly, when I use srun to launch an interactive job and run the
python script manually, I see with top that only 4 CPUs are running at
100%. And I also see python errors thrown when the script tries to start
the 5th process (which makes sense):
  File "/usr/local/bioinf/miniforge/envs/test/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/bioinf/miniforge/envs/test/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/bioinf/miniforge/envs/test/lib/python3.12/site-packages/cpu_load_generator/_interface.py", line 24, in load_single_core
    process.cpu_affinity([core_num])
  File "/usr/local/bioinf/miniforge/envs/test/lib/python3.12/site-packages/psutil/__init__.py", line 867, in cpu_affinity
    self._proc.cpu_affinity_set(list(set(cpus)))
  File "/usr/local/bioinf/miniforge/envs/test/lib/python3.12/site-packages/psutil/_pslinux.py", line 1714, in wrapper
    return fun(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/bioinf/miniforge/envs/test/lib/python3.12/site-packages/psutil/_pslinux.py", line 2213, in cpu_affinity_set
    cext.proc_cpu_affinity_set(self.pid, cpus)
OSError: [Errno 22] Invalid argument
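That EINVAL is the kernel refusing an affinity mask that contains no CPU from the process's allowed set: with ConstrainCores the step's cpuset spans only the allocated cores, so pinning the 5th worker to a core outside it fails. The same error can be reproduced with the standard library (a sketch; CPU index 10000 is just an out-of-range placeholder):

```python
import os

# Requesting affinity to a CPU outside the allowed set is rejected by the
# kernel with EINVAL -- the same OSError psutil surfaces in the traceback
# above. Index 10000 is a hypothetical out-of-range CPU.
try:
    os.sched_setaffinity(0, {10000})
    refused = None
except OSError as err:
    refused = err
    print("pinning refused:", err)
```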
What am I missing? Why are the CPU resources not restricted when I use
sbatch?
Thanks for any input or hint
Dietmar
[1]: https://pypi.org/project/cpu-load-generator/
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com