Hi,

we observed a strange behavior of pam_slurm_adopt regarding the involved cgroups:
When we start a shell as a new Slurm job using "srun", the process is placed in the freezer, cpuset and memory cgroups, e.g. "/slurm/uid_501/job_410318/step_0". That's good!
However, another shell started via an SSH login is handled by pam_slurm_adopt. That process is only placed in the freezer and cpuset cgroups, "/slurm/uid_501/job_410318/step_extern"; it is missing from the "memory" cgroup of the job. (See the output below.)
As a consequence, none of the tools started from this shell prompt are subject to any memory restrictions. That's bad for our use case, as we need to partition the memory of our SMP machines between several independent jobs/users.
Is this expected behavior of pam_slurm_adopt/slurmstepd, or maybe a configuration issue? Did I miss something? A bug? To me, it looks similar to this old issue: https://bugs.schedmd.com/show_bug.cgi?id=2236

We're currently running Slurm 17.11.8. (We've already seen this with our previous version, 17.11.5.)
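For context, this is roughly how we understand the relevant configuration should look for cgroup-based memory confinement; the excerpt below is illustrative, not a verbatim copy of our files:

== slurm.conf (excerpt, illustrative) ==
TaskPlugin=task/cgroup
PrologFlags=Contain        # creates the "extern" step that pam_slurm_adopt adopts SSH sessions into

== cgroup.conf (excerpt, illustrative) ==
CgroupAutomount=yes
ConstrainCores=yes         # confine tasks in the cpuset cgroup
ConstrainRAMSpace=yes      # confine tasks in the memory cgroup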
Thanks for your help and suggestions!

Christian

--------------------------------------

== cgroups within srun ==

login$ srun --pty bash
node064$ cat /proc/self/cgroup
11:pids:/system.slice/slurmd.service
10:freezer:/slurm/uid_501/job_410318/step_0
9:cpuset:/slurm/uid_501/job_410318/step_0
8:cpuacct,cpu:/system.slice/slurmd.service
7:net_prio,net_cls:/
6:blkio:/system.slice/slurmd.service
5:perf_event:/
4:devices:/system.slice/slurmd.service
3:memory:/slurm/uid_501/job_410318/step_0
2:hugetlb:/
1:name=systemd:/system.slice/slurmd.service

== cgroups for external step ==

login$ ssh node064
node064$ cat /proc/self/cgroup
11:pids:/user.slice
10:freezer:/slurm/uid_501/job_410318/step_extern
9:cpuset:/slurm/uid_501/job_410318/step_extern
8:cpuacct,cpu:/user.slice
7:net_prio,net_cls:/
6:blkio:/user.slice
5:perf_event:/
4:devices:/user.slice
3:memory:/user.slice
2:hugetlb:/
1:name=systemd:/user.slice/user-501.slice/session-430.scope
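In case it helps with debugging, this is roughly how one could check on the node whether a memory cgroup for the extern step exists at all. This assumes cgroup v1 mounted under /sys/fs/cgroup; the uid/job IDs are taken from the example above:

== memory cgroup on the node (illustrative) ==

node064$ ls /sys/fs/cgroup/memory/slurm/uid_501/job_410318/
node064$ cat /sys/fs/cgroup/memory/slurm/uid_501/job_410318/memory.limit_in_bytes
node064$ cat /sys/fs/cgroup/memory/slurm/uid_501/job_410318/step_extern/tasks
# the step_extern directory may not exist under the memory controller if the
# adopted process was never attached to it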