Hi,

we observed a strange behavior of pam_slurm_adopt regarding the involved cgroups:
When we start a shell as a new Slurm job using "srun", the process is placed in the freezer, cpuset and memory cgroups, e.g. "/slurm/uid_501/job_410318/step_0". That's good!
However, another shell started via an SSH login is handled by pam_slurm_adopt. That process is only placed in the freezer and cpuset cgroups, "/slurm/uid_501/job_410318/step_extern"; it is missing from the "memory" cgroup of the job. (See the output below.)
As a consequence, none of the tools started from this shell prompt are subject to any memory restrictions. That's bad for our use case, as we need to partition the memory of our SMP machines between several independent jobs/users.
Is this expected behavior of pam_slurm_adopt/slurmstepd, or maybe a configuration issue? Did I miss something? A bug? To me, it looks similar to this old issue: https://bugs.schedmd.com/show_bug.cgi?id=2236

We're currently running Slurm 17.11.8. (We've already seen this with our previous version, 17.11.5.)
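For context, this is roughly how we understand the relevant configuration should look for cgroup-based memory confinement; the excerpt below is illustrative, not a verbatim copy of our files:

== slurm.conf (excerpt, illustrative) ==
TaskPlugin=task/cgroup
PrologFlags=Contain        # creates the "extern" step that pam_slurm_adopt adopts SSH sessions into

== cgroup.conf (excerpt, illustrative) ==
CgroupAutomount=yes
ConstrainCores=yes         # confine tasks in the cpuset cgroup
ConstrainRAMSpace=yes      # confine tasks in the memory cgroup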
Thanks for your help and suggestions!

Christian

--------------------------------------

== cgroups within srun ==

login$ srun --pty bash
node064$ cat /proc/self/cgroup
11:pids:/system.slice/slurmd.service
10:freezer:/slurm/uid_501/job_410318/step_0
9:cpuset:/slurm/uid_501/job_410318/step_0
8:cpuacct,cpu:/system.slice/slurmd.service
7:net_prio,net_cls:/
6:blkio:/system.slice/slurmd.service
5:perf_event:/
4:devices:/system.slice/slurmd.service
3:memory:/slurm/uid_501/job_410318/step_0
2:hugetlb:/
1:name=systemd:/system.slice/slurmd.service

== cgroups for external step ==

login$ ssh node064
node064$ cat /proc/self/cgroup
11:pids:/user.slice
10:freezer:/slurm/uid_501/job_410318/step_extern
9:cpuset:/slurm/uid_501/job_410318/step_extern
8:cpuacct,cpu:/user.slice
7:net_prio,net_cls:/
6:blkio:/user.slice
5:perf_event:/
4:devices:/user.slice
3:memory:/user.slice
2:hugetlb:/
1:name=systemd:/user.slice/user-501.slice/session-430.scope
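In case it helps with debugging, this is roughly how one could check on the node whether a memory cgroup for the extern step exists at all. This assumes cgroup v1 mounted under /sys/fs/cgroup; the uid/job IDs are taken from the example above:

== memory cgroup on the node (illustrative) ==

node064$ ls /sys/fs/cgroup/memory/slurm/uid_501/job_410318/
node064$ cat /sys/fs/cgroup/memory/slurm/uid_501/job_410318/memory.limit_in_bytes
node064$ cat /sys/fs/cgroup/memory/slurm/uid_501/job_410318/step_extern/tasks
# the step_extern directory may not exist under the memory controller if the
# adopted process was never attached to it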