Hi,

> On 22 Aug 2018, at 16:27, Christian Peter <christian.pe...@itwm.fraunhofer.de> wrote:
>
> hi,
>
> we observed a strange behavior of pam_slurm_adopt regarding the involved
> cgroups:
>
> when we start a shell as a new Slurm job using "srun", the process has
> freezer, cpuset and memory cgroups setup as e.g.
> "/slurm/uid_5001/job_410318/step_0". that's good!
>
> however, another shell started by an SSH login is handled by pam_slurm_adopt.
> that process is only affected by the freezer and cpuset cgroups setup as
> "/slurm/uid_5001/job_410318/step_extern". it lacks the configuration of the
> "memory" cgroup. (see output below)
>
> as a consequence, all tools started from this shell prompt are not affected
> by any memory restrictions. that's bad for our use case as we need to
> partition the memory of our SMP machines for several independent jobs/users.
>
> is this an expected behavior of pam_slurm_adopt/slurmstepd?
> or maybe a configuration issue? did i miss something?
>
> a bug? to me, it looks similar to this old issue...
> https://bugs.schedmd.com/show_bug.cgi?id=2236
>
> we're currently running Slurm 17.11.8. (we've already seen this with our
> previous version 17.11.5.)

See https://github.com/hpcugent/slurm/pull/28 and
https://bugs.schedmd.com/show_bug.cgi?id=5920 for a potential workaround (and
discussion, I hope) that does not require you to disable systemd-logind.

When logging into a compute node that has a job running, we now get:

vsc40075@node2801 (banette) ~> cat /proc/self/cgroup
11:hugetlb:/
10:pids:/
9:freezer:/slurm/uid_2540075/job_67119295/step_extern
8:perf_event:/
7:cpuset:/slurm/uid_2540075/job_67119295/step_extern
6:devices:/
5:net_prio,net_cls:/
4:memory:/slurm/uid_2540075/job_67119295/step_extern/task_0
3:blkio:/
2:cpuacct,cpu:/slurm/uid_2540075/job_67119295/step_extern/task_0
1:name=systemd:/user.slice/user-2540075.slice/session-9963.scope

This is with the following PAM sshd config lines:

account sufficient pam_slurm_adopt.so action_no_jobs=deny,action_adopt_failure=deny,debug,action_adopt=check_only
session sufficient pam_slurm_adopt.so action_no_jobs=ignore,action_adopt_failure=deny,debug

Kind regards,
-- Andy
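
P.S. If you want to check this on many nodes without reading the output by eye, something along these lines might help. It is only a rough sketch: the function names are made up for illustration, and it assumes the cgroup v1 line format shown in the output above ("id:controllers:path") and that Slurm's hierarchy lives under "/slurm/", as it does on our nodes.

```python
def parse_cgroup(text):
    """Map each controller name to its cgroup path.

    Expects the contents of /proc/<pid>/cgroup, one
    "hierarchy-id:controller-list:path" entry per line.
    """
    paths = {}
    for line in text.strip().splitlines():
        _hier_id, controllers, path = line.split(":", 2)
        # Co-mounted controllers share one line, e.g. "cpuacct,cpu".
        for name in controllers.split(","):
            paths[name] = path
    return paths


def missing_slurm_controllers(text, wanted=("freezer", "cpuset", "memory")):
    """Return the wanted controllers whose path is NOT under /slurm/.

    The "/slurm/" prefix is an assumption about the local cgroup setup.
    """
    paths = parse_cgroup(text)
    return [c for c in wanted if not paths.get(c, "").startswith("/slurm/")]
```

Run from the SSH session, you would pass it the contents of /proc/self/cgroup; an empty list means the shell was adopted into all three controllers, while a result containing "memory" would match the behaviour Christian describes.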