Hi,

> On 22 Aug 2018, at 16:27, Christian Peter 
> <christian.pe...@itwm.fraunhofer.de> wrote:
> 
> hi,
> 
> we observed a strange behavior of pam_slurm_adopt regarding the involved 
> cgroups:
> 
> when we start a shell as a new Slurm job using "srun", the process has 
> freezer, cpuset and memory cgroups setup as e.g. 
> "/slurm/uid_5001/job_410318/step_0". that's good!
> 
> however, another shell started by an SSH login is handled by pam_slurm_adopt. 
> that process is only affected by the freezer and cpuset cgroups setup as 
> "/slurm/uid_5001/job_410318/step_extern". it lacks the configuration of the 
> "memory" cgroup. (see output below)
> 
> as a consequence, all tools started from this shell prompt are not affected 
> by any memory restrictions. that's bad for our use case as we need to 
> partition the memory of our SMP machines for several independent jobs/users.
> 
> is this an expected behavior of pam_slurm_adopt/slurmstepd?
> or maybe a configuration issue? did i miss something?
> 
> a bug? to me, it looks similar to this old issue...
> https://bugs.schedmd.com/show_bug.cgi?id=2236
> 
> we're currently running Slurm 17.11.8. (we've already seen this with our 
> previous version 17.11.5.)


See https://github.com/hpcugent/slurm/pull/28 and 
https://bugs.schedmd.com/show_bug.cgi?id=5920 for a potential workaround (and 
discussion, I hope) that does not require you to disable systemd-logind. 


We now get:

vsc40075@node2801 (banette) ~> cat /proc/self/cgroup
11:hugetlb:/
10:pids:/
9:freezer:/slurm/uid_2540075/job_67119295/step_extern
8:perf_event:/
7:cpuset:/slurm/uid_2540075/job_67119295/step_extern
6:devices:/
5:net_prio,net_cls:/
4:memory:/slurm/uid_2540075/job_67119295/step_extern/task_0
3:blkio:/
2:cpuacct,cpu:/slurm/uid_2540075/job_67119295/step_extern/task_0
1:name=systemd:/user.slice/user-2540075.slice/session-9963.scope


when logging into a compute node that has a job running, using the following 
PAM sshd config lines:

account sufficient pam_slurm_adopt.so action_no_jobs=deny,action_adopt_failure=deny,debug,action_adopt=check_only
session sufficient pam_slurm_adopt.so action_no_jobs=ignore,action_adopt_failure=deny,debug
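
If you want to check an SSH session for this without reading the output by eye, something like the following might do. It is only a rough sketch, assuming cgroup v1 and the "/slurm/..." path layout shown above; the function names and the list of required controllers (freezer, cpuset, memory) are my own choices for illustration, not anything provided by Slurm.

```python
def parse_cgroup(text):
    """Map each controller name to its cgroup path.

    Expects the cgroup v1 format of /proc/<pid>/cgroup:
    hierarchy-ID:controller-list:path
    """
    paths = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        _hierarchy_id, controllers, path = line.split(":", 2)
        # Co-mounted controllers share one line, e.g. "cpuacct,cpu".
        for name in controllers.split(","):
            paths[name] = path
    return paths


def controllers_outside_slurm(text, required=("freezer", "cpuset", "memory")):
    """Return the required controllers whose path is not below /slurm/.

    The default 'required' tuple is an assumption for this sketch.
    """
    paths = parse_cgroup(text)
    return [c for c in required if not paths.get(c, "/").startswith("/slurm/")]


if __name__ == "__main__":
    # Excerpt of the output shown above; on a real node you would
    # read /proc/self/cgroup instead.
    sample = (
        "9:freezer:/slurm/uid_2540075/job_67119295/step_extern\n"
        "7:cpuset:/slurm/uid_2540075/job_67119295/step_extern\n"
        "4:memory:/slurm/uid_2540075/job_67119295/step_extern/task_0\n"
    )
    print("not confined by Slurm:", controllers_outside_slurm(sample) or "none")
```

A session showing the problem from the original report (memory cgroup left at "/") would have "memory" in the returned list.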


Kind regards,
— Andy
