Dear all,

I have configured pam_slurm_adopt in our Slurm test environment by following the corresponding documentation:

https://slurm.schedmd.com/pam_slurm_adopt.html

I've set `PrologFlags=contain` in slurm.conf and have task/cgroup enabled alongside task/affinity (i.e. `TaskPlugin=task/affinity,task/cgroup`). This is the current configuration in cgroup.conf:

CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainKmemSpace=no
TaskAffinity=yes

PAM is enabled in /etc/ssh/sshd_config, i.e. `UsePAM yes`, which is the default on RHEL7 anyway. SELinux is disabled on the system.

PAM configuration in /etc/pam.d/sshd (last two lines only):

[...]
# Authorize users that have a running job on node
account    sufficient    pam_slurm_adopt.so
account    required      pam_access.so nodefgroup

Access control itself works fine: users can only log into a compute node if they have at least one job running on it, and pam_slurm_adopt denies access to everyone else.

When a user logs into a compute node, the login process is indeed "somewhat" adopted into the external step of one of the running jobs, but *only* for the cpuset and freezer controllers:

$ sbatch --mem=2G job.slurm
Submitted batch job 357
$ scontrol show job 357 | grep BatchHost
   BatchHost=n1521
$ ssh n1521
Last login: Fri Jul 12 14:43:19 2019 from XXXX
$ cat /proc/self/cgroup
11:cpuset:/slurm/uid_900002/job_357/step_extern
10:hugetlb:/
9:perf_event:/
8:devices:/user.slice
7:net_prio,net_cls:/
6:cpuacct,cpu:/user.slice
5:pids:/user.slice
4:blkio:/user.slice
3:memory:/user.slice
2:freezer:/slurm/uid_900002/job_357/step_extern
1:name=systemd:/user.slice/user-900002.slice/session-35494.scope
$

As the memory line shows, the ssh session ends up under user.slice rather than under the job's cgroup, so it is completely unconstrained in terms of memory usage. In fact, I was able to launch a test application from the interactive ssh session that consumed almost all of the memory on that node.
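For illustration, something along the following lines is enough to blow well past the job's 2G request from within the adopted ssh session. (This is just a minimal sketch, not the exact test application I used; it assumes a Python interpreter is available on the node.)

$ ssh n1521
$ # allocate ~4 GiB, twice the job's --mem=2G request, and hold it for a while:
$ python -c 'import time; b = bytearray(4 * 1024**3); time.sleep(60)'

If the session were inside the job's memory cgroup, this allocation would hit the 2G limit and the process would be OOM-killed; in the ssh session it just runs to completion.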
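Note that the limit itself does appear to be in place for the job's cgroup; it simply does not apply to the adopted ssh session, whose memory controller sits under user.slice. Assuming the standard cgroup v1 mount layout on RHEL7 (the slurm cgroup paths below are taken from the job above), this can be checked with:

$ # job-level memory limit set by task/cgroup for --mem=2G:
$ cat /sys/fs/cgroup/memory/slurm/uid_900002/job_357/memory.limit_in_bytes
$ # the adopted shell's PID is missing from the extern step's memory tasks file:
$ grep -x $$ /sys/fs/cgroup/memory/slurm/uid_900002/job_357/step_extern/tasks \
    || echo "shell $$ is not constrained by the job's memory cgroup"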
That is obviously undesirable in a shared user environment where jobs from different users run side by side on one node at the same time.

I suppose this is nevertheless the expected behavior and just the way it is when using pam_slurm_adopt to restrict access to the compute nodes? Is that right? Or did I miss something obvious?

Thank you in advance for any comment.

Best regards
Jürgen Salk

PS: This is Slurm version 18.08.7, if that matters.

--
Jürgen Salk
Scientific Software & Compute Services (SSCS)
Kommunikations- und Informationszentrum (kiz)
Universität Ulm
Telefon: +49 (0)731 50-22478
Telefax: +49 (0)731 50-22471