Solution: set `UsePAM=1` in slurm.conf and symlink the sshd PAM configuration for the slurm service: `ln -s /etc/pam.d/sshd /etc/pam.d/slurm`
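
For anyone finding this later, here is a minimal sketch of those two steps as a shell function. The function name `enable_slurm_pam` is my own, and the paths in the usage note are assumptions (slurm.conf is often in /etc/slurm, but your installation may differ). It only matches the exact spelling `UsePAM=` at the start of a line.

```shell
# Sketch: enable PAM for Slurm job steps (hypothetical helper, not part of Slurm).
# $1 = path to slurm.conf, $2 = PAM configuration directory
enable_slurm_pam() {
    conf="$1"
    pam_dir="$2"

    # Drop any existing UsePAM line, then append UsePAM=1
    grep -v '^UsePAM=' "$conf" > "$conf.new"
    printf 'UsePAM=1\n' >> "$conf.new"
    # Write back in place so the file keeps its owner and permissions
    cat "$conf.new" > "$conf" && rm -f "$conf.new"

    # Let the "slurm" PAM service reuse the sshd PAM stack
    ln -sf "$pam_dir/sshd" "$pam_dir/slurm"
}
```

On a worker node this would be called as root, e.g. `enable_slurm_pam /etc/slurm/slurm.conf /etc/pam.d`, most likely followed by a restart of slurmd (`systemctl restart slurmd`) so the change is picked up. Afterwards `srun whoami` should be a quick way to check whether the LDAP user is resolved.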

The documentation of UsePAM in https://slurm.schedmd.com/slurm.conf.html is
actually quite clear. When googling, I was somehow just confused by the
various references to pam_slurm / pam_slurm_adopt.

On Tue, 10 Oct 2023 at 22:56, Leopold Talirz <leopold.tal...@gmail.com> wrote:

> Hi,
>
> I have an issue with SLURM (20.11.9) in conjunction with LDAP user
> accounts.
>
> Both the scheduler node, where slurmctld is running, and the worker nodes
> that are spun up by slurm are running the SSSD, which fetches user accounts
> from an external LDAP server.
>
> This works fine: I can log into the scheduler _and_ the worker nodes using
> SSH as an LDAP user without problems.
> This does not work: If, instead of SSH, I connect to a worker node via a
> slurm job, i.e. using `srun` (or `sbatch`), I get
>
> whoami: cannot find name for user ID 1290486416
>
> It seems that, for some reason, SLURM does not rely on the same
> authentication mechanism (configured via /etc/pam.d/*) as SSH.
>
> Any ideas what may be causing this or which logs I should be looking at to
> understand what is going on here?
>
> Potentially relevant further information:
> - The scheduler is running CentOS 7.9 (meaning /etc/pam.d is configured
> via the older authconfig), while the worker nodes are running AlmaLinux 8.7
> (meaning /etc/pam.d is configured via the newer authselect). As described
> above, both work fine when connecting via SSH, but I don't know whether
> slurm imposes additional requirements between the scheduler VM and the
> workers.
> - After I log in via SSH to one of the worker nodes for the first time,
> `srun` then also starts working (it recognizes the user account, apparently
> it is now seeing it in some cache). However, there are still differences
> between the user state when logging in via SSH and via srun - for example,
> when using `srun` the user account does not have access to /dev/nvidia*
> devices, i.e. nvidia-smi shows "no devices found", while logging in via SSH
> shows the devices correctly.
>
> Best wishes,
> Leopold