On 21-04-2020 04:58, Haoyang Liu wrote:
I am setting up the latest slurm-20.02-1 on my clusters and trying to configure
"configless" Slurm on the compute nodes.
After following the instructions from
https://slurm.schedmd.com/configless_slurm.html, both slurmctld and slurmd
work fine.
The config files can be found at $SlurmdSpoolDir/conf-cache and
/run/slurm/conf. However, when I try to ssh into some compute
node, say `comput6`,
$ ssh comput6
the prompt hangs for about one minute and finally returns 'No Slurm jobs
found on node'. Previously the message was
'Access denied by pam_slurm_adopt: you have no active jobs on this node'.
The issue can be reproduced on CentOS 6 and 7. I checked /var/log/secure and
noticed the following output:
comput6 pam_slurm_adopt[43672]: error: s_p_parse_file: unable to status file
/usr/local/slurm/etc/slurm.conf: No such file or directory, retrying in 1sec up
to 60sec
It seems that pam_slurm_adopt still looks for the config file in the default
directory even in "configless" mode.
Creating a symlink in /usr/local/slurm/etc seems to be a workaround, but that
moves away from the idea of "configless" Slurm.
Is there a better way to fix this?
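
For reference, the symlink workaround mentioned above might look roughly like the sketch below. It assumes the cached file is named slurm.conf inside /run/slurm/conf and that pam_slurm_adopt looks for /usr/local/slurm/etc/slurm.conf, as the log message suggests. The function name link_slurm_conf and the two variables are made up for illustration and are not part of Slurm.

```shell
#!/bin/sh
# Sketch of the symlink workaround. Both paths are assumptions taken from
# the message above; adjust them to match the local installation.
CACHED_CONF="${CACHED_CONF:-/run/slurm/conf/slurm.conf}"
DEFAULT_DIR="${DEFAULT_DIR:-/usr/local/slurm/etc}"

# Hypothetical helper: point the default config path at the copy that
# slurmd fetched in configless mode.
link_slurm_conf() {
    # Create the default directory if the install did not provide it.
    mkdir -p "$DEFAULT_DIR" || return 1
    # -s symbolic, -f replace an existing link, -n do not follow an old link.
    ln -sfn "$CACHED_CONF" "$DEFAULT_DIR/slurm.conf"
}
```

Calling link_slurm_conf as root on each compute node would create the link. Since /run is normally cleared on reboot, the link will probably dangle until slurmd has started and fetched the configuration again.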
This issue has been reported previously by others, and there is a recent
bug report https://bugs.schedmd.com/show_bug.cgi?id=8712 which you could
follow for updates.
Probably the issue needs to be reported by a customer with a SchedMD
support contract before a solution can be expected.
/Ole