On 21-04-2020 04:58, Haoyang Liu wrote:
I am setting up the latest slurm-20.02-1 on my clusters and trying to configure the 
"configless" slurm on the compute nodes.
After following the instructions from 
https://slurm.schedmd.com/configless_slurm.html, both slurmctld and slurmd 
work fine.
The config files can be found at $SlurmdSpoolDir/conf-cache and 
/run/slurm/conf. However, when I try to ssh into some compute
node, say `comput6`,

$ ssh comput6

the prompt hangs for about one minute and finally returns 'No Slurm jobs 
found on node'. Previously the message was
'Access denied by pam_slurm_adopt: you have no active jobs on this node'.

The issue can be reproduced on CentOS 6 and 7. I've checked /var/log/secure and 
noticed the following output:

comput6 pam_slurm_adopt[43672]: error: s_p_parse_file: unable to status file 
/usr/local/slurm/etc/slurm.conf: No such file or directory, retrying in 1sec up 
to 60sec

It seems that pam_slurm_adopt still looks for the config file in the default 
directory, even in "configless" mode.
Creating a symlink in /usr/local/slurm/etc seems to work around the problem, but it 
also seems to defeat the purpose of "configless" slurm.
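
For reference, the symlink workaround could be sketched roughly as the shell 
function below. This is only an illustration: the function name 
link_slurm_conf is hypothetical, the default target is the path from the log 
line above, and it assumes the cached file under /run/slurm/conf is named 
slurm.conf.

```shell
#!/bin/sh
# Sketch of the symlink workaround (hypothetical helper, not part of Slurm).
# $1: cached config written by slurmd (file name slurm.conf is an assumption)
# $2: path pam_slurm_adopt tries to open, taken from the log line above
link_slurm_conf() {
    src="${1:-/run/slurm/conf/slurm.conf}"
    dst="${2:-/usr/local/slurm/etc/slurm.conf}"
    # Create the default config directory if it does not exist yet.
    mkdir -p "$(dirname "$dst")" || return 1
    # Replace any existing link so the function can be re-run safely.
    ln -sfn "$src" "$dst"
}
```

Calling link_slurm_conf with no arguments, as root on a compute node, would 
use the two default paths shown above.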

Is there a better way to fix this?

This issue has been reported previously by others, and there is a recent bug report https://bugs.schedmd.com/show_bug.cgi?id=8712 which you could follow for updates.

Probably the issue needs to be reported by a customer with a SchedMD support contract before a solution can be expected.

/Ole
