See:
https://slurm.schedmd.com/pam_slurm_adopt.html#log_level

Try looking for the pam_slurm_adopt messages in /var/log/secure.
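
For example, here is a rough sketch of what I mean. It assumes the module is configured in /etc/pam.d/sshd and that debug5 is the level you want; both are guesses about your setup, and adopt_log is just a hypothetical helper name:

```shell
# Assumed PAM line (file location and log level are assumptions, adjust as needed):
#   account    required    pam_slurm_adopt.so log_level=debug5

# Hypothetical helper: print only the pam_slurm_adopt lines from a log file.
# Defaults to /var/log/secure, which usually needs root to read.
adopt_log() {
    grep -F 'pam_slurm_adopt' "${1:-/var/log/secure}"
}
```

Running something like `adopt_log | tail -n 50` on a node right after a denied ssh attempt should show what the module decided, provided sshd logs to that file on your distribution.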


On Sun, Nov 10, 2024 at 9:54 AM John Hearns via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> I have a cluster which uses Slurm 23.11.6
>
> When I submit a multi-node job and run something like
> clush -b -w $SLURM_JOB_NODELIST "date"
> very often the ssh command fails with:
>  Access denied by pam_slurm_adopt: you have no active jobs on this node
>
> This happens on maybe 50% of the nodes.
> There is the same behaviour if I salloc a number of nodes then try to ssh
> to a node.
>
> I have traced this to slurmstepd spawning a long sleep, which I believe
> allows proctrackd to 'see' if a job is active.
> On nodes that I can ssh into:
> root        3211       1  0 Nov08 ?        00:00:00 /usr/sbin/slurmd --systemd
> root        3227       1  0 Nov08 ?        00:00:00 /usr/sbin/slurmstepd infinity
> root       24322       1  0 15:40 ?        00:00:00 slurmstepd: [15709.extern]
> root       24326   24322  0 15:40 ?        00:00:00  \_ sleep 100000000
>
> On nodes where I cannot ssh:
> root        3226       1  0 Nov08 ?        00:00:00 /usr/sbin/slurmd --systemd
> root        3258       1  0 Nov08 ?        00:00:00 /usr/sbin/slurmstepd infinity
>
> Maybe I am not understanding something here?
>
> PS. I have tried running the pam_slurm_adopt module with debug options,
> and have not found anything useful.
>
> John H
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>
