See: https://slurm.schedmd.com/pam_slurm_adopt.html#log_level
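As a rough sketch of how that option might be used: raise the module's log level on its PAM line, retry the ssh, then filter the auth log for the module's messages. The PAM line in the comment and the helper name adopt_log are illustrative assumptions, not taken from your configuration, and the log path differs between distributions.

```shell
# Assumed PAM line (typically in /etc/pam.d/sshd; adjust to your own stack):
#   account    required    pam_slurm_adopt.so log_level=debug5

# Hypothetical helper: print only the pam_slurm_adopt messages from an auth log.
# Defaults to /var/log/secure; Debian-style systems usually log to
# /var/log/auth.log instead, so pass that path as the first argument there.
adopt_log() {
    log_file="${1:-/var/log/secure}"
    grep -F 'pam_slurm_adopt' "$log_file"
}

# Example (run as root on a node that rejects the ssh):
#   adopt_log | tail -n 20
```

Comparing this output between a node that accepts the ssh and one that rejects it may show where the adoption fails.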
Try looking for logs in /var/log/secure.

On Sun, Nov 10, 2024 at 9:54 AM John Hearns via slurm-users <slurm-users@lists.schedmd.com> wrote:

> I have a cluster which uses Slurm 23.11.6
>
> When I submit a multi-node job and run something like
> clush -b -w $SLURM_JOB_NODELIST "date"
> very often the ssh command fails with:
> Access denied by pam_slurm_adopt: you have no active jobs on this node
>
> This will happen maybe on 50% of the nodes
> There is the same behaviour if I salloc a number of nodes then try to ssh
> to a node.
>
> I have traced this to slurmstepd spawning a long sleep, which I believe
> allows proctrackd to 'see' if a job is active.
> On nodes that I can ssh into:
> root      3211     1  0 Nov08 ?  00:00:00 /usr/sbin/slurmd --systemd
> root      3227     1  0 Nov08 ?  00:00:00 /usr/sbin/slurmstepd infinity
> root     24322     1  0 15:40 ?  00:00:00 slurmstepd: [15709.extern]
> root     24326 24322  0 15:40 ?  00:00:00  \_ sleep 100000000
>
> On nodes where I cannot ssh:
> root      3226     1  0 Nov08 ?  00:00:00 /usr/sbin/slurmd --systemd
> root      3258     1  0 Nov08 ?  00:00:00 /usr/sbin/slurmstepd infinity
>
> Maybe I am not understanding something here?
>
> ps. I have tried to run the pam_slurm_adopt module with options to debug,
> and have not found anything useful
>
> John H
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com