It turns out that the Slurm job limits are *not* controlled by the normal
/etc/security/limits.conf configuration. Any service running under
Systemd (such as slurmd) has limits defined by Systemd; see [1] and [2].
The limits of processes started by slurmd are defined by LimitXXX in
/usr/lib/s
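A minimal sketch of what that looks like, assuming the limits are raised via a
systemd drop-in rather than by editing the packaged unit file (the drop-in path
and the values are only illustrative):

  # /etc/systemd/system/slurmd.service.d/limits.conf
  [Service]
  LimitMEMLOCK=infinity
  LimitNOFILE=131072

then reload and restart so the new limits apply to slurmd and the job steps it
spawns:

  systemctl daemon-reload
  systemctl restart slurmd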
Good afternoon,
I'm working on a cluster of NVIDIA DGX A100s that is using BCM 10 (Base
Command Manager, which is based on Bright Cluster Manager). I ran into an
error and only just learned that Slurm and Weka don't get along (presumably
because Weka pins its client threads to cores). I read thr
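For context, one way around that kind of core-pinning clash is to keep Slurm
off the cores the Weka client pins. A sketch of that in slurm.conf (node
names, CPU counts and the reserved core IDs here are made-up placeholders,
not values from this thread):

  # Reserve cores 0-3 on each node for the Weka client threads
  NodeName=dgx[01-04] CPUs=256 RealMemory=1900000 CpuSpecList=0,1,2,3 State=UNKNOWN

Whether Slurm actually keeps jobs off those cores depends on the
TaskPlugin/cgroup configuration, so treat this only as a starting point.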
This is because you have no slurm.conf in /etc/slurm, so it is trying
'configless' mode, which queries DNS to find out where to get the config. It
is failing because DNS is not configured to tell the nodes where to ask
about the config.
Simple solution: put a copy of slurm.conf in /etc/slurm
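For reference, 'configless' mode locates the controller via a DNS SRV record,
so the alternative to copying slurm.conf around is a record along these lines
(zone and host names here are placeholders):

  ; SRV record slurmd looks up in configless mode; 6817 is slurmctld's default port
  _slurmctld._tcp.cluster.example.com. 3600 IN SRV 10 0 6817 head.cluster.example.com.

(slurmd can also be pointed at the controller explicitly with its
--conf-server option instead of relying on DNS.)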
>
> Simple solution: put a copy of slurm.conf in /etc/slurm/ on the node(s).
>
For Bright, slurm.conf is in /cm/shared/apps/slurm/var/etc/slurm, including
on all nodes. Make sure that on the compute nodes $SLURM_CONF resolves to the
correct path.
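A quick sanity check on a compute node (the path is just the Bright default
mentioned above):

  echo "$SLURM_CONF"
  ls -l /cm/shared/apps/slurm/var/etc/slurm/slurm.conf

Keep in mind this only shows what an interactive shell sees; slurmd itself
gets its environment from systemd, which is worth checking separately.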
> On 4/19/2024 9:56 AM, Jeffrey Layton via slurm-users w
I like it; however, it was working before without a slurm.conf in
/etc/slurm.
Plus the environment variable SLURM_CONF is pointing to the correct
slurm.conf file (the one in /cm/...). Wouldn't Slurm pick up that one?
Thanks!
Jeff
On Fri, Apr 19, 2024 at 1:11 PM Brian Andrus via slurm-users <
s
I would double-check where you are setting SLURM_CONF then. It is acting
as if it is not set (typo maybe?).
It should be in /etc/default/slurmd (but could be /etc/sysconfig/slurmd).
Also check what the final, actual command being run to start it is. If
anyone has changed the .service file or a
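Two stock systemd commands show exactly that (nothing Bright-specific assumed
here):

  # the unit file plus any drop-ins that override it
  systemctl cat slurmd
  # the environment and command line systemd will actually use
  systemctl show slurmd -p Environment -p ExecStart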
On Bright it's set in a few places:
grep -r -i SLURM_CONF /etc
/etc/systemd/system/slurmctld.service.d/99-cmd.conf:Environment=SLURM_CONF=/cm/shared/apps/slurm/var/etc/slurm/slurm.conf
/etc/systemd/system/slurmdbd.service.d/99-cmd.conf:Environment=SLURM_CONF=/cm/shared/apps/slurm/var/etc/slurm/slur
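If that grep turns up no matching drop-in for slurmd itself, one could be
added the same way as the slurmctld/slurmdbd ones above (the file name below
simply mirrors theirs and is an assumption):

  # /etc/systemd/system/slurmd.service.d/99-cmd.conf
  [Service]
  Environment=SLURM_CONF=/cm/shared/apps/slurm/var/etc/slurm/slurm.conf

followed by a systemctl daemon-reload and a restart of slurmd.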
We use Bright Cluster Manager with Slurm 23.02 on RHEL9. I know about
pam_slurm_adopt https://slurm.schedmd.com/pam_slurm_adopt.html, which does
not appear to come by default with the Bright 'cm' package of Slurm.
Currently ssh to a node gets:
Login not allowed: no running jobs and no WLM allocatio
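For comparison, the usual wiring for pam_slurm_adopt is roughly the following
(a sketch based on the SchedMD page linked above; the exact PAM stack on a
Bright/RHEL9 image may differ, so test on a single node first):

  # /etc/pam.d/sshd - last entry in the account stack
  account    required     pam_slurm_adopt.so

  # slurm.conf - so job steps run in containers an ssh session can be adopted into
  PrologFlags=contain

The 'Login not allowed' message is coming from whatever access-control module
is already in the node's PAM stack, so that would need to be reconciled with
this as well.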