Hello,
On 2/6/25 9:50 PM, Chase Schuette via slurm-users wrote:
After a Bright Computing/Base Command update, I'm encountering a
slurm.conf error as seen below. I tried removing the MemSpecLimit
parameter from the node definitions, but the changes don't seem to be
taking effect, even after restarting slurmd across the compute nodes and
restarting slurmctld on the head node. I'm also suspicious of the
RealMemory being set to zero.
Any insight? Open to suggestions. Thanks ahead of time!
You may find better help on the NVIDIA forums, but typically, with BCM,
slurm.conf is partially auto-generated, including the node and partition
definitions (look for comment blocks in the file that mention this).
Manual edits to those parts are likely to be overwritten.
The way you'd change those in a typical BCM+Slurm deployment is by using
overlays. Here's an example cmsh path:
home;configurationoverlay;use "slurm-client";roles;use slurmclient;
From there, you should be able to show/get/set/clear the appropriate
values. Just remember to commit your changes.
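
To check whether a committed change actually reached the generated file,
a small script along these lines might help. It is only a sketch: the
function name node_settings and the sample node lines are made up for
illustration, and it assumes each NodeName definition sits on a single
line (no backslash continuations, no quoted values containing spaces).

```python
def node_settings(conf_text, keys=("RealMemory", "MemSpecLimit")):
    """Map each NodeName expression to the requested parameters.

    A parameter that is absent from the line is reported as None.
    Parameter names are matched case-insensitively, as Slurm does.
    """
    found = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line.lower().startswith("nodename="):
            continue
        fields = {}
        for token in line.split():
            if "=" in token:
                name, value = token.split("=", 1)
                fields[name.lower()] = value
        found[fields["nodename"]] = {k: fields.get(k.lower()) for k in keys}
    return found


# Hypothetical excerpt, not taken from the original poster's cluster.
sample = """\
# node definitions (illustrative only)
NodeName=node[001-004] Procs=32 RealMemory=0 MemSpecLimit=4096
NodeName=gpu01 Procs=64 RealMemory=515000
"""

for node, values in node_settings(sample).items():
    print(node, values)
```

On a real system you would pass in the contents of your own slurm.conf
(its location depends on the installation) and compare the result with
what "scontrol show node" reports for the same nodes.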
Best,
--
Roberto Polverelli Monti
HPC System Engineer
Do IT Now | https://doitnowgroup.com