tluchko writes:
> Jobs only sit in the queue with RESOURCES as the REASON when we
> include the flag --gres=bandwidth:ib. If we remove the flag, the jobs
> run fine. But we need the flag to ensure that we don't get a mix of IB
> and Ethernet nodes, because jobs fail when the two are mixed.
This doesn't answer …
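For a --gres=bandwidth:ib request to be satisfiable, the gres has to be declared on the nodes themselves. A minimal sketch of the slurm.conf side (node names and count are placeholders, not from the thread):

GresTypes=bandwidth
NodeName=ib-node[01-16] Gres=bandwidth:ib:no_consume:1 ...

Per 'man gres.conf', a gres.conf entry is only needed when slurm.conf doesn't fully describe the resource (e.g. a File or Cores mapping), so a count-only gres like this normally needs nothing on the node side. If the gres is declared on fewer nodes than a job asks for, the job pends with Reason=Resources even though nodes look idle, which is one way to end up with the symptom above.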
Jake, my hunch is that your jobs are getting hung up on memory allocation, such
that Slurm is assigning all of a node's memory to each job as it runs; you can
verify with scontrol show job. If that's what's happening, try setting a
DefMemPerCPU value for your partition(s).
Best of luck,
Lyn
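For reference, DefMemPerCPU can be set per partition in slurm.conf; a sketch with placeholder partition and node names (the value is in megabytes):

PartitionName=batch Nodes=compute-[001-064] Default=YES DefMemPerCPU=4096 State=UP

With that in place, a job that doesn't request memory explicitly is charged 4096 MB per allocated CPU rather than the whole node, which you can confirm in the mem field of scontrol show job.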
On Thu, May 26, 2022, … wrote:
Ah, thank you. I was assuming it would use the same name as in scontrol
(AvailableFeatures rather than Feature).
Per ‘man slurm.conf’ :
Feature:
A comma delimited list of arbitrary strings indicative of some
characteristic associated with the node. There is no value associated with a
feature at this time, a node either has a feature or it does not.
In slurm.conf, we just add the Features to the node description. Is that what
you were looking for?
NodeName=compute-4-4 ... Weight=15 Feature=gen10
Jeff
UH IT - HPC
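On the submit side, jobs then select nodes by feature with --constraint, e.g.:

sbatch --constraint=gen10 job.sh

or equivalently inside the batch script:

#SBATCH --constraint=gen10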
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Hanby, Mike
Sent: Thursday, June 2, 2022 2:06 PM
Add it to your slurm.conf.
Then it is always there after a restart.
Brian Andrus
On 6/2/2022 12:05 PM, Hanby, Mike wrote:
Howdy,
I can’t seem to find a solution in ‘man slurm.conf’ for this. How can
I make the following persist across a slurmctld restart:
scontrol update NodeName="c001" AvailableFeatures=hi_mem,data,scratch
NodeName=c001 Arch=x86_64 CoresPerSocket=12
   CPUAlloc=2 CPUTot=48 CPULoad=6.08
   AvailableFeatures=…
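The persistent equivalent of that scontrol command is a Feature list on the node's definition in slurm.conf, which slurmctld re-reads at startup; a sketch (the hardware attributes are inferred from the output above, so adjust to match the real node):

NodeName=c001 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 Feature=hi_mem,data,scratch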
Hi Jeff & list,
we've encountered the same problem after upgrading to 21.08.8-2. All jobs failed
with "Slurmd could not execve job".
I've traced this down to the slurmstepd process failing to modify the cgroup
setting "memory.memsw.limit_in_bytes", which happens because we have
"ConstrainSwapSpace" set in our cgroup.conf.
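For context, the knobs involved live in cgroup.conf; a sketch (values illustrative, not our exact file):

ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
AllowedSwapSpace=0

With swap constraining enabled, slurmstepd writes memory.memsw.limit_in_bytes into the job's cgroup, and that file only exists when the kernel does memcg swap accounting (on many distros this requires booting with swapaccount=1), so on kernels without it the write fails and the job dies at launch.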
Hi all,
I'm trying to use heterogeneous jobs with the following slurm script:
#!/usr/bin/env bash
#SBATCH --partition=cpu --time=01:00:00 --nodes=2 --ntasks-per-node=1 --cpus-per-task=2 --mem=8G
#SBATCH hetjob
#SBATCH --partition=gpu --time=01:00:00 --nodes=2 --ntasks-per-node=1 --cpus-per-task=…
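Once such an allocation is granted, each component is a "het group" when launching steps; a sketch of what the script body might run (application names are placeholders):

srun ./cpu_app : ./gpu_app

The command before the ':' runs in het group 0 (the cpu component) and the one after it in het group 1 (the gpu component); specific groups can also be targeted with srun --het-group.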