Re: [slurm-users] Need to restart slurmctld for gres jobs to start

2022-06-02 Thread Bjørn-Helge Mevik
tluchko writes:

> Jobs only sit in the queue with RESOURCES as the REASON when we
> include the flag --gres=bandwidth:ib. If we remove the flag, the jobs
> run fine. But we need the flag to ensure that we don't get a mix of IB
> and ethernet nodes because they fail in this case.

This doesn't ans

Re: [slurm-users] New slurm configuration - multiple jobs per host

2022-06-02 Thread Lyn Gerner
Jake, my hunch is that your jobs are getting hung up on memory allocation, such that Slurm is assigning all of memory to each job as it runs; you can verify with scontrol show job. If that's what's happening, try setting a DefMemPerCPU value for your partition(s).

Best of luck,
Lyn

On Thu, May 26, 2022
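For readers following the DefMemPerCPU suggestion above, a minimal sketch of what such a partition line might look like in slurm.conf. The partition name, node list, and the 2048 MB value are hypothetical placeholders, not taken from the thread; DefMemPerCPU is given in megabytes and can also be set globally rather than per partition.

```
# slurm.conf (fragment) -- hypothetical partition name, node list, and value
# DefMemPerCPU is the default memory per allocated CPU, in MB, applied
# to jobs that do not request memory explicitly.
PartitionName=batch Nodes=node[01-04] Default=YES DefMemPerCPU=2048 State=UP
```

With a default like this in place, a job that asks for one CPU and no explicit memory should be allocated roughly 2 GB instead of the whole node's memory, which leaves room for other jobs on the same host.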

Re: [slurm-users] How to Make AvailableFeatures Persist after Slurmctld Restart

2022-06-02 Thread Hanby, Mike
Ah, thank you. I was assuming it would use the same name as in scontrol.

Per ‘man slurm.conf’:

Feature: A comma delimited list of arbitrary strings indicative of some characteristic associated with the node. There is no value associated with a feature at this time, a node either has a featu

Re: [slurm-users] How to Make AvailableFeatures Persist after Slurmctld Restart

2022-06-02 Thread Sarlo, Jeffrey S
In slurm.conf, we just add the Features to the node description. Is that what you were looking for?

NodeName=compute-4-4 ... Weight=15 Feature=gen10

Jeff
UH IT - HPC

From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Hanby, Mike
Sent: Thursday, June 2, 2022 2:06 PM

Re: [slurm-users] How to Make AvailableFeatures Persist after Slurmctld Restart

2022-06-02 Thread Brian Andrus
Add it to your slurm.conf. Then it is always there after a restart.

Brian Andrus

On 6/2/2022 12:05 PM, Hanby, Mike wrote:

Howdy, I can’t seem to find a solution in ‘man slurm.conf’ for this. How can I make the following persist a slurmctld restart: scontrol update NodeName="c001" Available
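As a sketch of the advice above, assuming the node and feature names from the original question (c001 with hi_mem, data, scratch): the slurm.conf node definition uses the keyword Feature, not the AvailableFeatures name that scontrol prints. The fragment below deliberately omits the node's hardware parameters, which would stay on the same line in a real configuration.

```
# slurm.conf (fragment) -- keep the node's existing hardware parameters
# (CPUs, sockets, memory, and so on) on this line; only Feature is shown.
NodeName=c001 Feature=hi_mem,data,scratch
```

Once slurmctld rereads the configuration, the features defined here should be reported as AvailableFeatures for the node without a manual scontrol update.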

[slurm-users] How to Make AvailableFeatures Persist after Slurmctld Restart

2022-06-02 Thread Hanby, Mike
Howdy,

I can’t seem to find a solution in ‘man slurm.conf’ for this. How can I make the following persist a slurmctld restart:

scontrol update NodeName="c001" AvailableFeatures=hi_mem,data,scratch

NodeName=c001 Arch=x86_64 CoresPerSocket=12 CPUAlloc=2 CPUTot=48 CPULoad=6.08 AvailableFeatu

Re: [slurm-users] Help with failing job execution

2022-06-02 Thread Otto, Frank
Hi Jeff & list, we've encountered the same problem after upgrade to 21.08.8-2. All jobs failed with "Slurmd could not execve job". I've traced this down to the slurmstepd process failing to modify the cgroup setting "memory.memsw.limit_in_bytes", which happens because we have "ConstrainSwapSpac

[slurm-users] Allocation failure when using heterogeneous jobs with sbatch

2022-06-02 Thread GRANGER Nicolas
Hi all,

I'm trying to use heterogeneous jobs with the following slurm script:

#!/usr/bin/env bash
#SBATCH --partition=cpu --time=01:00:00 --nodes=2 --ntasks-per-node=1 --cpus-per-task=2 --mem=8G
#SBATCH hetjob
#SBATCH --partition=gpu --time=01:00:00 --nodes=2 --ntasks-per-node=1 --cpus-per-