Hello, I just added a third node (called "hsw5") to my Slurm partition, as we continue to enable Slurm in our environment. But the new node is not accepting jobs that require a GPU, even though it has 3 GPUs.
The other node that has a GPU ("devops3") is accepting GPU jobs as expected. A colleague pointed out an interesting difference (under the GRES column) when we ran this command:

(! 676)-> sinfo -o "%20N %10c %10m %25f %20G "
NODELIST             CPUS       MEMORY     AVAIL_FEATURES            GRES
devops2              4          9913       avx,centos,fast,fma,fma4, (null)
devops3              8          40213      centos,cuda10.1p,cuda10.2 *gpu:1(S:0-1)*
hsw5                 64         257847     foo,bar                   *gpu:3*

Is there a problem with the GPU bindings on "hsw5"? Do GPUs need to be associated with sockets, or something like that? Here is the error message I'm seeing:

(! 681)-> /opt/slurm-20.11.5/bin/sbatch --export=NONE -N 1 --constraint foo --gpus=1 --wrap "ls"
sbatch: error: Batch job submission failed: Requested node configuration is not available

(! 682)-> /opt/slurm-20.11.5/bin/sbatch --export=NONE -N 1 --constraint foo --wrap "ls"
Submitted batch job 385

Thanks for the help,
David
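P.S. My understanding is that the "(S:0-1)" socket affinity shown for devops3 comes from how its GPUs are declared in gres.conf (Cores=/socket bindings) rather than from anything in the job submission. For illustration only, a gres.conf entry with explicit core binding, plus the matching slurm.conf node line, might look something like the lines below; the device paths and core ranges are placeholders I made up, not copied from our actual files:

  # gres.conf on hsw5 (hypothetical; File= paths and Cores= ranges are guesses)
  NodeName=hsw5 Name=gpu File=/dev/nvidia0 Cores=0-31
  NodeName=hsw5 Name=gpu File=/dev/nvidia1 Cores=0-31
  NodeName=hsw5 Name=gpu File=/dev/nvidia2 Cores=32-63

  # matching node definition in slurm.conf
  NodeName=hsw5 CPUs=64 RealMemory=257847 Gres=gpu:3 Feature=foo,bar

If hsw5 needs that kind of Cores= binding before GPU jobs will be scheduled on it, that could explain why sinfo shows a plain "gpu:3" there, but I'd appreciate confirmation before I start editing config files.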