Hello, I'm relatively new to administering Slurm, so my apologies if I've missed something obvious.

We have nodes with 4 GPUs and nodes with 8 GPUs. I would like users to be able to request the total number of GPUs they require. The MPI software is not fussed how many nodes it spans. I had hoped requests such as these would work:

    #SBATCH --gres=gpu:8
    #SBATCH --exclusive
    #SBATCH --nodes=1-2

However, as both "gres" and the alternative workaround "mem" are per-node resources rather than per-job resources, this doesn't work -- a pair of 4-GPU boxes can never be chosen.

So -- is there a way to do this right, or to fake it? Such jobs should run on whatever appropriate hardware configuration is first available. The submitted job script will then slightly reconfigure our software depending on the hardware type it lands on, before launching via srun.

As an alternative -- I note the "heterogeneous jobs" feature. This allows jobs which require resources of "hardware config A" AND "hardware config B". Is there any way to request one hardware configuration OR another?

I can almost fake it for a single use case with "constraints", however this syntax doesn't seem to be understood by the parser code:

    --constraints=[grp1|grp2|grp3|grp4]&[gpuA*1&gpuB*1] --nodes=1-2 --exclusive

With this example node configuration:

    NodeName=small1 Gres=gpu:4 Feature=gpuA,grp1
    NodeName=small2 Gres=gpu:4 Feature=gpuB,grp1
    NodeName=small3 Gres=gpu:4 Feature=gpuB,grp2
    NodeName=small4 Gres=gpu:4 Feature=gpuB,grp2
    NodeName=big1 Gres=gpu:8 Feature=gpuA,gpuB,grp3
    NodeName=big2 Gres=gpu:8 Feature=gpuA,gpuB,grp4

All ideas are appreciated.

Thanks,
Rob Middleton.
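
For reference, the hardware-dependent step in the job script could look roughly like the sketch below. It is only an illustration under assumptions: the total of 8 GPUs, the helper name ranks_per_node, and the application name ./our_mpi_app are placeholders, and it assumes one MPI rank per GPU. The only Slurm-provided piece it relies on is the SLURM_JOB_NUM_NODES environment variable, which is set inside a batch job.

```shell
#!/bin/bash
# Sketch only: derive the per-node rank count from however many nodes the job got.
# TOTAL_GPUS and ./our_mpi_app are placeholders, not names from our real setup.
TOTAL_GPUS=8

# Print the number of GPUs (assumed equal to MPI ranks) per node.
# Arguments: total GPUs requested, number of nodes allocated.
ranks_per_node() {
    local total=$1 nodes=$2
    echo $(( total / nodes ))
}

# SLURM_JOB_NUM_NODES is set by Slurm inside the job; fall back to 1 outside one.
NODES=${SLURM_JOB_NUM_NODES:-1}
RPN=$(ranks_per_node "$TOTAL_GPUS" "$NODES")
echo "nodes=$NODES ranks_per_node=$RPN"

# Hypothetical launch line, left commented out so the sketch runs anywhere:
# srun --ntasks-per-node="$RPN" ./our_mpi_app
```

On one 8-GPU node this would give 8 ranks per node, and on a pair of 4-GPU nodes it would give 4, which is the distinction the script needs once the scheduler is able to offer either layout.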