[slurm-users] Problem with nodes with 1 gpu

Jörg Striewski via slurm-users Wed, 16 Oct 2024 01:05:43 -0700

I cannot send jobs to nodes with one GPU, and I can't find the bug in my configuration. Can someone help me?


In slurm.conf, GresTypes=gpu is set.


These are some of the nodes in slurm.conf:

NodeName=gpu-[001-003] CPUs=8 SocketsPerBoard=1 CoresPerSocket=4 RealMemory=31000 Gres=gpu:1080:1
NodeName=gpu-[010-019] CPUs=16 SocketsPerBoard=1 CoresPerSocket=8 RealMemory=64000 Gres=gpu:1080:2


The partition for these GPU nodes is:

# General GPU partitions

PartitionName=GPU Nodes=gpu-[001-003,010-019] AllowAccounts=staff PreemptMode=REQUEUE PriorityTier=0 DefMemPerGPU=32000 DefCpuPerGPU=8 CpuBind=none TRESBillingWeights="GRES/gpu=1000" GraceTime=300


These are the entries for the same nodes in gres.conf:

NodeName=gpu-[001-003]   Name=gpu   Type=1080   File=/dev/nvidia0
NodeName=gpu-[010-019]   Name=gpu   Type=1080 File=/dev/nvidia[0-1]

When I submit a job with sbatch to gpu-001:

#SBATCH --job-name=hello
#SBATCH --ntasks-per-node=1
#SBATCH --output=hello_%A.out
#SBATCH --time=00:10:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=striew...@ismll.de
#SBATCH --partition=GPU
#SBATCH --nodelist=gpu-001
#SBATCH --gres=gpu:1

[...]

I get the error:

sbatch: error: Batch job submission failed: Requested node configuration is not available

When I send the job to a node with 2 GPUs, it runs with no error, just by setting --nodelist=gpu-12.
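One thing that can be checked directly from the values quoted above is whether the partition's per-GPU defaults fit within each node's resources. The following is only a minimal sketch: the numbers are copied from the slurm.conf excerpts in this post, the helper name check_defaults is hypothetical, and it assumes that a job's default request is DefMemPerGPU and DefCpuPerGPU multiplied by the number of requested GPUs. It does not query Slurm itself.

```python
# Values copied from the slurm.conf excerpts in this post.
nodes = {
    "gpu-[001-003]": {"cpus": 8, "real_memory_mb": 31000, "gpus": 1},
    "gpu-[010-019]": {"cpus": 16, "real_memory_mb": 64000, "gpus": 2},
}
partition = {"def_mem_per_gpu_mb": 32000, "def_cpu_per_gpu": 8}


def check_defaults(node, part, requested_gpus=1):
    """Return the default requests that exceed what the node offers.

    Hypothetical helper: assumes defaults scale linearly with the
    number of requested GPUs (here --gres=gpu:1).
    """
    problems = []
    mem_needed = part["def_mem_per_gpu_mb"] * requested_gpus
    cpus_needed = part["def_cpu_per_gpu"] * requested_gpus
    if requested_gpus > node["gpus"]:
        problems.append(f"gpus: {requested_gpus} > {node['gpus']}")
    if mem_needed > node["real_memory_mb"]:
        problems.append(f"memory: {mem_needed} MB > {node['real_memory_mb']} MB")
    if cpus_needed > node["cpus"]:
        problems.append(f"cpus: {cpus_needed} > {node['cpus']}")
    return problems


for name, node in nodes.items():
    print(name, check_defaults(node, partition) or "ok")
```

The same comparison can be made by hand: for a one-GPU job, the partition defaults ask for 32000 MB and 8 CPUs, which can be set against RealMemory and CPUs on each node line.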


Does anyone have a hint as to what I did wrong?


Kind regards

--
Jörg Striewski

Information Systems and Machine Learning Lab (ISMLL)
Institute of Computer Science
University of Hildesheim Germany
post address: Universitätsplatz 1, D-31141 Hildesheim, Germany
visitor address: Samelsonplatz 1, D-31141 Hildesheim, Germany
Tel.(+49) 05121 / 883-40392
http://www.ismll.uni-hildesheim.de


