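Maybe I'm reading it wrong, but your partition sets DefMemPerGPU=32000, and the single-GPU nodes (gpu-[001-003]) only have RealMemory=31000. A one-GPU job on those nodes would get a default memory request larger than the node can provide.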
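As a rough sanity check of that arithmetic, here is a small Python sketch using the values from the quoted config. It assumes the default memory request is simply the number of requested GPUs times DefMemPerGPU and that this has to fit within the node's RealMemory; the real scheduler logic has more inputs than this, and the function name is made up for illustration.

```python
# Values copied from the quoted slurm.conf (memory in MB).
NODES = {
    "gpu-[001-003]": {"real_memory": 31000, "gpus": 1},
    "gpu-[010-019]": {"real_memory": 64000, "gpus": 2},
}
DEF_MEM_PER_GPU = 32000


def default_request_fits(real_memory, gpus_requested, def_mem_per_gpu):
    """Hypothetical helper: True if gpus_requested * def_mem_per_gpu
    fits in the node's RealMemory (a simplified model, not Slurm's code)."""
    return gpus_requested * def_mem_per_gpu <= real_memory


for name, node in NODES.items():
    # The job in question asks for --gres=gpu:1.
    fits = default_request_fits(node["real_memory"], 1, DEF_MEM_PER_GPU)
    print(f"{name}: 1 GPU x {DEF_MEM_PER_GPU} MB fits in "
          f"{node['real_memory']} MB? {fits}")
```

If that is indeed the cause, setting DefMemPerGPU to 31000 or less, or passing --mem-per-gpu explicitly in the job script, would be the things to try.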
Rob

________________________________
From: Jörg Striewski via slurm-users <slurm-users@lists.schedmd.com>
Sent: Wednesday, October 16, 2024 4:05 AM
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Problem with nodes with 1 gpu

I cannot send jobs to nodes with one GPU, and I can't find the bug in my configuration. Can someone help me?

In slurm.conf, GresTypes=gpu is set. These are some of the nodes in slurm.conf:

NodeName=gpu-[001-003] CPUs=8 SocketsPerBoard=1 CoresPerSocket=4 RealMemory=31000 Gres=gpu:1080:1
NodeName=gpu-[010-019] CPUs=16 SocketsPerBoard=1 CoresPerSocket=8 RealMemory=64000 Gres=gpu:1080:2

The partition for these GPU nodes is:

# General GPU partitions
PartitionName=GPU Nodes=gpu-[001-003,010-019] AllowAccounts=staff PreemptMode=REQUEUE PriorityTier=0 DefMemPerGPU=32000 DefCpuPerGPU=8 CpuBind=none TRESBillingWeights="GRES/gpu=1000" GraceTime=300

These are the entries for some of the nodes in gres.conf:

NodeName=gpu-[001-003] Name=gpu Type=1080 File=/dev/nvidia0
NodeName=gpu-[010-019] Name=gpu Type=1080 File=/dev/nvidia[0-1]

When I send a job with sbatch to gpu-001:

#SBATCH --job-name=hello
#SBATCH --ntasks-per-node=1
#SBATCH --output=hello_%A.out
#SBATCH --time=00:10:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=striew...@ismll.de
#SBATCH --partition=GPU
#SBATCH --nodelist=gpu-001
#SBATCH --gres=gpu:1
[...]

I get the error:

sbatch: error: Batch job submission failed: Requested node configuration is not available

When I send the job to a node with 2 GPUs, it runs with no error, just by setting --nodelist=gpu-12.

Does someone have a hint as to what I did wrong?
Mit freundlichen Grüßen / kind regards
--
Jörg Striewski
Information Systems and Machine Learning Lab (ISMLL)
Institute of Computer Science
University of Hildesheim
Germany
post address: Universitätsplatz 1, D-31141 Hildesheim, Germany
visitor address: Samelsonplatz 1, D-31141 Hildesheim, Germany
Tel. (+49) 05121 / 883-40392
http://www.ismll.uni-hildesheim.de/

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com