I tried doing this as follows:

Node's gres.conf:

    ##################################################################
    # Slurm's Generic Resource (GRES) configuration file
    ##################################################################
    Name=gpu File=/dev/nvidia0 Type=1050TI
    Name=gpu_mem_per_card Count=4G
    Name=gpu_cores_per_card Count=768
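As a side note on the numbers involved: Slurm documents the K/M/G/T/P suffixes on a GRES Count as multipliers of 1024, 1024², 1024³ and so on. That means Count=4G stands for 4 × 1024³ = 4294967296, which is the value the drain reason below compares against. The short Python sketch below only shows that arithmetic. It also splits a slurm.conf Gres= string into name/count pairs. Both helper names (`expand_gres_count`, `parse_node_gres`) are hypothetical and are not part of Slurm. The parser assumes the count is always the last colon-separated field, with flags such as no_consume or a GPU type in between.

```python
# Hypothetical helpers (not part of Slurm): expand GRES counts such as "4G"
# assuming Slurm's documented K/M/G/T/P suffixes are powers of 1024.
SUFFIX_MULTIPLIERS = {
    "K": 1024,
    "M": 1024 ** 2,
    "G": 1024 ** 3,
    "T": 1024 ** 4,
    "P": 1024 ** 5,
}


def expand_gres_count(count: str) -> int:
    """Return the integer value of a GRES count string like '4G' or '768'."""
    count = count.strip().upper()
    if count and count[-1] in SUFFIX_MULTIPLIERS:
        return int(count[:-1]) * SUFFIX_MULTIPLIERS[count[-1]]
    return int(count)


def parse_node_gres(spec: str) -> dict:
    """Split a slurm.conf Gres= value into {name: count}.

    Assumes the count is the last ':'-separated field; any fields in between
    (a GPU type such as 'p6000', or a flag such as 'no_consume') are ignored.
    """
    result = {}
    for item in spec.split(","):
        fields = item.split(":")
        name = fields[0]
        # A bare name with no count is treated as a count of 1.
        result[name] = expand_gres_count(fields[-1]) if len(fields) > 1 else 1
    return result


if __name__ == "__main__":
    print(expand_gres_count("4G"))
    print(parse_node_gres(
        "gpu:1,gpu_mem_per_card:no_consume:4G,gpu_cores_per_card:no_consume:768"
    ))
```

This only checks the arithmetic. It says nothing about why slurmd reported a count of 0 for the node.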
From slurm.conf:

    NodeName=n75 CPUs=32 Gres=gpu:1,gpu_mem_per_card:no_consume:4G,gpu_cores_per_card:no_consume:768 Feature=GPUMODEL_1050TI

However, after I restarted slurmctld, distributed slurm.conf to the cluster nodes, and ran "scontrol reconfigure", the node went into the DRAIN state with the following:

    [root@slurm-controller ~]# scontrol show node n75
    NodeName=n75 Arch=x86_64 CoresPerSocket=1
       CPUAlloc=0 CPUErr=0 CPUTot=32 CPULoad=0.01
       AvailableFeatures=GPUMODEL_1050TI
       ActiveFeatures=GPUMODEL_1050TI
       Gres=gpu:1,gpu_mem_per_card:no_consume:4G,gpu_cores_per_card:no_consume:768
       NodeAddr=n75 NodeHostName=n75 Version=16.05
       OS=Linux RealMemory=128905 AllocMem=0 FreeMem=124347 Sockets=32 Boards=1
       State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=240180 Weight=1 Owner=N/A MCS_label=N/A
       BootTime=2019-02-13T14:35:03 SlurmdStartTime=2019-02-13T15:07:27
       CapWatts=n/a
       CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
       ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
       Reason=gres/gpu_mem_per_card count too low (0 < 4294967296) [root@2019-03-21T22:23:59]   <<<<<<<<<<<<<<<

Why does it think that the "gres/gpu_mem_per_card" count is 0? How can I fix this?

-----Original Message-----
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Quirin Lohr
Sent: Wednesday, March 20, 2019 4:06 AM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Can one specify attributes on a GRES resource?

Hi Will,

I solved this by creating a new GRES:

Some nodes have VRAM:no_consume:12G
Some nodes have VRAM:no_consume:24G

I use "no_consume" because otherwise the count would apply to the whole node. This only works because each node has only one type of GPU.
It is then requested with --gres=gpu:1,VRAM:16G

Here is an extract of my slurm.conf:

    NodeName=node7 Gres=gpu:p6000:8,VRAM:no_consume:24G Boards=1 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=257843 Weight=10 Feature=p6000
    NodeName=node6 Gres=gpu:titanxpascal:8,VRAM:no_consume:12G Boards=1 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=257854 Weight=1 Feature=titanxp

The CUDA cores could be implemented the same way (which is a nice idea, btw.).

Regards
Quirin

--
Quirin Lohr
Systemadministration
Technische Universität München
Fakultät für Informatik
Lehrstuhl für Bildverarbeitung und Mustererkennung
Boltzmannstrasse 3
85748 Garching

Tel. +49 89 289 17769
Fax +49 89 289 17757

quirin.l...@in.tum.de
www.vision.in.tum.de
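Quirin's approach relies on a no_consume GRES acting like a per-node attribute. A node qualifies when its advertised VRAM is at least the requested amount, and running jobs never decrement that value. GPUs, by contrast, are used up. The Python snippet below is a toy model of that filtering under those assumptions. It is not Slurm's scheduler code. The `Node` class and the `eligible_nodes`/`allocate` functions are hypothetical, and the selection is a plain first fit that ignores Weight and everything else Slurm considers. The node values come from the slurm.conf extract above.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # "G" suffix, assumed to mean 1024**3 as in gres.conf counts


@dataclass
class Node:
    name: str
    gpus_free: int  # consumable GRES: decremented on allocation
    vram: int       # no_consume GRES: advertised value, never decremented


def eligible_nodes(nodes, gpus, vram):
    """Names of nodes that satisfy a request like --gres=gpu:1,VRAM:16G."""
    return [n.name for n in nodes if n.gpus_free >= gpus and n.vram >= vram]


def allocate(nodes, gpus, vram):
    """Toy first-fit allocation; returns the chosen node name or None."""
    for n in nodes:
        if n.gpus_free >= gpus and n.vram >= vram:
            n.gpus_free -= gpus  # GPUs are consumed
            # VRAM is no_consume, so it stays unchanged for the next job
            return n.name
    return None


if __name__ == "__main__":
    cluster = [Node("node6", 8, 12 * GIB), Node("node7", 8, 24 * GIB)]
    print(eligible_nodes(cluster, 1, 16 * GIB))
    print(allocate(cluster, 1, 16 * GIB), allocate(cluster, 1, 16 * GIB))
```

Two 16G jobs can both land on node7 because the 24G value is only compared, never subtracted. This illustrates why the per-node value only makes sense when all GPUs in a node share the same memory size.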