Hi Andrew,

I think something may be wrong with your slurmd; perhaps something is missing from your install?
On the node (where slurmd is running), you should see a message similar to this in slurmd.log:

[2020-05-11T14:29:17.766] Gres Name=gpu Type=titanrtx Count=4 ID=7696487 File=/dev/nvidia[0-3] (null)

Regards,
Alex

On Fri, May 15, 2020 at 2:52 PM Speer, Andrew <asp...@siue.edu> wrote:

> I've run into a bit of an issue when trying to define GPU's in our slurm
> conf. Any insight is appreciated.
> Hopefully relevant lines from the configs below.
>
> Error:
> [2020-05-15T16:35:14.862] error: gres_plugin_node_config_unpack: No plugin
> configured to process GRES data from node node3 (Name:gpu Type:(null)
> PluginID:7696487 Count:2)
> [2020-05-15T16:35:15.321] error: gres_plugin_node_config_unpack: No plugin
> configured to process GRES data from node node4 (Name:gpu Type:(null)
> PluginID:7696487 Count:1)
> [2020-05-15T16:35:15.738] error: gres_plugin_node_config_unpack: No plugin
> configured to process GRES data from node node5 (Name:gpu Type:(null)
> PluginID:7696487 Count:1)
> [2020-05-15T16:35:16.229] error: gres_plugin_node_config_unpack: No plugin
> configured to process GRES data from node node6 (Name:gpu Type:(null)
> PluginID:7696487 Count:1)
>
> /etc/slurm/slurm.conf:
> GresTypes=gpu
> NodeName=node[1-3] CPUs=40 RealMemory=48000 Sockets=2
> CoresPerSocket=10 ThreadsPerCore=2 Feature="pascal,p4000" Gres=gpu:8
> State=UNKNOWN
> NodeName=node[4-5,7-10] CPUs=8 RealMemory=48000 Sockets=2
> CoresPerSocket=4 ThreadsPerCore=1 Feature="pascal,p1000" Gres=gpu:8
> State=UNKNOWN
> NodeName=node[6] CPUs=24 RealMemory=30000 Sockets=2
> CoresPerSocket=6 ThreadsPerCore=2 Feature="pascal,p1000" Gres=gpu:8
> State=UNKNOWN
>
> /etc/slurm/gres.conf
> NodeName=node[1-3] Name=gpu File=/dev/nvidia[0-7]
> NodeName=node[4-10] Name=gpu File=/dev/nvidia[0-4]
>
> scontrol show node node1
> NodeName=node1 Arch=x86_64 CoresPerSocket=10
> CPUAlloc=0 CPUTot=40 CPULoad=1.75
> AvailableFeatures=pascal,p4000
> ActiveFeatures=pascal,p4000
> Gres=(null) <------------------------
> NodeAddr=node1 NodeHostName=node1
> OS=Linux 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49 UTC 2019
> RealMemory=48000 AllocMem=0 FreeMem=57465 Sockets=2 Boards=1
> State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
> Partitions=pharmacy
> BootTime=2020-05-15T09:26:45 SlurmdStartTime=2020-05-15T16:35:13
> CfgTRES=cpu=40,mem=48000M,billing=40
> AllocTRES=
> CapWatts=n/a
> CurrentWatts=0 AveWatts=0
> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
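
For anyone who wants to script the slurmd.log check suggested in the reply, here is a minimal Python sketch. It is not part of Slurm: the function name find_gres_entries and the regular expression are hypothetical, and the pattern is modeled only on the single "Gres Name=..." line quoted in this thread, so other Slurm versions may format the line differently.

```python
import re

# Pattern modeled on the one slurmd.log line quoted in this thread.
# Assumption: other Slurm versions may print this line differently.
GRES_LINE = re.compile(
    r"Gres Name=(?P<name>\S+) Type=(?P<type>\S+) Count=(?P<count>\d+)"
)

# Sample taken verbatim from the reply above.
SAMPLE = (
    "[2020-05-11T14:29:17.766] Gres Name=gpu Type=titanrtx Count=4 "
    "ID=7696487 File=/dev/nvidia[0-3] (null)"
)


def find_gres_entries(log_text):
    """Return a (name, type, count) tuple for each Gres line found."""
    entries = []
    for line in log_text.splitlines():
        match = GRES_LINE.search(line)
        if match:
            entries.append(
                (match.group("name"), match.group("type"), int(match.group("count")))
            )
    return entries


if __name__ == "__main__":
    # Prints the parsed entries for the sample line.
    print(find_gres_entries(SAMPLE))
```

To use it on a node, read the contents of your own slurmd.log (its location is site-specific) and pass the text to find_gres_entries. An empty result would suggest that slurmd did not log a GRES line in the format shown above.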