Hello all,
I'm trying to turn off core specialization in my cluster by setting CoreSpecCount=0, but checking with scontrol does not show my changes. If I set CoreSpecCount=1 or CoreSpecCount=2, or anything except 0, the changes are applied correctly. But when I set it to 0, no change is applied -- it remains on whatever the previous number was.

With CoreSpecCount=1:
---------------------------------------
# scontrol show node node016
NodeName=node016 Arch=x86_64 CoresPerSocket=18
   CPUAlloc=0 CPUTot=72 CPULoad=0.01
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=node016 NodeHostName=node016
   OS=Linux 3.10.0-862.2.3.el7.x86_64 #1 SMP Wed May 9 18:05:47 UTC 2018
   RealMemory=95306 AllocMem=0 FreeMem=N/A Sockets=2 Boards=1
   CoreSpecCount=1 CPUSpecList=70-71
   State=IDLE ThreadsPerCore=2 TmpDisk=2038 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=test
   BootTime=2019-06-19T08:41:49 SlurmdStartTime=2019-06-27T09:06:26
   CfgTRES=cpu=72,mem=95306M,billing=72
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
---------------------------------------
That is correct.

With CoreSpecCount=0:
---------------------------------------
# scontrol show node node016
NodeName=node016 Arch=x86_64 CoresPerSocket=18
   CPUAlloc=0 CPUTot=72 CPULoad=0.01
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=node016 NodeHostName=node016
   OS=Linux 3.10.0-862.2.3.el7.x86_64 #1 SMP Wed May 9 18:05:47 UTC 2018
   RealMemory=95306 AllocMem=0 FreeMem=92773 Sockets=2 Boards=1
   CoreSpecCount=1 CPUSpecList=70-71
   State=IDLE ThreadsPerCore=2 TmpDisk=2038 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=test
   BootTime=2019-06-19T08:41:49 SlurmdStartTime=2019-06-27T09:06:26
   CfgTRES=cpu=72,mem=95306M,billing=72
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
---------------------------------------
That is wrong.
It's exactly the same -- CoreSpecCount still shows 1.

The weird thing is that if I run slurmd in the foreground in verbose mode on the node with "slurmd -cDvvf /etc/slurm/slurm.conf", the change appears to be recognized.

Results with CoreSpecCount=1:
---------------------------------------
slurmd: got reconfigure request
slurmd: all threads complete
slurmd: debug: Reading slurm.conf file: /etc/slurm/slurm.conf
slurmd: debug: Ignoring obsolete CacheGroups option.
slurmd: debug: Log file re-opened
slurmd: debug: CPUs:72 Boards:1 Sockets:2 CoresPerSocket:18 ThreadsPerCore:2
slurmd: Message aggregation disabled
slurmd: debug: Reading cgroup.conf file /etc/slurm/cgroup.conf
slurmd: debug: Reading cgroup.conf file /etc/slurm/cgroup.conf
slurmd: debug: Reading cgroup.conf file /etc/slurm/cgroup.conf
slurmd: debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/cpuset/slurm' already exists
slurmd: debug: xcgroup_instantiate: cgroup '/sys/fs/cgroup/cpuset/slurm/system' already exists
slurmd: debug: system cgroup: system cpuset cgroup initialized
slurmd: Resource spec: Reserved abstract CPU IDs: 70-71
slurmd: Resource spec: Reserved machine CPU IDs: 35,71
slurmd: debug: Resource spec: Reserved system memory limit not configured for this node
---------------------------------------

Results with CoreSpecCount=0:
---------------------------------------
slurmd: got reconfigure request
slurmd: all threads complete
slurmd: debug: Reading slurm.conf file: /etc/slurm/slurm.conf
slurmd: debug: Ignoring obsolete CacheGroups option.
slurmd: debug: Log file re-opened
slurmd: debug: CPUs:72 Boards:1 Sockets:2 CoresPerSocket:18 ThreadsPerCore:2
slurmd: Message aggregation disabled
slurmd: debug: Reading cgroup.conf file /etc/slurm/cgroup.conf
slurmd: debug: Resource spec: No specialized cores configured by default on this node
slurmd: debug: Resource spec: Reserved system memory limit not configured for this node
---------------------------------------

The reserved CPUs have been removed as they should be. So why does scontrol still show the incorrect value (and jobs still do not run on those cores)?

Dave

David Guertin
Information Technology Services
Middlebury College
700 Exchange St.
Middlebury, VT 05753
(802)443-3143
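
P.S. The check described above (comparing the CoreSpecCount set in slurm.conf with what scontrol reports) could be scripted roughly as follows. This is only a minimal sketch: the function names parse_node_fields and reported_core_spec are hypothetical helpers, not part of Slurm, and the embedded sample is an excerpt of the output quoted above. It assumes the output is made of whitespace-separated Key=Value tokens, and it returns None when the field is absent, since whether scontrol omits CoreSpecCount on an unspecialized node is an assumption here.

```python
# Excerpt of the `scontrol show node node016` output quoted in this message.
SAMPLE = """\
NodeName=node016 Arch=x86_64 CoresPerSocket=18
   CPUAlloc=0 CPUTot=72 CPULoad=0.01
   RealMemory=95306 AllocMem=0 FreeMem=92773 Sockets=2 Boards=1
   CoreSpecCount=1 CPUSpecList=70-71
   State=IDLE ThreadsPerCore=2 TmpDisk=2038 Weight=1 Owner=N/A MCS_label=N/A
   CfgTRES=cpu=72,mem=95306M,billing=72
   AllocTRES=
"""


def parse_node_fields(text):
    """Collect Key=Value tokens into a dict (hypothetical helper).

    Tokens without '=' are skipped; the first occurrence of a key wins.
    Values are split at the first '=' only, so 'CfgTRES=cpu=72,...'
    keeps 'cpu=72,...' as its value.
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep and key not in fields:
            fields[key] = value
    return fields


def reported_core_spec(text):
    """Return the reported CoreSpecCount as an int, or None if absent."""
    value = parse_node_fields(text).get("CoreSpecCount")
    return int(value) if value is not None else None


if __name__ == "__main__":
    configured = 0  # the value set in slurm.conf in the failing case
    reported = reported_core_spec(SAMPLE)
    if reported != configured:
        print(f"mismatch: configured={configured}, scontrol reports={reported}")
```

In practice the text would come from running "scontrol show node node016" rather than from an embedded string.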