Interesting. I'm still confused by where slurmd -C is getting the data. When I think of where the kernel stores info about the processor, I normally think of /proc/cpuinfo. (By the way, I am running CentOS 7 in the VM; the VM hypervisor is VMware.) /proc/cpuinfo does show 16 cores.
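As a quick cross-check, the kernel's CPU count can be read from three independent places; if all three report 16, the kernel itself sees the new cores and the discrepancy lies in how slurmd probes the topology. A minimal sketch using only standard Linux interfaces (nothing Slurm-specific):

```shell
# Count logical CPUs three ways; on a healthy system they agree.
grep -c '^processor' /proc/cpuinfo   # one "processor" stanza per logical CPU
cat /sys/devices/system/cpu/online   # online CPU range, e.g. 0-15
nproc                                # honors affinity masks, so it can be lower
```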
I understand your concern over the processor speed. So I tried a different VM, where I see the following specs:

vendor_id  : GenuineIntel
cpu family : 6
model      : 85
model name : Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz

When I increase the core count on that VM, reboot, and run slurmd -C, it too continues to show the lower original core count. Specifically, how is slurmd -C getting that info? Maybe this is a kernel issue, but other than lscpu and /proc/cpuinfo, I don't know where to look. Maybe I should be looking at the slurmd source?

-Mike

Michael Tie
Technical Director, Mathematics, Statistics, and Computer Science
One North College Street    phn: 507-222-4067
Northfield, MN 55057        cel: 952-212-8933
m...@carleton.edu           fax: 507-222-4312

On Sun, Mar 8, 2020 at 7:32 PM Kirill 'kkm' Katsnelson <k...@pobox.com> wrote:

> To answer your direct question, the ground truth of 'slurmd -C' is what
> the kernel thinks the hardware is (what you see in lscpu, except it
> probably employs some tricks for VMs with an odd topology). And it got
> severely confused by what the kernel reported to it. I know from experience
> that certain odd cloud VM shapes throw it off balance.
>
> I do not really like the output of lscpu. I have never seen such a strange
> shape of a VM. CPU family 15 is in the Pentium 4 line
> <https://software.intel.com/en-us/articles/intel-architecture-and-processor-identification-with-cpuid-model-and-family-numbers>,
> and model 6 was the last breath of this unsuccessful NetBurst
> architecture -- such a rarity that the Linux kernel does not even have it
> in its database: "Common KVM processor" is a slug for "everything else
> that one of these soul-sapping KVMs may return". Flags show that the
> processor supports SSE2 and SSE3, but not 4.1, 4.2 or AVX, which is
> consistent with a Pentium 4, but 16M of L3 cache is about the average
> total RAM in a desktop at the time the P4 was a thing. And the CPU is
> NUMA (no real Pentium 4 had NUMA, only SMP)¹.
> My best advice would be to either use a different hypervisor or tune
> correctly the one you have. Sometimes a hypervisor is tuned for live VM
> migration, when a VM is frozen on one hardware type and thawed on another,
> and may tweak the CPUID in advance to hide features from the guest OS so
> that it would be able to continue if migrated to less capable hardware;
> but still, using the P4 as the least common denominator is way too
> extreme. Something is seriously wrong on the KVM host.
>
> The VM itself is braindead. Even if you got it up and running, the
> absence of SSE4.1 and 4.2, AVX, AVX2, and AVX512² would make it about as
> efficient a computing node as a brick. Unless the host CPU is really a
> Presler Pentium 4, in which case you are way too long overdue for a
> hardware upgrade :)))
>
> -kkm
> ____
> ¹ It's not impossible that lscpu shows an SMP machine as if containing a
> single NUMA node, but I have a recollection that this is not the case. I
> haven't seen a non-NUMA CPU in quite a while.
> ² Intel had gone besides-itself-creative this time. It was an even bigger
> naming leap than switching from Roman to decimal between Pentium III and
> Pentium *drum roll* 4 *cymbal crash*.
>
> On Sun, Mar 8, 2020 at 1:20 PM mike tie <m...@carleton.edu> wrote:
>
>> I am running a slurm client on a virtual machine. The virtual machine
>> originally had a core count of 10, but I have now increased the cores to
>> 16; "slurmd -C" continues to show 10. I have increased the core count in
>> the slurm.conf file, and that is being seen correctly. The node is stuck
>> in a Drain state because of this conflict. How do I get slurmd -C to see
>> the new number of cores?
>>
>> I'm running Slurm 18.08. I have tried running "scontrol reconfigure" on
>> the head node. I have restarted slurmd on all the client nodes, and I
>> have restarted slurmctld on the master node.
>>
>> Where is the data about compute node CPUs stored?
>> I can't seem to find a config or settings file on the compute node.
>>
>> The compute node that I am working on is "liverpool":
>>
>> mtie@liverpool ~ $ slurmd -C
>> NodeName=liverpool CPUs=10 Boards=1 SocketsPerBoard=10 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=64263
>> UpTime=1-21:55:36
>>
>> mtie@liverpool ~ $ lscpu
>> Architecture:          x86_64
>> CPU op-mode(s):        32-bit, 64-bit
>> Byte Order:            Little Endian
>> CPU(s):                16
>> On-line CPU(s) list:   0-15
>> Thread(s) per core:    1
>> Core(s) per socket:    4
>> Socket(s):             4
>> NUMA node(s):          1
>> Vendor ID:             GenuineIntel
>> CPU family:            15
>> Model:                 6
>> Model name:            Common KVM processor
>> Stepping:              1
>> CPU MHz:               2600.028
>> BogoMIPS:              5200.05
>> Hypervisor vendor:     KVM
>> Virtualization type:   full
>> L1d cache:             32K
>> L1i cache:             32K
>> L2 cache:              4096K
>> L3 cache:              16384K
>> NUMA node0 CPU(s):     0-15
>> Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc nopl xtopology eagerfpu pni cx16 x2apic hypervisor lahf_lm
>>
>> mtie@liverpool ~ $ more /etc/slurm/slurm.conf | grep liverpool
>> NodeName=liverpool NodeAddr=137.22.10.202 CPUs=16 State=UNKNOWN
>> PartitionName=BioSlurm Nodes=liverpool Default=YES MaxTime=INFINITE State=UP
>>
>> mtie@liverpool ~ $ sinfo -n liverpool -o %c
>> CPUS
>> 16
>>
>> mtie@liverpool ~ $ sinfo -n liverpool -o %E
>> REASON
>> Low socket*core*thread count, Low CPUs
>>
>> Any advice?
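To answer the "where does slurmd -C get this" question directly: when Slurm is built with hwloc support it queries hwloc for the topology at slurmd startup, and otherwise it falls back to parsing /proc/cpuinfo (the probe lives around src/slurmd/common/xcpuinfo.c in the Slurm tree; path from my reading of the source, worth verifying against 18.08). A rough, illustrative re-derivation of the slurmd -C figures from the same kernel data -- not Slurm's actual code -- looks like:

```shell
# Sketch: rebuild slurmd -C style numbers from /proc/cpuinfo alone.
# Guards handle minimal VMs that omit "physical id" / "cpu cores".
cpus=$(grep -c '^processor' /proc/cpuinfo)
sockets=$(grep '^physical id' /proc/cpuinfo | sort -u | wc -l)
[ "$sockets" -eq 0 ] && sockets=1
cores=$(awk -F: '/^cpu cores/ {print $2+0; exit}' /proc/cpuinfo)
{ [ -z "$cores" ] || [ "$cores" -eq 0 ]; } && cores=$(( cpus / sockets ))
threads=$(( cpus / (sockets * cores) ))
echo "CPUs=$cpus Sockets=$sockets CoresPerSocket=$cores ThreadsPerCore=$threads"
```

If this sketch (or lstopo from the hwloc package) also reports the stale count, the problem is below Slurm, in the hypervisor's CPUID/topology presentation. Once slurmd -C finally agrees with slurm.conf, the node can be returned to service with `scontrol update NodeName=liverpool State=RESUME`.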