I upgraded to QEMU emulator version 5.0.50 and am using machine type q35-5.1 (the latest) with the following libvirt configuration:

  <memory unit="KiB">50331648</memory>
  <currentMemory unit="KiB">50331648</currentMemory>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <vcpu placement="static">24</vcpu>
  <cputune>
    <vcpupin vcpu="0" cpuset="0"/>
    <vcpupin vcpu="1" cpuset="12"/>
    <vcpupin vcpu="2" cpuset="1"/>
    <vcpupin vcpu="3" cpuset="13"/>
    <vcpupin vcpu="4" cpuset="2"/>
    <vcpupin vcpu="5" cpuset="14"/>
    <vcpupin vcpu="6" cpuset="3"/>
    <vcpupin vcpu="7" cpuset="15"/>
    <vcpupin vcpu="8" cpuset="4"/>
    <vcpupin vcpu="9" cpuset="16"/>
    <vcpupin vcpu="10" cpuset="5"/>
    <vcpupin vcpu="11" cpuset="17"/>
    <vcpupin vcpu="12" cpuset="6"/>
    <vcpupin vcpu="13" cpuset="18"/>
    <vcpupin vcpu="14" cpuset="7"/>
    <vcpupin vcpu="15" cpuset="19"/>
    <vcpupin vcpu="16" cpuset="8"/>
    <vcpupin vcpu="17" cpuset="20"/>
    <vcpupin vcpu="18" cpuset="9"/>
    <vcpupin vcpu="19" cpuset="21"/>
    <vcpupin vcpu="20" cpuset="10"/>
    <vcpupin vcpu="21" cpuset="22"/>
    <vcpupin vcpu="22" cpuset="11"/>
    <vcpupin vcpu="23" cpuset="23"/>
  </cputune>
  <os>
    <type arch="x86_64" machine="pc-q35-5.1">hvm</type>
    <loader readonly="yes" type="pflash">/usr/share/OVMF/x64/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10_VARS.fd</nvram>
    <boot dev="hd"/>
    <bootmenu enable="no"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
      <vpindex state="on"/>
      <synic state="on"/>
      <stimer state="on"/>
      <vendor_id state="on" value="AuthenticAMD"/>
      <frequencies state="on"/>
    </hyperv>
    <kvm>
      <hidden state="on"/>
    </kvm>
    <vmport state="off"/>
    <ioapic driver="kvm"/>
  </features>
  <cpu mode="host-passthrough" check="none">
    <topology sockets="1" cores="12" threads="2"/>
    <cache mode="passthrough"/>
    <feature policy="require" name="invtsc"/>
    <feature policy="require" name="hypervisor"/>
    <feature policy="require" name="topoext"/>
    <numa>
      <cell id="0" cpus="0-2,12-14" memory="12582912" unit="KiB"/>
      <cell id="1" cpus="3-5,15-17" memory="12582912" unit="KiB"/>
      <cell id="2" cpus="6-8,18-20" memory="12582912" unit="KiB"/>
      <cell id="3" cpus="9-11,21-23" memory="12582912" unit="KiB"/>
    </numa>
  </cpu>
  ...

/var/log/libvirt/qemu/win10.log:

  -machine pc-q35-5.1,accel=kvm,usb=off,vmport=off,dump-guest-core=off,kernel_irqchip=on,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
  -cpu host,invtsc=on,hypervisor=on,topoext=on,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-vpindex,hv-synic,hv-stimer,hv-vendor-id=AuthenticAMD,hv-frequencies,hv-crash,kvm=off,host-cache-info=on,l3-cache=off \
  -m 49152 \
  -overcommit mem-lock=off \
  -smp 24,sockets=1,cores=12,threads=2 \
  -mem-prealloc \
  -mem-path /dev/hugepages/libvirt/qemu/3-win10 \
  -numa node,nodeid=0,cpus=0-2,cpus=12-14,mem=12288 \
  -numa node,nodeid=1,cpus=3-5,cpus=15-17,mem=12288 \
  -numa node,nodeid=2,cpus=6-8,cpus=18-20,mem=12288 \
  -numa node,nodeid=3,cpus=9-11,cpus=21-23,mem=12288 \
  ...

For some reason I always get l3-cache=off.

CoreInfo.exe in Windows 10 then produces the following report (shortened):

Logical to Physical Processor Map:
**---------------------- Physical Processor 0 (Hyperthreaded)
--*--------------------- Physical Processor 1
---*-------------------- Physical Processor 2
----**------------------ Physical Processor 3 (Hyperthreaded)
------**---------------- Physical Processor 4 (Hyperthreaded)
--------*--------------- Physical Processor 5
---------*-------------- Physical Processor 6
----------**------------ Physical Processor 7 (Hyperthreaded)
------------**---------- Physical Processor 8 (Hyperthreaded)
--------------*--------- Physical Processor 9
---------------*-------- Physical Processor 10
----------------**------ Physical Processor 11 (Hyperthreaded)
------------------**---- Physical Processor 12 (Hyperthreaded)
--------------------*--- Physical Processor 13
---------------------*-- Physical Processor 14
----------------------** Physical Processor 15 (Hyperthreaded)

Logical Processor to Socket Map:
************************ Socket 0

Logical Processor to NUMA Node Map:
***---------***--------- NUMA Node 0
---***---------***------ NUMA Node 1
------***---------***--- NUMA Node 2
---------***---------*** NUMA Node 3

Approximate Cross-NUMA Node Access Cost (relative to fastest):
     00  01  02  03
00: 1.4 1.2 1.1 1.2
01: 1.1 1.1 1.3 1.1
02: 1.0 1.1 1.0 1.2
03: 1.1 1.2 1.2 1.2

Logical Processor to Cache Map:
**---------------------- Data Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
**---------------------- Instruction Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
**---------------------- Unified Cache 0, Level 2, 512 KB, Assoc 8, LineSize 64
***--------------------- Unified Cache 1, Level 3, 16 MB, Assoc 16, LineSize 64
--*--------------------- Data Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
--*--------------------- Instruction Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
--*--------------------- Unified Cache 2, Level 2, 512 KB, Assoc 8, LineSize 64
---*-------------------- Data Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
---*-------------------- Instruction Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
---*-------------------- Unified Cache 3, Level 2, 512 KB, Assoc 8, LineSize 64
---***------------------ Unified Cache 4, Level 3, 16 MB, Assoc 16, LineSize 64
----**------------------ Data Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
----**------------------ Instruction Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
----**------------------ Unified Cache 5, Level 2, 512 KB, Assoc 8, LineSize 64
------**---------------- Data Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64
------**---------------- Instruction Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64
------**---------------- Unified Cache 6, Level 2, 512 KB, Assoc 8, LineSize 64
------**---------------- Unified Cache 7, Level 3, 16 MB, Assoc 16, LineSize 64
--------*--------------- Data Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64
--------*--------------- Instruction Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64
--------*--------------- Unified Cache 8, Level 2, 512 KB, Assoc 8, LineSize 64
--------*--------------- Unified Cache 9, Level 3, 16 MB, Assoc 16, LineSize 64
---------*-------------- Data Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64
---------*-------------- Instruction Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64
---------*-------------- Unified Cache 10, Level 2, 512 KB, Assoc 8, LineSize 64
---------***------------ Unified Cache 11, Level 3, 16 MB, Assoc 16, LineSize 64
----------**------------ Data Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64
----------**------------ Instruction Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64
----------**------------ Unified Cache 12, Level 2, 512 KB, Assoc 8, LineSize 64
------------**---------- Data Cache 8, Level 1, 32 KB, Assoc 8, LineSize 64
------------**---------- Instruction Cache 8, Level 1, 32 KB, Assoc 8, LineSize 64
------------**---------- Unified Cache 13, Level 2, 512 KB, Assoc 8, LineSize 64
------------***--------- Unified Cache 14, Level 3, 16 MB, Assoc 16, LineSize 64
--------------*--------- Data Cache 9, Level 1, 32 KB, Assoc 8, LineSize 64
--------------*--------- Instruction Cache 9, Level 1, 32 KB, Assoc 8, LineSize 64
--------------*--------- Unified Cache 15, Level 2, 512 KB, Assoc 8, LineSize 64
---------------*-------- Data Cache 10, Level 1, 32 KB, Assoc 8, LineSize 64
---------------*-------- Instruction Cache 10, Level 1, 32 KB, Assoc 8, LineSize 64
---------------*-------- Unified Cache 16, Level 2, 512 KB, Assoc 8, LineSize 64
---------------*-------- Unified Cache 17, Level 3, 16 MB, Assoc 16, LineSize 64
----------------**------ Data Cache 11, Level 1, 32 KB, Assoc 8, LineSize 64
----------------**------ Instruction Cache 11, Level 1, 32 KB, Assoc 8, LineSize 64
----------------**------ Unified Cache 18, Level 2, 512 KB, Assoc 8, LineSize 64
----------------**------ Unified Cache 19, Level 3, 16 MB, Assoc 16, LineSize 64
------------------**---- Data Cache 12, Level 1, 32 KB, Assoc 8, LineSize 64
------------------**---- Instruction Cache 12, Level 1, 32 KB, Assoc 8, LineSize 64
------------------**---- Unified Cache 20, Level 2, 512 KB, Assoc 8, LineSize 64
------------------***--- Unified Cache 21, Level 3, 16 MB, Assoc 16, LineSize 64
--------------------*--- Data Cache 13, Level 1, 32 KB, Assoc 8, LineSize 64
--------------------*--- Instruction Cache 13, Level 1, 32 KB, Assoc 8, LineSize 64
--------------------*--- Unified Cache 22, Level 2, 512 KB, Assoc 8, LineSize 64
---------------------*-- Data Cache 14, Level 1, 32 KB, Assoc 8, LineSize 64
---------------------*-- Instruction Cache 14, Level 1, 32 KB, Assoc 8, LineSize 64
---------------------*-- Unified Cache 23, Level 2, 512 KB, Assoc 8, LineSize 64
---------------------*** Unified Cache 24, Level 3, 16 MB, Assoc 16, LineSize 64
----------------------** Data Cache 15, Level 1, 32 KB, Assoc 8, LineSize 64
----------------------** Instruction Cache 15, Level 1, 32 KB, Assoc 8, LineSize 64
----------------------** Unified Cache 25, Level 2, 512 KB, Assoc 8, LineSize 64

Logical Processor to Group Map:
************************ Group 0

The above result is even further away from the actual L3 cache configuration.

So numatune doesn't produce the expected outcome.

-- 
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1856335

Title:
  Cache Layout wrong on many Zen Arch CPUs

Status in QEMU:
  New

Bug description:
  AMD CPUs have one L3 cache per 2, 3 or 4 cores. Currently, TOPOEXT seems to always map the cache as if it were a 4-core-per-CCX CPU, which is incorrect and costs upwards of 30% performance (more realistically 10%) in applications that are aware of the L3 cache layout.

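The mismatch described above can be reproduced with a small sketch. It is not QEMU code: the helper name `l3_masks` and its parameters are hypothetical, and it assumes a single socket whose logical processors are numbered contiguously, with each L3 cache covering a fixed number of cores. It only prints CoreInfo-style masks for a given cores-per-CCX value.

```python
def l3_masks(cores, threads=1, cores_per_ccx=4):
    """Return one CoreInfo-style mask per L3 cache for a single socket.

    Hypothetical helper for illustration; assumes logical processors are
    numbered contiguously and each L3 spans `cores_per_ccx` cores.
    """
    logical = cores * threads
    group = cores_per_ccx * threads  # logical processors sharing one L3
    masks = []
    for start in range(0, logical, group):
        end = min(start + group, logical)
        masks.append("-" * start + "*" * (end - start) + "-" * (logical - end))
    return masks


if __name__ == "__main__":
    # 6 cores, no SMT, grouped as if every CCX had 4 cores
    print("\n".join(l3_masks(6, cores_per_ccx=4)))
    # 6 cores, no SMT, grouped by the real 3 cores per CCX
    print("\n".join(l3_masks(6, cores_per_ccx=3)))
```

With 6 cores and a grouping of 4, the sketch yields the uneven split of four and two logical processors; with a grouping of 3 it yields two caches of three each.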
  Example on a 4-CCX CPU (1950X with 8 cores and no SMT):

    <cpu mode='custom' match='exact' check='full'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <topology sockets='1' cores='8' threads='1'/>

  In Windows, coreinfo reports correctly:

    ****---- Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
    ----**** Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64

  On a 3-CCX CPU (3960X with 6 cores and no SMT):

    <cpu mode='custom' match='exact' check='full'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <topology sockets='1' cores='6' threads='1'/>

  In Windows, coreinfo reports incorrectly:

    ****-- Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
    ----** Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64

  Validated against the 3.0, 3.1, 4.1 and 4.2 versions of qemu-kvm.

  With newer QEMU there is a fix (which does behave correctly) using the dies parameter:

    <qemu:arg value='cores=3,threads=1,dies=2,sockets=1'/>

  The problem is that the dies are exposed differently from how AMD does it natively: they are exposed to Windows as sockets. This means that if you are not a business user, you can never have a machine with more than two CCX (6 cores), as consumer versions of Windows only support two sockets. (Should this be reported as a separate bug?)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1856335/+subscriptions