This is the CPU cache layout as shown by lscpu -a -e:

CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ    MINMHZ
  0    0      0    0 0:0:0:0       yes    3800.0000 2200.0000
  1    0      0    1 1:1:1:0       yes    3800.0000 2200.0000
  2    0      0    2 2:2:2:0       yes    3800.0000 2200.0000
  3    0      0    3 3:3:3:1       yes    3800.0000 2200.0000
  4    0      0    4 4:4:4:1       yes    3800.0000 2200.0000
  5    0      0    5 5:5:5:1       yes    3800.0000 2200.0000
  6    0      0    6 6:6:6:2       yes    3800.0000 2200.0000
  7    0      0    7 7:7:7:2       yes    3800.0000 2200.0000
  8    0      0    8 8:8:8:2       yes    3800.0000 2200.0000
  9    0      0    9 9:9:9:3       yes    3800.0000 2200.0000
 10    0      0   10 10:10:10:3    yes    3800.0000 2200.0000
 11    0      0   11 11:11:11:3    yes    3800.0000 2200.0000
 12    0      0    0 0:0:0:0       yes    3800.0000 2200.0000
 13    0      0    1 1:1:1:0       yes    3800.0000 2200.0000
 14    0      0    2 2:2:2:0       yes    3800.0000 2200.0000
 15    0      0    3 3:3:3:1       yes    3800.0000 2200.0000
 16    0      0    4 4:4:4:1       yes    3800.0000 2200.0000
 17    0      0    5 5:5:5:1       yes    3800.0000 2200.0000
 18    0      0    6 6:6:6:2       yes    3800.0000 2200.0000
 19    0      0    7 7:7:7:2       yes    3800.0000 2200.0000
 20    0      0    8 8:8:8:2       yes    3800.0000 2200.0000
 21    0      0    9 9:9:9:3       yes    3800.0000 2200.0000
 22    0      0   10 10:10:10:3    yes    3800.0000 2200.0000
 23    0      0   11 11:11:11:3    yes    3800.0000 2200.0000
I was trying to allocate cache using the cachetune feature in libvirt, but it turns out to be either misleading or much too complicated to be usable. Here is what I tried:

  <vcpu placement="static">24</vcpu>
  <cputune>
    <vcpupin vcpu="0" cpuset="0"/>
    <vcpupin vcpu="1" cpuset="12"/>
    <vcpupin vcpu="2" cpuset="1"/>
    <vcpupin vcpu="3" cpuset="13"/>
    <vcpupin vcpu="4" cpuset="2"/>
    <vcpupin vcpu="5" cpuset="14"/>
    <vcpupin vcpu="6" cpuset="3"/>
    <vcpupin vcpu="7" cpuset="15"/>
    <vcpupin vcpu="8" cpuset="4"/>
    <vcpupin vcpu="9" cpuset="16"/>
    <vcpupin vcpu="10" cpuset="5"/>
    <vcpupin vcpu="11" cpuset="17"/>
    <vcpupin vcpu="12" cpuset="6"/>
    <vcpupin vcpu="13" cpuset="18"/>
    <vcpupin vcpu="14" cpuset="7"/>
    <vcpupin vcpu="15" cpuset="19"/>
    <vcpupin vcpu="16" cpuset="8"/>
    <vcpupin vcpu="17" cpuset="20"/>
    <vcpupin vcpu="18" cpuset="9"/>
    <vcpupin vcpu="19" cpuset="21"/>
    <vcpupin vcpu="20" cpuset="10"/>
    <vcpupin vcpu="21" cpuset="22"/>
    <vcpupin vcpu="22" cpuset="11"/>
    <vcpupin vcpu="23" cpuset="23"/>
    <cachetune vcpus="0-2,12-14">
      <cache id="0" level="3" type="both" size="16" unit="MiB"/>
      <monitor level="3" vcpus="0-2,12-14"/>
    </cachetune>
    <cachetune vcpus="3-5,15-17">
      <cache id="1" level="3" type="both" size="16" unit="MiB"/>
      <monitor level="3" vcpus="3-5,15-17"/>
    </cachetune>
    <cachetune vcpus="6-8,18-20">
      <cache id="2" level="3" type="both" size="16" unit="MiB"/>
      <monitor level="3" vcpus="6-8,18-20"/>
    </cachetune>
    <cachetune vcpus="9-11,21-23">
      <cache id="3" level="3" type="both" size="16" unit="MiB"/>
      <monitor level="3" vcpus="9-11,21-23"/>
    </cachetune>
  </cputune>

Unfortunately it gives the following error when I try to start the VM:

  Error starting domain: internal error: Missing or inconsistent resctrl info for memory bandwidth allocation

I have resctrl mounted like this:

  mount -t resctrl resctrl /sys/fs/resctrl

This error led me to the following description of how to allocate memory bandwidth:
https://software.intel.com/content/www/us/en/develop/articles
/use-intel-resource-director-technology-to-allocate-memory-bandwidth.html

I think this is over the top, and perhaps I'm trying the wrong approach. All I can say is that every suggestion I've seen and tried so far has led me to one conclusion: QEMU does NOT support the L3 cache layout of the new Zen 2 arch CPUs such as the Ryzen 9 3900X.

--
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1856335

Title:
  Cache Layout wrong on many Zen Arch CPUs

Status in QEMU:
  New

Bug description:
  AMD CPUs have L3 cache per 2, 3 or 4 cores. Currently, TOPOEXT seems to always map cache as if it were a 4-core-per-CCX CPU, which is incorrect, and costs upwards of 30% performance (more realistically 10%) in L3-cache-layout-aware applications.

  Example on a 4-CCX CPU (1950X /w 8 cores and no SMT):

    <cpu mode='custom' match='exact' check='full'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <topology sockets='1' cores='8' threads='1'/>

  In Windows, coreinfo reports correctly:

    ****----  Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
    ----****  Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64

  On a 3-CCX CPU (3960X /w 6 cores and no SMT):

    <cpu mode='custom' match='exact' check='full'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <topology sockets='1' cores='6' threads='1'/>

  In Windows, coreinfo reports incorrectly:

    ****--  Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
    ----**  Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64

  Validated against 3.0, 3.1, 4.1 and 4.2 versions of qemu-kvm.
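As a side note on the resctrl error reported above: before digging into cachetune it may help to probe what the kernel actually exposes under /sys/fs/resctrl/info. This is a sketch using the standard kernel resctrl layout; the MB directory only exists when the CPU and kernel expose memory bandwidth allocation, and its absence is one plausible reason for an "inconsistent resctrl info" complaint:

```shell
# Probe the resctrl info the kernel exposes (if any).
info=/sys/fs/resctrl/info
if [ -d "$info" ]; then
    for f in "$info"/L3/cbm_mask "$info"/L3/num_closids "$info"/MB/bandwidth_gran; do
        [ -r "$f" ] && printf '%s: %s\n' "$f" "$(cat "$f")"
    done
    if [ ! -d "$info/MB" ]; then
        echo "no MB directory: memory bandwidth allocation is not exposed here"
    fi
else
    echo "resctrl not mounted (or RDT/QoS not supported by this CPU/kernel)"
fi
```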
With newer QEMU there is a fix (that does behave correctly) using the dies parameter:

  <qemu:arg value='cores=3,threads=1,dies=2,sockets=1'/>

The problem is that the dies are exposed differently than how AMD does it natively: they are exposed to Windows as sockets. This means that unless you are a business user, you can never have a machine with more than two CCXs (6 cores), as consumer versions of Windows only support two sockets. (Should this be reported as a separate bug?)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1856335/+subscriptions
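For anyone trying the qemu:arg workaround above: the QEMU namespace has to be declared on the domain element for libvirt to accept qemu:commandline. A minimal sketch (the -smp values here are illustrative, matching the 6-core, 2-die example above; note that arguments passed this way are appended behind libvirt's own command line and bypass its validation):

```xml
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <!-- ... rest of the domain definition ... -->
  <qemu:commandline>
    <qemu:arg value='-smp'/>
    <qemu:arg value='6,cores=3,threads=1,dies=2,sockets=1'/>
  </qemu:commandline>
</domain>
```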