No, creating artificial NUMA nodes is, simply put, never a good solution for CPUs that operate as a single NUMA node - which is the case for all Zen2 CPUs (except maybe EPYCs? not sure about those).
You may work around the L3 issue that way, but you will hit many new bugs/problems by introducing multiple NUMA nodes, _especially_ on Windows VMs, because that OS has crappy NUMA handling and a multitude of bugs related to it - which was one of the major reasons why even Zen2 Threadrippers are now a single NUMA node (e.g. https://www.servethehome.com/wp-content/uploads/2019/11/AMD-Ryzen-Threadripper-3960X-Topology.png ).

The host CPU architecture should be replicated as closely as possible on the VM and for Zen2 CPUs with 4 cores per CCX, _this already works perfectly_ - there are no problems on 3300X/3700(X)/3800X/3950X/3970X/3990X.

There is, unfortunately, no way to customize/specify the "disabled" CPU cores in QEMU, and therefore no way to emulate 1 NUMA node + L3 cache per 2/3 cores - only to pass through the cache config from the host, which is unfortunately not done correctly for CPUs with disabled cores (but again, works perfectly for CPUs with all 4 cores enabled per CCX).

lscpu:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          24
On-line CPU(s) list:             0-23
Thread(s) per core:              2
Core(s) per socket:              12
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           113
Model name:                      AMD Ryzen 9 3900X 12-Core Processor
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         2972.127
CPU max MHz:                     3800.0000
CPU min MHz:                     2200.0000
BogoMIPS:                        7602.55
Virtualization:                  AMD-V
L1d cache:                       384 KiB
L1i cache:                       384 KiB
L2 cache:                        6 MiB
L3 cache:                        64 MiB
NUMA node0 CPU(s):               0-23
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, STIBP conditional, RSB filling
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca

But the important thing has already been posted here in previous comments - notice the skipped core ids belonging to the disabled cores:

virsh capabilities | grep "cpu id":

<cpu id='0' socket_id='0' core_id='0' siblings='0,12'/>
<cpu id='1' socket_id='0' core_id='1' siblings='1,13'/>
<cpu id='2' socket_id='0' core_id='2' siblings='2,14'/>
<cpu id='3' socket_id='0' core_id='4' siblings='3,15'/>
<cpu id='4' socket_id='0' core_id='5' siblings='4,16'/>
<cpu id='5' socket_id='0' core_id='6' siblings='5,17'/>
<cpu id='6' socket_id='0' core_id='8' siblings='6,18'/>
<cpu id='7' socket_id='0' core_id='9' siblings='7,19'/>
<cpu id='8' socket_id='0' core_id='10' siblings='8,20'/>
<cpu id='9' socket_id='0' core_id='12' siblings='9,21'/>
<cpu id='10' socket_id='0' core_id='13' siblings='10,22'/>
<cpu id='11' socket_id='0' core_id='14' siblings='11,23'/>
<cpu id='12' socket_id='0' core_id='0' siblings='0,12'/>
<cpu id='13' socket_id='0' core_id='1' siblings='1,13'/>
<cpu id='14' socket_id='0' core_id='2' siblings='2,14'/>
<cpu id='15' socket_id='0' core_id='4' siblings='3,15'/>
<cpu id='16' socket_id='0' core_id='5' siblings='4,16'/>
<cpu id='17' socket_id='0' core_id='6' siblings='5,17'/>
<cpu id='18' socket_id='0' core_id='8' siblings='6,18'/>
<cpu id='19' socket_id='0' core_id='9' siblings='7,19'/>
<cpu id='20' socket_id='0' core_id='10' siblings='8,20'/>
<cpu id='21' socket_id='0' core_id='12' siblings='9,21'/>
<cpu id='22' socket_id='0' core_id='13' siblings='10,22'/>
<cpu id='23' socket_id='0' core_id='14' siblings='11,23'/>

--
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1856335

Title:
  Cache Layout wrong on many Zen Arch CPUs

Status in QEMU:
  New

Bug description:
  AMD CPUs have L3 cache per 2, 3 or 4 cores. Currently, TOPOEXT seems to always map the cache as if it were a 4-core-per-CCX CPU, which is incorrect, and costs up to 30% performance (more realistically 10%) in applications that are aware of the L3 cache layout.

  Example on a CPU with 4 cores per CCX (1950X w/ 8 cores and no SMT):

    <cpu mode='custom' match='exact' check='full'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <topology sockets='1' cores='8' threads='1'/>

  In Windows, coreinfo reports correctly:

    ****---- Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
    ----**** Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64

  On a CPU with 3 cores per CCX (3960X w/ 6 cores and no SMT):

    <cpu mode='custom' match='exact' check='full'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <topology sockets='1' cores='6' threads='1'/>

  In Windows, coreinfo reports incorrectly:

    ****-- Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
    ----** Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64

  Validated against 3.0, 3.1, 4.1 and 4.2 versions of qemu-kvm.
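
  The coreinfo masks quoted above can be reproduced with a small sketch. This is not QEMU code: the function name l3_masks and the model (vCPUs handed to L3 caches in consecutive fixed-size groups) are assumptions made purely for illustration. It only shows why a fixed group size of 4 yields the ****-- / ----** split on a 6-vCPU guest, whereas a group size of 3 would give the layout one would presumably expect for 3 cores per CCX.

```python
def l3_masks(n_vcpus, vcpus_per_l3):
    """Return one coreinfo-style mask string per L3 cache.

    Hypothetical model: vCPUs 0..n_vcpus-1 are assigned to L3 caches in
    consecutive groups of vcpus_per_l3; the last group may be smaller.
    """
    masks = []
    for start in range(0, n_vcpus, vcpus_per_l3):
        end = min(start + vcpus_per_l3, n_vcpus)
        # '*' marks a vCPU sharing this L3, '-' marks one that does not.
        masks.append("".join("*" if start <= i < end else "-"
                             for i in range(n_vcpus)))
    return masks


if __name__ == "__main__":
    # 8 vCPUs, grouped per 4: the case reported as correct.
    print(l3_masks(8, 4))
    # 6 vCPUs, grouped per 4: the split reported as incorrect.
    print(l3_masks(6, 4))
    # 6 vCPUs, grouped per 3: assumed expected layout for 3 cores per CCX.
    print(l3_masks(6, 3))
```

  Under this model the only input that changes between the wrong and the expected result is the group size, which is consistent with the description that the cache is always mapped as if there were 4 cores per CCX.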
  With newer QEMU there is a fix (that does behave correctly) in using the dies parameter:

    <qemu:arg value='cores=3,threads=1,dies=2,sockets=1'/>

  The problem is that the dies are exposed differently from how AMD does it natively: they are exposed to Windows as sockets. This means that if you are not a business user, you can't ever have a machine with more than two CCXs (6 cores), as consumer versions of Windows only support two sockets. (Should this be reported as a separate bug?)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1856335/+subscriptions