Hi Ali,

On 3/10/25 5:23 PM, Alireza Sanaee wrote:
> Specifying the cache layout in virtual machines is useful for
> applications and operating systems to fetch accurate information about
> the cache structure and make appropriate adjustments. Enforcing correct
> sharing information can lead to better optimizations. This patch enables
> the specification of cache layout through a command line parameter,
> building on a patch set by Intel [1,2,3]. It uses this set as a
> foundation.

Some dependencies were merged, so the series does not apply anymore.

> The device tree and ACPI/PPTT table, and device tree are
> populated based on user-provided information and CPU topology.

This last sentence needs some rewording.

>
> Example:
>
>     +----------------+              +----------------+
>     |    Socket 0    |              |    Socket 1    |
>     |   (L3 Cache)   |              |   (L3 Cache)   |
>     +--------+-------+              +--------+-------+
>              |                               |
>     +--------+--------+             +--------+--------+
>     |    Cluster 0    |             |    Cluster 0    |
>     |   (L2 Cache)    |             |   (L2 Cache)    |
>     +--------+--------+             +--------+--------+
>              |                               |
> +--------+--------+  +--------+--------+  +--------+--------+  +--------+--------+
> |     Core 0      |  |     Core 1      |  |     Core 0      |  |     Core 1      |
> |   (L1i, L1d)    |  |   (L1i, L1d)    |  |   (L1i, L1d)    |  |   (L1i, L1d)    |
> +--------+--------+  +--------+--------+  +--------+--------+  +--------+--------+
>     |                    |                    |                    |
> +--------+           +--------+           +--------+           +--------+
> |Thread 0|           |Thread 1|           |Thread 1|           |Thread 0|
> +--------+           +--------+           +--------+           +--------+
> |Thread 1|           |Thread 0|           |Thread 0|           |Thread 1|
> +--------+           +--------+           +--------+           +--------+
>
> The following command will represent the system relying on **ACPI PPTT
> tables**.
>
> ./qemu-system-aarch64 \
>     -machine virt,smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=cluseter,smp-cache.3.cache=l3,smp-cache.3.topology=socket \

s/cluseter/cluster

>     -cpu max \
>     -m 2048 \
>     -smp sockets=2,clusters=1,cores=2,threads=2 \
>     -kernel ./Image.gz \
>     -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
>     -initrd rootfs.cpio.gz \
>     -bios ./edk2-aarch64-code.fd \
>     -nographic
>
> The following command will represent the system relying on **the device
> tree**.
>
> ./qemu-system-aarch64 \
>     -machine virt,smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=cluseter,smp-cache.3.cache=l3,smp-cache.3.topology=socket \
>     -cpu max \
>     -m 2048 \
>     -smp sockets=2,clusters=1,cores=2,threads=2 \
>     -kernel ./Image.gz \
>     -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=off" \
>     -initrd rootfs.cpio.gz \
>     -nographic
>
> Failure cases:
> 1) There are scenarios where caches exist in systems' registers but
> left unspecified by users. In this case qemu returns failure.

Can you give more details on 1)? Is it a TCG case, or does it also
exist with KVM acceleration?

>
> 2) SMT threads cannot share caches which is not very common. More
> discussions here [4].
>
> Currently only three levels of caches are supported to be specified from
> the command line. However, increasing the value does not require
> significant changes. Further, this patch assumes l2 and l3 unified
> caches and does not allow l(2/3)(i/d). The level terminology is
> thread/core/cluster/socket right now. Hierarchy assumed in this patch:
> Socket level = Cluster level + 1 = Core level + 2 = Thread level + 3;
>
> TODO:
> 1) Making the code to work with arbitrary levels
> 2) Separated data and instruction cache at L2 and L3.
> 3) Additional cache controls. e.g.
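As a side note, one way to check that the guest actually picked up the
intended layout is to read the cacheinfo sysfs nodes inside the booted
guest. A minimal sketch (assuming a Linux guest that populates these
nodes from the PPTT or DT; it falls back to a message where the nodes
are absent):

```shell
#!/bin/sh
# Sketch: print the cache topology Linux derived from the ACPI PPTT
# (or the device tree). Meant to be run inside the booted guest.
base=/sys/devices/system/cpu/cpu0/cache
report=""
if [ -d "$base" ]; then
    for d in "$base"/index*; do
        # Each indexN directory describes one cache this CPU sees.
        report="$report${d##*/}: L$(cat "$d/level") $(cat "$d/type") shared_cpu_list=$(cat "$d/shared_cpu_list")
"
    done
else
    report="no cache topology exported via sysfs"
fi
printf '%s\n' "$report"
```

The `shared_cpu_list` values should match the topology levels given via
smp-cache (e.g. an l2 at cluster level is shared by all CPUs of the
cluster).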
> size of L3 may not want to just
> match the underlying system, because only some of the associated host
> CPUs may be bound to this VM.

Does this mean the series is more of an RFC, or do you plan to send
improvement patches once this series gets upstream?
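For the device-tree path, the generated blob can also be inspected
without booting a guest, using the virt machine's `dumpdtb` option plus
`dtc`. A sketch (assuming a QEMU build with this series applied, so that
smp-cache is accepted on arm virt, and `dtc` on PATH; it skips cleanly
when either tool is missing):

```shell
#!/bin/sh
# Sketch: dump the DTB the virt machine would generate, then count the
# cache-related nodes/properties in the decompiled source.
if command -v qemu-system-aarch64 >/dev/null 2>&1 && command -v dtc >/dev/null 2>&1; then
    # dumpdtb writes the device tree blob and exits without booting.
    qemu-system-aarch64 \
        -machine virt,dumpdtb=virt.dtb,smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=cluster,smp-cache.3.cache=l3,smp-cache.3.topology=socket \
        -cpu max \
        -smp sockets=2,clusters=1,cores=2,threads=2
    result=$(dtc -I dtb -O dts virt.dtb | grep -c cache)
else
    result="skipped: qemu-system-aarch64 or dtc not installed"
fi
echo "$result"
```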
Thanks

Eric

> [1] https://lore.kernel.org/kvm/20240908125920.1160236-1-zhao1....@intel.com/
> [2] https://lore.kernel.org/qemu-devel/20241101083331.340178-1-zhao1....@intel.com/
> [3] https://lore.kernel.org/qemu-devel/20250110145115.1574345-1-zhao1....@intel.com/
> [4] https://lore.kernel.org/devicetree-spec/20250203120527.3534-1-alireza.san...@huawei.com/
>
> Change Log:
> v7->v8:
>   * rebase: Merge tag 'pull-nbd-2024-08-26' of https://repo.or.cz/qemu/ericb
>     into staging
>   * I mis-included a file in patch #4 and I removed it in this one.
>
> v6->v7:
>   * Intel stuff got pulled up, so rebase.
>   * added some discussions on device tree.
>
> v5->v6:
>   * Minor bug fix.
>   * rebase based on new Intel patchset.
>     - https://lore.kernel.org/qemu-devel/20250110145115.1574345-1-zhao1....@intel.com/
>
> v4->v5:
>   * Added Reviewed-by tags.
>   * Applied some comments.
>
> v3->v4:
>   * Device tree added.
>
> Depends-on: Building PPTT with root node and identical implementation flag
> Depends-on: Msg-id: 20250306023342.508-1-alireza.san...@huawei.com
>
> Alireza Sanaee (6):
>   target/arm/tcg: increase cache level for cpu=max
>   arm/virt.c: add cache hierarchy to device tree
>   bios-tables-test: prepare to change ARM ACPI virt PPTT
>   hw/acpi/aml-build.c: add cache hierarchy to pptt table
>   tests/qtest/bios-table-test: testing new ARM ACPI PPTT topology
>   Update the ACPI tables according to the acpi aml_build change, also
>     empty bios-tables-test-allowed-diff.h.
>
>  hw/acpi/aml-build.c                        | 205 +++++++++++-
>  hw/arm/virt-acpi-build.c                   |   8 +-
>  hw/arm/virt.c                              | 350 +++++++++++++++++++++
>  hw/cpu/core.c                              |  92 ++++++
>  hw/loongarch/virt-acpi-build.c             |   2 +-
>  include/hw/acpi/aml-build.h                |   4 +-
>  include/hw/arm/virt.h                      |   4 +
>  include/hw/cpu/core.h                      |  27 ++
>  target/arm/tcg/cpu64.c                     |  13 +
>  tests/data/acpi/aarch64/virt/PPTT.topology | Bin 356 -> 540 bytes
>  tests/qtest/bios-tables-test.c             |   4 +
>  11 files changed, 701 insertions(+), 8 deletions(-)