Hi Jonathan,
Could you please provide your comments?

Thanks,
Zhigang

> -----Original Message-----
> From: David Hildenbrand <da...@redhat.com>
> Sent: Tuesday, December 10, 2024 5:02 PM
> To: Luo, Zhigang <zhigang....@amd.com>; qemu-devel@nongnu.org
> Cc: kra...@redhat.com; Igor Mammedov <imamm...@redhat.com>; Jonathan
> Cameron <jonathan.came...@huawei.com>
> Subject: Re: [PATCH] hostmem-file: add the 'hmem' option
>
> On 10.12.24 22:51, Luo, Zhigang wrote:
> >> -----Original Message-----
> >> From: David Hildenbrand <da...@redhat.com>
> >> Sent: Tuesday, December 10, 2024 2:55 PM
> >> To: Luo, Zhigang <zhigang....@amd.com>; qemu-devel@nongnu.org
> >> Cc: kra...@redhat.com; Igor Mammedov <imamm...@redhat.com>
> >> Subject: Re: [PATCH] hostmem-file: add the 'hmem' option
> >>
> >> On 10.12.24 20:32, Luo, Zhigang wrote:
> >>> Hi David,
> >>>
> >>
> >> Hi,
> >>
> >>>>> Thanks for your comments.
> >>>>> Let me give you some background for this patch.
> >>>>> I am currently engaged in a project that requires passing
> >>>>> EFI_MEMORY_SP (Special Purpose Memory) type memory from the host to
> >>>>> a virtual machine within QEMU. This memory needs to be EFI_MEMORY_SP
> >>>>> type in the virtual machine as well.
> >>>>> This particular memory type is essential for the functionality of my
> >>>>> project.
> >>>>
> >>>> Which exact guest memory will be backed by this memory? All guest memory?
> >>> [Luo, Zhigang] Not all guest memory. Only the memory reserved for a
> >>> specific device.
> >>
> >> Can you show me an example QEMU cmdline, and how you would pass that
> >> hostmem-file object to the device?
> >>
> > [Luo, Zhigang] The following is an example. m1 is the reserved memory for
> > pci device "0000:03:00.0". Both the memory and the pci device are assigned
> > to the same numa node.
> >
> > -object memory-backend-ram,size=8G,id=m0 \
> > -object memory-backend-file,size=16G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G,hmem=on \
> > -numa node,nodeid=0,memdev=m0 -numa node,nodeid=1,memdev=m1 \
>
> Okay, so you expose this memory as a second numa node, and want the guest to
> identify the second numa node as SP so it is not used during boot.
>
> Let me CC Jonathan, I am pretty sure he has an idea what to do here.
>
> > -device pxb-pcie,id=pcie.1,numa_node=1,bus_nr=2,bus=pcie.0 \
> > -device ioh3420,id=pcie_port1,bus=pcie.1,chassis=1 \
> > -device vfio-pci,host=0000:03:00.0,id=hostdev0,bus=pcie_port1
> >
> >>>>
> >>>> And, what is the guest OS going to do with this memory?
> >>> [Luo, Zhigang] The device driver in the guest will use this reserved memory.
> >>
> >> Okay, so just like CXL memory.
> >>
> >>>>
> >>>> Usually, this SP memory (dax, cxl, ...) is not used as boot memory.
> >>>> Like on a bare metal system, one would expect that only CXL memory
> >>>> will be marked as special and put aside for the cxl driver, such
> >>>> that the OS can boot on ordinary DIMMs, and such that cxl can online it etc.
> >>>>
> >>>> So maybe you would want to expose this memory using a CXL-mem device
> >>>> to the VM? Or a DIMM?
> >>>>
> >>>> I assume the alternative is to tell the VM on the Linux kernel
> >>>> cmdline to set EFI_MEMORY_SP on this memory. I recall that there is
> >>>> a way to achieve that.
> >>>>
> >>> [Luo, Zhigang] I know this option, but it requires the end user to
> >>> know where the memory is located on the guest side (start address, size).
> >>
> >> Right.
> >>
> >>>>> In Linux, the SPM memory will be claimed by the hmem-dax driver by
> >>>>> default. With this patch I can use the following config to pass the
> >>>>> SPM memory to the guest VM.
> >>>>>
> >>>>> -object memory-backend-file,size=30G,id=m1,mem-path=/dev/dax0.0,prealloc=on,align=1G,hmem=on
> >>>>>
> >>>>> I was thinking of changing the option name from "hmem" to "spm" to
> >>>>> avoid confusion.
> >>>>
> >>>> Likely it should be specified elsewhere, that you want specific
> >>>> guest RAM ranges to be EFI_MEMORY_SP. For a DIMM, it could be a
> >>>> property, similarly maybe for CXL-mem devices (no expert on that).
> >>>>
> >>>> For boot memory / machine memory it could be a machine property.
> >>>> But I'll first have to learn which ranges you actually want to
> >>>> expose that way, and what the VM will do with that information.
> >>> [Luo, Zhigang] We want to expose the SPM memory reserved for a specific
> >>> device. And we will pass the SPM memory and the device to the guest.
> >>> Then the device driver can use the SPM memory on the guest side.
> >>
> >> Then the device driver should likely have a way to configure that,
> >> not the memory backend.
> >>
> >> After all, the device driver will map it somehow into guest physical
> >> address space (how?).
> >>
> > [Luo, Zhigang] From the guest's view, it's still system memory, but marked
> > as SPM. So, qemu will map the memory into guest physical address space.
> > The device driver just claims to use the SPM memory on the guest side.
> >
> >>>>>
> >>>>> Do you have any suggestions on how to achieve this more reasonably?
> >>>>
> >>>> The problem with qemu_ram_foreach_block() is that you would
> >>>> indicate also DIMMs, virtio-mem, ... and even RAMBlocks that are
> >>>> not even used for backing anything to the VM as EFI_MEMORY_SP,
> >>>> which is wrong.
> >>> [Luo, Zhigang] qemu_ram_foreach_block() will list all memory blocks,
> >>> but in pc_update_hmem_memory(), only the memory blocks with the "hmem"
> >>> flag will be updated to SPM memory.
> >>
> >> Yes, but imagine a user passing such a memory backend to a
> >> DIMM/virtio-mem/boot memory etc. It will have very undesired side effects.
> >>
> > [Luo, Zhigang] The user should know what he/she is doing when he/she sets
> > the flag for the memory region.
>
> No, we must not allow creating insane configurations that don't make any
> sense.
>
> Sufficient to add:
>
> -object memory-backend-file,size=16G,id=unused,mem-path=whatever,hmem=on
>
> to the cmdline to cause a mess.
>
> Maybe it should be a "numa" node configuration like
>
> -numa node,nodeid=1,memdev=m1,sp=on
>
> But I recall that we discussed something related to this with Jonathan, so
> I'm hoping we can get his input.
>
> --
> Cheers,
>
> David / dhildenb
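
For reference, the guest kernel-cmdline alternative David mentions above is the
efi_fake_mem= parameter, which applies an EFI memory attribute to an existing
guest physical range (it requires CONFIG_EFI_FAKE_MEMMAP=y in the guest kernel;
EFI_MEMORY_SP is attribute bit 0x40000). A minimal sketch, assuming the
hypothetical case where the 16G backend lands at guest physical address
0x100000000 -- exactly the start-address/size knowledge Zhigang points out the
end user would need:

  # Guest Linux kernel command line (address and size are illustrative).
  # Marks the 16G range starting at 4G with EFI_MEMORY_SP (0x40000), so the
  # guest treats it as Specific Purpose memory instead of ordinary boot RAM.
  efi_fake_mem=16G@0x100000000:0x40000

The drawback, as noted in the thread, is that the user must know where QEMU
placed the range in guest physical address space, which is what marking the
range from the QEMU side (the proposed hmem/sp property) would avoid.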
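
For completeness, this is roughly what "claimed by the hmem-dax driver by
default" looks like from inside the guest once a range carries EFI_MEMORY_SP
(a sketch; the address range and the dax device name are illustrative):

  # EFI_MEMORY_SP ranges show up as "Soft Reserved" in the guest's iomem map
  # and are typically handed to the dax_hmem driver rather than the page
  # allocator.
  grep -i "soft reserved" /proc/iomem
  #   100000000-4ffffffff : Soft Reserved

  # The range then appears as a dax device that the guest driver (or daxctl)
  # can claim.
  daxctl list -u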