Thanks Baolu for your review!

On 5/9/2025 2:41 PM, Baolu Lu wrote:
> On 4/7/25 15:49, Chenyi Qiang wrote:
>> Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
>> discard") highlighted that subsystems like VFIO may disable RAM block
>> discard. However, guest_memfd relies on discard operations for page
>> conversion between private and shared memory, potentially leading to
>> stale IOMMU mapping issue when assigning hardware devices to
>> confidential VMs via shared memory. To address this, it is crucial to
>> ensure systems like VFIO refresh its IOMMU mappings.
>>
>> PrivateSharedManager is introduced to manage private and shared states in
>> confidential VMs, similar to RamDiscardManager, which supports
>> coordinated RAM discard in VFIO. Integrating PrivateSharedManager with
>> guest_memfd can facilitate the adjustment of VFIO mappings in response
>> to page conversion events.
>>
>> Since guest_memfd is not an object, it cannot directly implement the
>> PrivateSharedManager interface. Implementing it in HostMemoryBackend is
>> not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks
>> have a memory backend while others do not. Notably, virtual BIOS
>> RAMBlocks using memory_region_init_ram_guest_memfd() do not have a
>> backend.
>>
>> To manage RAMBlocks with guest_memfd, define a new object named
>> RamBlockAttribute to implement the RamDiscardManager interface. This
>> object stores guest_memfd information such as shared_bitmap, and handles
>> page conversion notification. The memory state is tracked at the host
>> page size granularity, as the minimum memory conversion size can be one
>> page per request. Additionally, VFIO expects the DMA mapping for a
>> specific iova to be mapped and unmapped with the same granularity.
>> Confidential VMs may perform partial conversions, such as conversions on
>> small regions within larger regions. To prevent invalid cases and until
>> cut_mapping operation support is available, all operations are performed
>> with 4K granularity.
>
> Just for your information, IOMMUFD plans to introduce the support for
> cut operation. The kickoff patch series is under discussion here:
>
> https://lore.kernel.org/linux-iommu/0-v2-5c26bde5c22d+58b-iommu_pt_...@nvidia.com/
Thanks for this info. I just found that the new version has come out.

>
> This new cut support is expected to be exclusive to IOMMUFD and not
> directly available in the VFIO container context. The VFIO uAPI for
> map/unmap is being superseded by IOMMUFD, and all new features will only
> be available in IOMMUFD.

Yeah. I will suggest using iommufd in the test steps in my cover letter,
since this is the direction.

>
>>
>> Signed-off-by: Chenyi Qiang <chenyi.qi...@intel.com>
>
> <...>
>
>> +
>> +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr)
>> +{
>> +    uint64_t shared_bitmap_size;
>> +    const int block_size = qemu_real_host_page_size();
>> +    int ret;
>> +
>> +    shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size;
>> +
>> +    attr->mr = mr;
>> +    ret = memory_region_set_generic_state_manager(mr, GENERIC_STATE_MANAGER(attr));
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +    attr->shared_bitmap_size = shared_bitmap_size;
>> +    attr->shared_bitmap = bitmap_new(shared_bitmap_size);
>
> Above introduces a bitmap to track the private/shared state of each 4KB
> page. While functional, for large RAM blocks managed by guest_memfd,
> this could lead to significant memory consumption.
>
> Have you considered an alternative like a Maple Tree or a generic
> interval tree? Both are often more memory-efficient for tracking ranges
> of contiguous states.

Maybe not necessary. The memory overhead is 1 bit per page
(1/(4096*8) = ~0.003%), which I don't think is too much. See the rough
calculation sketched at the end of this mail.

>
>> +
>> +    return ret;
>> +}
>> +
>> +void ram_block_attribute_unrealize(RamBlockAttribute *attr)
>> +{
>> +    g_free(attr->shared_bitmap);
>> +    memory_region_set_generic_state_manager(attr->mr, NULL);
>> +}
>
> Thanks,
> baolu
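
For reference, here is a rough standalone sketch of the overhead
calculation (not QEMU code; it assumes a 4 KiB host page size, a local
ROUND_UP macro, and a few hypothetical RAM block sizes purely for
illustration), mirroring the shared_bitmap_size computation in
ram_block_attribute_realize():

/*
 * Standalone estimate of the shared_bitmap overhead: one bit per
 * 4 KiB page, with the number of bits computed the same way as
 * shared_bitmap_size above. BLOCK_SIZE and ROUND_UP are local to
 * this example.
 */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define BLOCK_SIZE      4096ULL
#define ROUND_UP(n, d)  ((((n) + (d) - 1) / (d)) * (d))

int main(void)
{
    /* Example RAM block sizes: 4 GiB, 64 GiB and 1 TiB */
    uint64_t sizes[] = { 4ULL << 30, 64ULL << 30, 1ULL << 40 };

    for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        uint64_t nbits = ROUND_UP(sizes[i], BLOCK_SIZE) / BLOCK_SIZE;
        uint64_t bytes = (nbits + 7) / 8;   /* one bit per 4 KiB page */

        printf("RAM block %7llu MiB -> bitmap %6llu KiB (%.4f%% overhead)\n",
               (unsigned long long)(sizes[i] >> 20),
               (unsigned long long)(bytes >> 10),
               100.0 * bytes / sizes[i]);
    }
    return 0;
}

Even for a 1 TiB RAM block this is only a ~32 MiB bitmap (~0.003%), so I
would prefer to keep the simple bitmap for now rather than switch to an
interval-tree-like structure.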