Thanks Baolu for your review!

On 5/9/2025 2:41 PM, Baolu Lu wrote:
> On 4/7/25 15:49, Chenyi Qiang wrote:
>> Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
>> discard") highlighted that subsystems like VFIO may disable RAM block
>> discard. However, guest_memfd relies on discard operations for page
>> conversion between private and shared memory, potentially leading to
>> stale IOMMU mapping issue when assigning hardware devices to
>> confidential VMs via shared memory. To address this, it is crucial to
>> ensure systems like VFIO refresh its IOMMU mappings.
>>
>> PrivateSharedManager is introduced to manage private and shared states in
>> confidential VMs, similar to RamDiscardManager, which supports
>> coordinated RAM discard in VFIO. Integrating PrivateSharedManager with
>> guest_memfd can facilitate the adjustment of VFIO mappings in response
>> to page conversion events.
>>
>> Since guest_memfd is not an object, it cannot directly implement the
>> PrivateSharedManager interface. Implementing it in HostMemoryBackend is
>> not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks
>> have a memory backend while others do not. Notably, virtual BIOS
>> RAMBlocks using memory_region_init_ram_guest_memfd() do not have a
>> backend.
>>
>> To manage RAMBlocks with guest_memfd, define a new object named
>> RamBlockAttribute to implement the RamDiscardManager interface. This
>> object stores guest_memfd information such as shared_bitmap, and handles
>> page conversion notification. The memory state is tracked at the host
>> page size granularity, as the minimum memory conversion size can be one
>> page per request. Additionally, VFIO expects the DMA mapping for a
>> specific iova to be mapped and unmapped with the same granularity.
>> Confidential VMs may perform partial conversions, such as conversions on
>> small regions within larger regions. To prevent invalid cases and until
>> cut_mapping operation support is available, all operations are performed
>> with 4K granularity.
> 
> Just for your information, IOMMUFD plans to introduce the support for
> cut operation. The kickoff patch series is under discussion here:
> 
> https://lore.kernel.org/linux-iommu/0-v2-5c26bde5c22d+58b-iommu_pt_...@nvidia.com/

Thanks for the info. I see the new version has just come out.

> 
> This new cut support is expected to be exclusive to IOMMUFD and not
> directly available in the VFIO container context. The VFIO uAPI for map/
> unmap is being superseded by IOMMUFD, and all new features will only be
> available in IOMMUFD.

Yeah. I will suggest using iommufd in the test steps in my cover letter,
since that is the direction things are heading.

> 
>>
>> Signed-off-by: Chenyi Qiang<chenyi.qi...@intel.com>
> 
> <...>
> 
>> +
>> +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr)
>> +{
>> +    uint64_t shared_bitmap_size;
>> +    const int block_size  = qemu_real_host_page_size();
>> +    int ret;
>> +
>> +    shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size;
>> +
>> +    attr->mr = mr;
>> +    ret = memory_region_set_generic_state_manager(mr,
>> +                                                   GENERIC_STATE_MANAGER(attr));
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +    attr->shared_bitmap_size = shared_bitmap_size;
>> +    attr->shared_bitmap = bitmap_new(shared_bitmap_size);
> 
> Above introduces a bitmap to track the private/shared state of each 4KB
> page. While functional, for large RAM blocks managed by guest_memfd,
> this could lead to significant memory consumption.
> 
> Have you considered an alternative like a Maple Tree or a generic
> interval tree? Both are often more memory-efficient for tracking ranges
> of contiguous states.

Maybe not necessary. The bitmap overhead is only 1 bit per 4K page, i.e.
1/(4096*8) ≈ 0.003% of the RAM block size, which I don't think is too much.
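
For reference, a quick back-of-the-envelope sketch of the overhead
(standalone C, assuming a 4K host page size and a 64 GiB RAM block purely
for illustration; not part of the patch):

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        const uint64_t page_size = 4096;        /* assumed 4K host pages */
        const uint64_t ram_size  = 64ULL << 30; /* e.g. a 64 GiB RAM block */

        /* 1 bit per page: ram_size / (4096 * 8) bytes of bitmap */
        uint64_t bitmap_bits  = ram_size / page_size;
        uint64_t bitmap_bytes = bitmap_bits / 8;

        /* prints: bitmap size: 2048 KiB (0.0031% of RAM) */
        printf("bitmap size: %" PRIu64 " KiB (%.4f%% of RAM)\n",
               bitmap_bytes / 1024, 100.0 * bitmap_bytes / ram_size);
        return 0;
    }

So even a fairly large RAM block only costs a couple of MiB of bitmap, which
seems acceptable compared with the extra complexity of an interval tree.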

> 
>> +
>> +    return ret;
>> +}
>> +
>> +void ram_block_attribute_unrealize(RamBlockAttribute *attr)
>> +{
>> +    g_free(attr->shared_bitmap);
>> +    memory_region_set_generic_state_manager(attr->mr, NULL);
>> +}
> 
> Thanks,
> baolu
