On 6/3/2025 5:10 AM, David Hildenbrand wrote:
> On 30.05.25 10:32, Chenyi Qiang wrote:
>> Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
>> discard") highlighted that subsystems like VFIO may disable RAM block
>> discard. However, guest_memfd relies on discard operations for page
>> conversion between private and shared memory, potentially leading to
>> the stale IOMMU mapping issue when assigning hardware devices to
>> confidential VMs via shared memory. To address this and allow shared
>> device assignement, it is crucial to ensure the VFIO system refreshes
>> its IOMMU mappings.
>>
>> RamDiscardManager is an existing interface (used by virtio-mem) to
>> adjust VFIO mappings in relation to VM page assignment. Effectively page
>> conversion is similar to hot-removing a page in one mode and adding it
>> back in the other. Therefore, similar actions are required for page
>> conversion events. Introduce the RamDiscardManager to guest_memfd to
>> facilitate this process.
>>
>> Since guest_memfd is not an object, it cannot directly implement the
>> RamDiscardManager interface. Implementing it in HostMemoryBackend is
>> not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks
>> have a memory backend while others do not. Notably, virtual BIOS
>> RAMBlocks using memory_region_init_ram_guest_memfd() do not have a
>> backend.
>>
>> To manage RAMBlocks with guest_memfd, define a new object named
>> RamBlockAttributes to implement the RamDiscardManager interface. This
>> object can store the guest_memfd information such as bitmap for shared
>> memory and the registered listeners for event notification. In the
>> context of RamDiscardManager, shared state is analogous to populated, and
>> private state is signified as discarded. To notify the conversion events,
>> a new state_change() helper is exported for the users to notify the
>> listeners like VFIO, so that VFIO can dynamically DMA map/unmap the
>> shared mapping.
>>
>> Note that the memory state is tracked at the host page size granularity,
>> as the minimum conversion size can be one page per request and VFIO
>> expects the DMA mapping for a specific iova to be mapped and unmapped
>> with the same granularity. Confidential VMs may perform partial
>> conversions, such as conversions on small regions within larger ones.
>> To prevent such invalid cases and until DMA mapping cut operation
>> support is available, all operations are performed with 4K granularity.
>>
>> In addition, memory conversion failures cause QEMU to quit instead of
>> resuming the guest or retrying the operation at present. It would be
>> future work to add more error handling or rollback mechanisms once
>> conversion failures are allowed. For example, in-place conversion of
>> guest_memfd could retry the unmap operation during the conversion from
>> shared to private. For now, keep the complex error handling out of the
>> picture as it is not required.
>>
>> Signed-off-by: Chenyi Qiang <chenyi.qi...@intel.com>
>> ---
>> Changes in v6:
>>      - Change the object type name from RamBlockAttribute to
>>        RamBlockAttributes. (David)
>>      - Save the associated RAMBlock instead MemoryRegion in
>>        RamBlockAttributes. (David)
>>      - Squash the state_change() helper introduction in this commit as
>>        well as the mixture conversion case handling. (David)
>>      - Change the block_size type from int to size_t and some cleanup in
>>        validation check. (Alexey)
>>      - Add a tracepoint to track the state changes. (Alexey)
>>
>> Changes in v5:
>>      - Revert to use RamDiscardManager interface instead of introducing
>>        new hierarchy of class to manage private/shared state, and keep
>>        using the new name of RamBlockAttribute compared with the
>>        MemoryAttributeManager in v3.
>>      - Use *simple* version of object_define and object_declare since the
>>        state_change() function is changed as an exported function instead
>>        of a virtual function in later patch.
>>      - Move the introduction of RamBlockAttribute field to this patch and
>>        rename it to ram_shared. (Alexey)
>>      - call the exit() when register/unregister failed. (Zhao)
>>      - Add the ram-block-attribute.c to Memory API related part in
>>        MAINTAINERS.
>>
>> Changes in v4:
>>      - Change the name from memory-attribute-manager to
>>        ram-block-attribute.
>>      - Implement the newly-introduced PrivateSharedManager instead of
>>        RamDiscardManager and change related commit message.
>>      - Define the new object in ramblock.h instead of adding a new file.
>> ---
>>   MAINTAINERS                   |   1 +
>>   include/system/ramblock.h     |  21 ++
>>   system/meson.build            |   1 +
>>   system/ram-block-attributes.c | 480 ++++++++++++++++++++++++++++++++++
>>   system/trace-events           |   3 +
>>   5 files changed, 506 insertions(+)
>>   create mode 100644 system/ram-block-attributes.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 6dacd6d004..8ec39aa7f8 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -3149,6 +3149,7 @@ F: system/memory.c
>>   F: system/memory_mapping.c
>>   F: system/physmem.c
>>   F: system/memory-internal.h
>> +F: system/ram-block-attributes.c
>>   F: scripts/coccinelle/memory-region-housekeeping.cocci
>>     Memory devices
>> diff --git a/include/system/ramblock.h b/include/system/ramblock.h
>> index d8a116ba99..1bab9e2dac 100644
>> --- a/include/system/ramblock.h
>> +++ b/include/system/ramblock.h
>> @@ -22,6 +22,10 @@
>>   #include "exec/cpu-common.h"
>>   #include "qemu/rcu.h"
>>   #include "exec/ramlist.h"
>> +#include "system/hostmem.h"
>> +
>> +#define TYPE_RAM_BLOCK_ATTRIBUTES "ram-block-attributes"
>> +OBJECT_DECLARE_SIMPLE_TYPE(RamBlockAttributes, RAM_BLOCK_ATTRIBUTES)
>>     struct RAMBlock {
>>       struct rcu_head rcu;
>> @@ -91,4 +95,21 @@ struct RAMBlock {
>>       ram_addr_t postcopy_length;
>>   };
>>   +struct RamBlockAttributes {
>> +    Object parent;
>> +
>> +    RAMBlock *ram_block;
>> +
>> +    /* 1-setting of the bitmap represents ram is populated (shared) */
>> +    unsigned bitmap_size;
>> +    unsigned long *bitmap;
> 
> So, initially, all memory starts out as private, correct?

Yes.

> 
> I guess this mimics what kvm_set_phys_mem() ends up doing, when it does
> the kvm_set_memory_attributes_private() call.
> 
> So there is a short period of inconsistency, between creating the
> RAMBlock and mapping it into the PA space.

I initially had such a patch [1] to describe the inconsistency in RFC
series. The bitmap was a 1-setting private bitmap at that time to keep
consistent with guest_memfd and it required an explicit bitmap_fill().

[1]
https://lore.kernel.org/qemu-devel/20240725072118.358923-6-chenyi.qi...@intel.com/

> 
> It might be wroth spelling that out / documenting it somewhere.

OK. I missed the above commit document after changing the bitmap to
shared. I can add it back if need a new version.

> 


Reply via email to