On 4/9/2025 5:57 PM, Alexey Kardashevskiy wrote:
> 
> 
> On 7/4/25 17:49, Chenyi Qiang wrote:
>> Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
>> discard") highlighted that subsystems like VFIO may disable RAM block
>> discard. However, guest_memfd relies on discard operations for page
>> conversion between private and shared memory, potentially leading to
>> stale IOMMU mapping issue when assigning hardware devices to
>> confidential VMs via shared memory. To address this, it is crucial to
>> ensure systems like VFIO refresh its IOMMU mappings.
>>
>> PrivateSharedManager is introduced to manage private and shared states in
>> confidential VMs, similar to RamDiscardManager, which supports
>> coordinated RAM discard in VFIO. Integrating PrivateSharedManager with
>> guest_memfd can facilitate the adjustment of VFIO mappings in response
>> to page conversion events.
>>
>> Since guest_memfd is not an object, it cannot directly implement the
>> PrivateSharedManager interface. Implementing it in HostMemoryBackend is
>> not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks
>> have a memory backend while others do not. 
> 
> HostMemoryBackend::mr::ram_block::guest_memfd?
> And there is HostMemoryBackendMemfd too.

HostMemoryBackend is the parent of HostMemoryBackendMemfd. It is also
possible to use HostMemoryBackendFile or HostMemoryBackendRAM.

> 
>> Notably, virtual BIOS
>> RAMBlocks using memory_region_init_ram_guest_memfd() do not have a
>> backend.
> 
> I thought private memory can be allocated from guest_memfd only. And it
> is still not clear if this BIOS memory can be discarded or not, does it
> change state during the VM lifetime?
> (sorry I keep asking but I do not remember definitive answer).

The BIOS region supports conversion as it is backed by guest_memfd. It
can change the state but it never does during VM lifetime.

> 
>> To manage RAMBlocks with guest_memfd, define a new object named
>> RamBlockAttribute to implement the RamDiscardManager interface. This
>> object stores guest_memfd information such as shared_bitmap, and handles
>> page conversion notification. The memory state is tracked at the host
>> page size granularity, as the minimum memory conversion size can be one
>> page per request. Additionally, VFIO expects the DMA mapping for a
>> specific iova to be mapped and unmapped with the same granularity.
>> Confidential VMs may perform partial conversions, such as conversions on
>> small regions within larger regions. To prevent invalid cases and until
>> cut_mapping operation support is available, all operations are performed
>> with 4K granularity.
>>
>> Signed-off-by: Chenyi Qiang <chenyi.qi...@intel.com>
>> ---
>> Changes in v4:
>>      - Change the name from memory-attribute-manager to
>>        ram-block-attribute.
>>      - Implement the newly-introduced PrivateSharedManager instead of
>>        RamDiscardManager and change related commit message.
>>      - Define the new object in ramblock.h instead of adding a new file.
>>
>> Changes in v3:
>>      - Some rename (bitmap_size->shared_bitmap_size,
>>        first_one/zero_bit->first_bit, etc.)
>>      - Change shared_bitmap_size from uint32_t to unsigned
>>      - Return mgr->mr->ram_block->page_size in get_block_size()
>>      - Move set_ram_discard_manager() up to avoid a g_free() in failure
>>        case.
>>      - Add const for the memory_attribute_manager_get_block_size()
>>      - Unify the ReplayRamPopulate and ReplayRamDiscard and related
>>        callback.
>>
>> Changes in v2:
>>      - Rename the object name to MemoryAttributeManager
>>      - Rename the bitmap to shared_bitmap to make it more clear.
>>      - Remove block_size field and get it from a helper. In future, we
>>        can get the page_size from RAMBlock if necessary.
>>      - Remove the unncessary "struct" before GuestMemfdReplayData
>>      - Remove the unncessary g_free() for the bitmap
>>      - Add some error report when the callback failure for
>>        populated/discarded section.
>>      - Move the realize()/unrealize() definition to this patch.
>> ---
>>   include/exec/ramblock.h      |  24 +++
>>   system/meson.build           |   1 +
>>   system/ram-block-attribute.c | 282 +++++++++++++++++++++++++++++++++++
>>   3 files changed, 307 insertions(+)
>>   create mode 100644 system/ram-block-attribute.c
>>
>> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
>> index 0babd105c0..b8b5469db9 100644
>> --- a/include/exec/ramblock.h
>> +++ b/include/exec/ramblock.h
>> @@ -23,6 +23,10 @@
>>   #include "cpu-common.h"
>>   #include "qemu/rcu.h"
>>   #include "exec/ramlist.h"
>> +#include "system/hostmem.h"
>> +
>> +#define TYPE_RAM_BLOCK_ATTRIBUTE "ram-block-attribute"
>> +OBJECT_DECLARE_TYPE(RamBlockAttribute, RamBlockAttributeClass,
>> RAM_BLOCK_ATTRIBUTE)
>>     struct RAMBlock {
>>       struct rcu_head rcu;
>> @@ -90,5 +94,25 @@ struct RAMBlock {
>>        */
>>       ram_addr_t postcopy_length;
>>   };
>> +
>> +struct RamBlockAttribute {
>> +    Object parent;
>> +
>> +    MemoryRegion *mr;
>> +
>> +    /* 1-setting of the bit represents the memory is populated
>> (shared) */
> 
> It is either RamBlockShared, or it is a "generic" RamBlockAttribute
> implementing a bitmap with a bit per page and no special meaning
> (shared/private or discarded/populated). And if it is a generic
> RamBlockAttribute, then this hunk from 09/13 (which should be in this
> patch) should look like:
> 
> 
> --- a/include/exec/ramblock.h
> +++ b/include/exec/ramblock.h
> @@ -46,6 +46,7 @@ struct RAMBlock {
>      int fd;
>      uint64_t fd_offset;
>      int guest_memfd;
> +    RamBlockAttribute *ram_shared; // and not "ram_block_attribute"
> 
> Thanks,

I prefer generic RamBlockAttribute as we can extend to manage
private/shared/discarded states in the future if necessary. With the
same reason, I hope to keep the variable name of ram_block_attribute.

> 
> 
>> +    unsigned shared_bitmap_size;
>> +    unsigned long *shared_bitmap;
>> +
>> +    QLIST_HEAD(, PrivateSharedListener) psl_list;
>> +};
>> +
>> +struct RamBlockAttributeClass {
>> +    ObjectClass parent_class;
>> +};
>> +
>> +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion
>> *mr);
>> +void ram_block_attribute_unrealize(RamBlockAttribute *attr);
>> +
>>   #endif
>>   #endif
>> diff --git a/system/meson.build b/system/meson.build
>> index 4952f4b2c7..50a5a64f1c 100644
>> --- a/system/meson.build
>> +++ b/system/meson.build
>> @@ -15,6 +15,7 @@ system_ss.add(files(
>>     'dirtylimit.c',
>>     'dma-helpers.c',
>>     'globals.c',
>> +  'ram-block-attribute.c',
>>     'memory_mapping.c',
>>     'qdev-monitor.c',
>>     'qtest.c',
>> diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c
>> new file mode 100644
>> index 0000000000..283c03b354
>> --- /dev/null
>> +++ b/system/ram-block-attribute.c
>> @@ -0,0 +1,282 @@
>> +/*
>> + * QEMU ram block attribute
>> + *
>> + * Copyright Intel
>> + *
>> + * Author:
>> + *      Chenyi Qiang <chenyi.qi...@intel.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>> later.
>> + * See the COPYING file in the top-level directory
>> + *
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu/error-report.h"
>> +#include "exec/ramblock.h"
>> +
>> +OBJECT_DEFINE_TYPE_WITH_INTERFACES(RamBlockAttribute,
>> +                                   ram_block_attribute,
>> +                                   RAM_BLOCK_ATTRIBUTE,
>> +                                   OBJECT,
>> +                                   { TYPE_PRIVATE_SHARED_MANAGER },
>> +                                   { })
>> +
>> +static size_t ram_block_attribute_get_block_size(const
>> RamBlockAttribute *attr)
>> +{
>> +    /*
>> +     * Because page conversion could be manipulated in the size of at
>> least 4K or 4K aligned,
>> +     * Use the host page size as the granularity to track the memory
>> attribute.
>> +     */
>> +    g_assert(attr && attr->mr && attr->mr->ram_block);
>> +    g_assert(attr->mr->ram_block->page_size ==
>> qemu_real_host_page_size());
>> +    return attr->mr->ram_block->page_size;
>> +}
>> +
>> +
>> +static bool ram_block_attribute_psm_is_shared(const
>> GenericStateManager *gsm,
>> +                                              const
>> MemoryRegionSection *section)
>> +{
>> +    const RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> +    const int block_size = ram_block_attribute_get_block_size(attr);
>> +    uint64_t first_bit = section->offset_within_region / block_size;
>> +    uint64_t last_bit = first_bit + int128_get64(section->size) /
>> block_size - 1;
>> +    unsigned long first_discard_bit;
>> +
>> +    first_discard_bit = find_next_zero_bit(attr->shared_bitmap,
>> last_bit + 1, first_bit);
>> +    return first_discard_bit > last_bit;
>> +}
>> +
>> +typedef int (*ram_block_attribute_section_cb)(MemoryRegionSection *s,
>> void *arg);
>> +
>> +static int ram_block_attribute_notify_shared_cb(MemoryRegionSection
>> *section, void *arg)
>> +{
>> +    StateChangeListener *scl = arg;
>> +
>> +    return scl->notify_to_state_set(scl, section);
>> +}
>> +
>> +static int ram_block_attribute_notify_private_cb(MemoryRegionSection
>> *section, void *arg)
>> +{
>> +    StateChangeListener *scl = arg;
>> +
>> +    scl->notify_to_state_clear(scl, section);
>> +    return 0;
>> +}
>> +
>> +static int ram_block_attribute_for_each_shared_section(const
>> RamBlockAttribute *attr,
>> +                                                      
>> MemoryRegionSection *section,
>> +                                                       void *arg,
>> +                                                      
>> ram_block_attribute_section_cb cb)
>> +{
>> +    unsigned long first_bit, last_bit;
>> +    uint64_t offset, size;
>> +    const int block_size = ram_block_attribute_get_block_size(attr);
>> +    int ret = 0;
>> +
>> +    first_bit = section->offset_within_region / block_size;
>> +    first_bit = find_next_bit(attr->shared_bitmap, attr-
>> >shared_bitmap_size, first_bit);
>> +
>> +    while (first_bit < attr->shared_bitmap_size) {
>> +        MemoryRegionSection tmp = *section;
>> +
>> +        offset = first_bit * block_size;
>> +        last_bit = find_next_zero_bit(attr->shared_bitmap, attr-
>> >shared_bitmap_size,
>> +                                      first_bit + 1) - 1;
>> +        size = (last_bit - first_bit + 1) * block_size;
>> +
>> +        if (!memory_region_section_intersect_range(&tmp, offset,
>> size)) {
>> +            break;
>> +        }
>> +
>> +        ret = cb(&tmp, arg);
>> +        if (ret) {
>> +            error_report("%s: Failed to notify RAM discard listener:
>> %s", __func__,
>> +                         strerror(-ret));
>> +            break;
>> +        }
>> +
>> +        first_bit = find_next_bit(attr->shared_bitmap, attr-
>> >shared_bitmap_size,
>> +                                  last_bit + 2);
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +static int ram_block_attribute_for_each_private_section(const
>> RamBlockAttribute *attr,
>> +                                                       
>> MemoryRegionSection *section,
>> +                                                        void *arg,
>> +                                                       
>> ram_block_attribute_section_cb cb)
>> +{
>> +    unsigned long first_bit, last_bit;
>> +    uint64_t offset, size;
>> +    const int block_size = ram_block_attribute_get_block_size(attr);
>> +    int ret = 0;
>> +
>> +    first_bit = section->offset_within_region / block_size;
>> +    first_bit = find_next_zero_bit(attr->shared_bitmap, attr-
>> >shared_bitmap_size,
>> +                                   first_bit);
>> +
>> +    while (first_bit < attr->shared_bitmap_size) {
>> +        MemoryRegionSection tmp = *section;
>> +
>> +        offset = first_bit * block_size;
>> +        last_bit = find_next_bit(attr->shared_bitmap, attr-
>> >shared_bitmap_size,
>> +                                      first_bit + 1) - 1;
>> +        size = (last_bit - first_bit + 1) * block_size;
>> +
>> +        if (!memory_region_section_intersect_range(&tmp, offset,
>> size)) {
>> +            break;
>> +        }
>> +
>> +        ret = cb(&tmp, arg);
>> +        if (ret) {
>> +            error_report("%s: Failed to notify RAM discard listener:
>> %s", __func__,
>> +                         strerror(-ret));
>> +            break;
>> +        }
>> +
>> +        first_bit = find_next_zero_bit(attr->shared_bitmap, attr-
>> >shared_bitmap_size,
>> +                                       last_bit + 2);
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>> +static uint64_t ram_block_attribute_psm_get_min_granularity(const
>> GenericStateManager *gsm,
>> +                                                            const
>> MemoryRegion *mr)
>> +{
>> +    const RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> +
>> +    g_assert(mr == attr->mr);
>> +    return ram_block_attribute_get_block_size(attr);
>> +}
>> +
>> +static void
>> ram_block_attribute_psm_register_listener(GenericStateManager *gsm,
>> +                                                     
>> StateChangeListener *scl,
>> +                                                     
>> MemoryRegionSection *section)
>> +{
>> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> +    PrivateSharedListener *psl = container_of(scl,
>> PrivateSharedListener, scl);
>> +    int ret;
>> +
>> +    g_assert(section->mr == attr->mr);
>> +    scl->section = memory_region_section_new_copy(section);
>> +
>> +    QLIST_INSERT_HEAD(&attr->psl_list, psl, next);
>> +
>> +    ret = ram_block_attribute_for_each_shared_section(attr, section,
>> scl,
>> +                                                     
>> ram_block_attribute_notify_shared_cb);
>> +    if (ret) {
>> +        error_report("%s: Failed to register RAM discard listener:
>> %s", __func__,
>> +                     strerror(-ret));
>> +    }
>> +}
>> +
>> +static void
>> ram_block_attribute_psm_unregister_listener(GenericStateManager *gsm,
>> +                                                       
>> StateChangeListener *scl)
>> +{
>> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> +    PrivateSharedListener *psl = container_of(scl,
>> PrivateSharedListener, scl);
>> +    int ret;
>> +
>> +    g_assert(scl->section);
>> +    g_assert(scl->section->mr == attr->mr);
>> +
>> +    ret = ram_block_attribute_for_each_shared_section(attr, scl-
>> >section, scl,
>> +                                                     
>> ram_block_attribute_notify_private_cb);
>> +    if (ret) {
>> +        error_report("%s: Failed to unregister RAM discard listener:
>> %s", __func__,
>> +                     strerror(-ret));
>> +    }
>> +
>> +    memory_region_section_free_copy(scl->section);
>> +    scl->section = NULL;
>> +    QLIST_REMOVE(psl, next);
>> +}
>> +
>> +typedef struct RamBlockAttributeReplayData {
>> +    ReplayStateChange fn;
>> +    void *opaque;
>> +} RamBlockAttributeReplayData;
>> +
>> +static int ram_block_attribute_psm_replay_cb(MemoryRegionSection
>> *section, void *arg)
>> +{
>> +    RamBlockAttributeReplayData *data = arg;
>> +
>> +    return data->fn(section, data->opaque);
>> +}
>> +
>> +static int ram_block_attribute_psm_replay_on_shared(const
>> GenericStateManager *gsm,
>> +                                                   
>> MemoryRegionSection *section,
>> +                                                    ReplayStateChange
>> replay_fn,
>> +                                                    void *opaque)
>> +{
>> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> +    RamBlockAttributeReplayData data = { .fn = replay_fn, .opaque =
>> opaque };
>> +
>> +    g_assert(section->mr == attr->mr);
>> +    return ram_block_attribute_for_each_shared_section(attr, section,
>> &data,
>> +                                                      
>> ram_block_attribute_psm_replay_cb);
>> +}
>> +
>> +static int ram_block_attribute_psm_replay_on_private(const
>> GenericStateManager *gsm,
>> +                                                    
>> MemoryRegionSection *section,
>> +                                                    
>> ReplayStateChange replay_fn,
>> +                                                     void *opaque)
>> +{
>> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> +    RamBlockAttributeReplayData data = { .fn = replay_fn, .opaque =
>> opaque };
>> +
>> +    g_assert(section->mr == attr->mr);
>> +    return ram_block_attribute_for_each_private_section(attr,
>> section, &data,
>> +                                                       
>> ram_block_attribute_psm_replay_cb);
>> +}
>> +
>> +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion
>> *mr)
>> +{
>> +    uint64_t shared_bitmap_size;
>> +    const int block_size  = qemu_real_host_page_size();
>> +    int ret;
>> +
>> +    shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size;
>> +
>> +    attr->mr = mr;
>> +    ret = memory_region_set_generic_state_manager(mr,
>> GENERIC_STATE_MANAGER(attr));
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +    attr->shared_bitmap_size = shared_bitmap_size;
>> +    attr->shared_bitmap = bitmap_new(shared_bitmap_size);
>> +
>> +    return ret;
>> +}
>> +
>> +void ram_block_attribute_unrealize(RamBlockAttribute *attr)
>> +{
>> +    g_free(attr->shared_bitmap);
>> +    memory_region_set_generic_state_manager(attr->mr, NULL);
>> +}
>> +
>> +static void ram_block_attribute_init(Object *obj)
>> +{
>> +    RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(obj);
>> +
>> +    QLIST_INIT(&attr->psl_list);
>> +}
>> +
>> +static void ram_block_attribute_finalize(Object *obj)
>> +{
>> +}
>> +
>> +static void ram_block_attribute_class_init(ObjectClass *oc, void *data)
>> +{
>> +    GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_CLASS(oc);
>> +
>> +    gsmc->get_min_granularity =
>> ram_block_attribute_psm_get_min_granularity;
>> +    gsmc->register_listener = ram_block_attribute_psm_register_listener;
>> +    gsmc->unregister_listener =
>> ram_block_attribute_psm_unregister_listener;
>> +    gsmc->is_state_set = ram_block_attribute_psm_is_shared;
>> +    gsmc->replay_on_state_set =
>> ram_block_attribute_psm_replay_on_shared;
>> +    gsmc->replay_on_state_clear =
>> ram_block_attribute_psm_replay_on_private;
>> +}
> 


Reply via email to