On 4/9/2025 5:57 PM, Alexey Kardashevskiy wrote:
>
>
> On 7/4/25 17:49, Chenyi Qiang wrote:
>> Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
>> discard") highlighted that subsystems like VFIO may disable RAM block
>> discard. However, guest_memfd relies on discard operations for page
>> conversion between private and shared memory, potentially leading to
>> stale IOMMU mapping issue when assigning hardware devices to
>> confidential VMs via shared memory. To address this, it is crucial to
>> ensure systems like VFIO refresh its IOMMU mappings.
>>
>> PrivateSharedManager is introduced to manage private and shared states in
>> confidential VMs, similar to RamDiscardManager, which supports
>> coordinated RAM discard in VFIO. Integrating PrivateSharedManager with
>> guest_memfd can facilitate the adjustment of VFIO mappings in response
>> to page conversion events.
>>
>> Since guest_memfd is not an object, it cannot directly implement the
>> PrivateSharedManager interface. Implementing it in HostMemoryBackend is
>> not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks
>> have a memory backend while others do not.
>
> HostMemoryBackend::mr::ram_block::guest_memfd?
> And there is HostMemoryBackendMemfd too.
HostMemoryBackend is the parent of HostMemoryBackendMemfd. It is also
possible to use HostMemoryBackendFile or HostMemoryBackendRAM.
>
>> Notably, virtual BIOS
>> RAMBlocks using memory_region_init_ram_guest_memfd() do not have a
>> backend.
>
> I thought private memory can be allocated from guest_memfd only. And it
> is still not clear if this BIOS memory can be discarded or not, does it
> change state during the VM lifetime?
> (sorry I keep asking but I do not remember definitive answer).
The BIOS region supports conversion as it is backed by guest_memfd. It
can change the state but it never does during VM lifetime.
>
>> To manage RAMBlocks with guest_memfd, define a new object named
>> RamBlockAttribute to implement the RamDiscardManager interface. This
>> object stores guest_memfd information such as shared_bitmap, and handles
>> page conversion notification. The memory state is tracked at the host
>> page size granularity, as the minimum memory conversion size can be one
>> page per request. Additionally, VFIO expects the DMA mapping for a
>> specific iova to be mapped and unmapped with the same granularity.
>> Confidential VMs may perform partial conversions, such as conversions on
>> small regions within larger regions. To prevent invalid cases and until
>> cut_mapping operation support is available, all operations are performed
>> with 4K granularity.
>>
>> Signed-off-by: Chenyi Qiang <chenyi.qi...@intel.com>
>> ---
>> Changes in v4:
>> - Change the name from memory-attribute-manager to
>> ram-block-attribute.
>> - Implement the newly-introduced PrivateSharedManager instead of
>> RamDiscardManager and change related commit message.
>> - Define the new object in ramblock.h instead of adding a new file.
>>
>> Changes in v3:
>> - Some rename (bitmap_size->shared_bitmap_size,
>> first_one/zero_bit->first_bit, etc.)
>> - Change shared_bitmap_size from uint32_t to unsigned
>> - Return mgr->mr->ram_block->page_size in get_block_size()
>> - Move set_ram_discard_manager() up to avoid a g_free() in failure
>> case.
>> - Add const for the memory_attribute_manager_get_block_size()
>> - Unify the ReplayRamPopulate and ReplayRamDiscard and related
>> callback.
>>
>> Changes in v2:
>> - Rename the object name to MemoryAttributeManager
>> - Rename the bitmap to shared_bitmap to make it more clear.
>> - Remove block_size field and get it from a helper. In future, we
>> can get the page_size from RAMBlock if necessary.
>> - Remove the unncessary "struct" before GuestMemfdReplayData
>> - Remove the unncessary g_free() for the bitmap
>> - Add some error report when the callback failure for
>> populated/discarded section.
>> - Move the realize()/unrealize() definition to this patch.
>> ---
>> include/exec/ramblock.h | 24 +++
>> system/meson.build | 1 +
>> system/ram-block-attribute.c | 282 +++++++++++++++++++++++++++++++++++
>> 3 files changed, 307 insertions(+)
>> create mode 100644 system/ram-block-attribute.c
>>
>> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
>> index 0babd105c0..b8b5469db9 100644
>> --- a/include/exec/ramblock.h
>> +++ b/include/exec/ramblock.h
>> @@ -23,6 +23,10 @@
>> #include "cpu-common.h"
>> #include "qemu/rcu.h"
>> #include "exec/ramlist.h"
>> +#include "system/hostmem.h"
>> +
>> +#define TYPE_RAM_BLOCK_ATTRIBUTE "ram-block-attribute"
>> +OBJECT_DECLARE_TYPE(RamBlockAttribute, RamBlockAttributeClass,
>> RAM_BLOCK_ATTRIBUTE)
>> struct RAMBlock {
>> struct rcu_head rcu;
>> @@ -90,5 +94,25 @@ struct RAMBlock {
>> */
>> ram_addr_t postcopy_length;
>> };
>> +
>> +struct RamBlockAttribute {
>> + Object parent;
>> +
>> + MemoryRegion *mr;
>> +
>> + /* 1-setting of the bit represents the memory is populated
>> (shared) */
>
> It is either RamBlockShared, or it is a "generic" RamBlockAttribute
> implementing a bitmap with a bit per page and no special meaning
> (shared/private or discarded/populated). And if it is a generic
> RamBlockAttribute, then this hunk from 09/13 (which should be in this
> patch) should look like:
>
>
> --- a/include/exec/ramblock.h
> +++ b/include/exec/ramblock.h
> @@ -46,6 +46,7 @@ struct RAMBlock {
> int fd;
> uint64_t fd_offset;
> int guest_memfd;
> + RamBlockAttribute *ram_shared; // and not "ram_block_attribute"
>
> Thanks,
I prefer generic RamBlockAttribute as we can extend to manage
private/shared/discarded states in the future if necessary. With the
same reason, I hope to keep the variable name of ram_block_attribute.
>
>
>> + unsigned shared_bitmap_size;
>> + unsigned long *shared_bitmap;
>> +
>> + QLIST_HEAD(, PrivateSharedListener) psl_list;
>> +};
>> +
>> +struct RamBlockAttributeClass {
>> + ObjectClass parent_class;
>> +};
>> +
>> +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion
>> *mr);
>> +void ram_block_attribute_unrealize(RamBlockAttribute *attr);
>> +
>> #endif
>> #endif
>> diff --git a/system/meson.build b/system/meson.build
>> index 4952f4b2c7..50a5a64f1c 100644
>> --- a/system/meson.build
>> +++ b/system/meson.build
>> @@ -15,6 +15,7 @@ system_ss.add(files(
>> 'dirtylimit.c',
>> 'dma-helpers.c',
>> 'globals.c',
>> + 'ram-block-attribute.c',
>> 'memory_mapping.c',
>> 'qdev-monitor.c',
>> 'qtest.c',
>> diff --git a/system/ram-block-attribute.c b/system/ram-block-attribute.c
>> new file mode 100644
>> index 0000000000..283c03b354
>> --- /dev/null
>> +++ b/system/ram-block-attribute.c
>> @@ -0,0 +1,282 @@
>> +/*
>> + * QEMU ram block attribute
>> + *
>> + * Copyright Intel
>> + *
>> + * Author:
>> + * Chenyi Qiang <chenyi.qi...@intel.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>> later.
>> + * See the COPYING file in the top-level directory
>> + *
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu/error-report.h"
>> +#include "exec/ramblock.h"
>> +
>> +OBJECT_DEFINE_TYPE_WITH_INTERFACES(RamBlockAttribute,
>> + ram_block_attribute,
>> + RAM_BLOCK_ATTRIBUTE,
>> + OBJECT,
>> + { TYPE_PRIVATE_SHARED_MANAGER },
>> + { })
>> +
>> +static size_t ram_block_attribute_get_block_size(const
>> RamBlockAttribute *attr)
>> +{
>> + /*
>> + * Because page conversion could be manipulated in the size of at
>> least 4K or 4K aligned,
>> + * Use the host page size as the granularity to track the memory
>> attribute.
>> + */
>> + g_assert(attr && attr->mr && attr->mr->ram_block);
>> + g_assert(attr->mr->ram_block->page_size ==
>> qemu_real_host_page_size());
>> + return attr->mr->ram_block->page_size;
>> +}
>> +
>> +
>> +static bool ram_block_attribute_psm_is_shared(const
>> GenericStateManager *gsm,
>> + const
>> MemoryRegionSection *section)
>> +{
>> + const RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> + const int block_size = ram_block_attribute_get_block_size(attr);
>> + uint64_t first_bit = section->offset_within_region / block_size;
>> + uint64_t last_bit = first_bit + int128_get64(section->size) /
>> block_size - 1;
>> + unsigned long first_discard_bit;
>> +
>> + first_discard_bit = find_next_zero_bit(attr->shared_bitmap,
>> last_bit + 1, first_bit);
>> + return first_discard_bit > last_bit;
>> +}
>> +
>> +typedef int (*ram_block_attribute_section_cb)(MemoryRegionSection *s,
>> void *arg);
>> +
>> +static int ram_block_attribute_notify_shared_cb(MemoryRegionSection
>> *section, void *arg)
>> +{
>> + StateChangeListener *scl = arg;
>> +
>> + return scl->notify_to_state_set(scl, section);
>> +}
>> +
>> +static int ram_block_attribute_notify_private_cb(MemoryRegionSection
>> *section, void *arg)
>> +{
>> + StateChangeListener *scl = arg;
>> +
>> + scl->notify_to_state_clear(scl, section);
>> + return 0;
>> +}
>> +
>> +static int ram_block_attribute_for_each_shared_section(const
>> RamBlockAttribute *attr,
>> +
>> MemoryRegionSection *section,
>> + void *arg,
>> +
>> ram_block_attribute_section_cb cb)
>> +{
>> + unsigned long first_bit, last_bit;
>> + uint64_t offset, size;
>> + const int block_size = ram_block_attribute_get_block_size(attr);
>> + int ret = 0;
>> +
>> + first_bit = section->offset_within_region / block_size;
>> + first_bit = find_next_bit(attr->shared_bitmap, attr-
>> >shared_bitmap_size, first_bit);
>> +
>> + while (first_bit < attr->shared_bitmap_size) {
>> + MemoryRegionSection tmp = *section;
>> +
>> + offset = first_bit * block_size;
>> + last_bit = find_next_zero_bit(attr->shared_bitmap, attr-
>> >shared_bitmap_size,
>> + first_bit + 1) - 1;
>> + size = (last_bit - first_bit + 1) * block_size;
>> +
>> + if (!memory_region_section_intersect_range(&tmp, offset,
>> size)) {
>> + break;
>> + }
>> +
>> + ret = cb(&tmp, arg);
>> + if (ret) {
>> + error_report("%s: Failed to notify RAM discard listener:
>> %s", __func__,
>> + strerror(-ret));
>> + break;
>> + }
>> +
>> + first_bit = find_next_bit(attr->shared_bitmap, attr-
>> >shared_bitmap_size,
>> + last_bit + 2);
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static int ram_block_attribute_for_each_private_section(const
>> RamBlockAttribute *attr,
>> +
>> MemoryRegionSection *section,
>> + void *arg,
>> +
>> ram_block_attribute_section_cb cb)
>> +{
>> + unsigned long first_bit, last_bit;
>> + uint64_t offset, size;
>> + const int block_size = ram_block_attribute_get_block_size(attr);
>> + int ret = 0;
>> +
>> + first_bit = section->offset_within_region / block_size;
>> + first_bit = find_next_zero_bit(attr->shared_bitmap, attr-
>> >shared_bitmap_size,
>> + first_bit);
>> +
>> + while (first_bit < attr->shared_bitmap_size) {
>> + MemoryRegionSection tmp = *section;
>> +
>> + offset = first_bit * block_size;
>> + last_bit = find_next_bit(attr->shared_bitmap, attr-
>> >shared_bitmap_size,
>> + first_bit + 1) - 1;
>> + size = (last_bit - first_bit + 1) * block_size;
>> +
>> + if (!memory_region_section_intersect_range(&tmp, offset,
>> size)) {
>> + break;
>> + }
>> +
>> + ret = cb(&tmp, arg);
>> + if (ret) {
>> + error_report("%s: Failed to notify RAM discard listener:
>> %s", __func__,
>> + strerror(-ret));
>> + break;
>> + }
>> +
>> + first_bit = find_next_zero_bit(attr->shared_bitmap, attr-
>> >shared_bitmap_size,
>> + last_bit + 2);
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static uint64_t ram_block_attribute_psm_get_min_granularity(const
>> GenericStateManager *gsm,
>> + const
>> MemoryRegion *mr)
>> +{
>> + const RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> +
>> + g_assert(mr == attr->mr);
>> + return ram_block_attribute_get_block_size(attr);
>> +}
>> +
>> +static void
>> ram_block_attribute_psm_register_listener(GenericStateManager *gsm,
>> +
>> StateChangeListener *scl,
>> +
>> MemoryRegionSection *section)
>> +{
>> + RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> + PrivateSharedListener *psl = container_of(scl,
>> PrivateSharedListener, scl);
>> + int ret;
>> +
>> + g_assert(section->mr == attr->mr);
>> + scl->section = memory_region_section_new_copy(section);
>> +
>> + QLIST_INSERT_HEAD(&attr->psl_list, psl, next);
>> +
>> + ret = ram_block_attribute_for_each_shared_section(attr, section,
>> scl,
>> +
>> ram_block_attribute_notify_shared_cb);
>> + if (ret) {
>> + error_report("%s: Failed to register RAM discard listener:
>> %s", __func__,
>> + strerror(-ret));
>> + }
>> +}
>> +
>> +static void
>> ram_block_attribute_psm_unregister_listener(GenericStateManager *gsm,
>> +
>> StateChangeListener *scl)
>> +{
>> + RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> + PrivateSharedListener *psl = container_of(scl,
>> PrivateSharedListener, scl);
>> + int ret;
>> +
>> + g_assert(scl->section);
>> + g_assert(scl->section->mr == attr->mr);
>> +
>> + ret = ram_block_attribute_for_each_shared_section(attr, scl-
>> >section, scl,
>> +
>> ram_block_attribute_notify_private_cb);
>> + if (ret) {
>> + error_report("%s: Failed to unregister RAM discard listener:
>> %s", __func__,
>> + strerror(-ret));
>> + }
>> +
>> + memory_region_section_free_copy(scl->section);
>> + scl->section = NULL;
>> + QLIST_REMOVE(psl, next);
>> +}
>> +
>> +typedef struct RamBlockAttributeReplayData {
>> + ReplayStateChange fn;
>> + void *opaque;
>> +} RamBlockAttributeReplayData;
>> +
>> +static int ram_block_attribute_psm_replay_cb(MemoryRegionSection
>> *section, void *arg)
>> +{
>> + RamBlockAttributeReplayData *data = arg;
>> +
>> + return data->fn(section, data->opaque);
>> +}
>> +
>> +static int ram_block_attribute_psm_replay_on_shared(const
>> GenericStateManager *gsm,
>> +
>> MemoryRegionSection *section,
>> + ReplayStateChange
>> replay_fn,
>> + void *opaque)
>> +{
>> + RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> + RamBlockAttributeReplayData data = { .fn = replay_fn, .opaque =
>> opaque };
>> +
>> + g_assert(section->mr == attr->mr);
>> + return ram_block_attribute_for_each_shared_section(attr, section,
>> &data,
>> +
>> ram_block_attribute_psm_replay_cb);
>> +}
>> +
>> +static int ram_block_attribute_psm_replay_on_private(const
>> GenericStateManager *gsm,
>> +
>> MemoryRegionSection *section,
>> +
>> ReplayStateChange replay_fn,
>> + void *opaque)
>> +{
>> + RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(gsm);
>> + RamBlockAttributeReplayData data = { .fn = replay_fn, .opaque =
>> opaque };
>> +
>> + g_assert(section->mr == attr->mr);
>> + return ram_block_attribute_for_each_private_section(attr,
>> section, &data,
>> +
>> ram_block_attribute_psm_replay_cb);
>> +}
>> +
>> +int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion
>> *mr)
>> +{
>> + uint64_t shared_bitmap_size;
>> + const int block_size = qemu_real_host_page_size();
>> + int ret;
>> +
>> + shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size;
>> +
>> + attr->mr = mr;
>> + ret = memory_region_set_generic_state_manager(mr,
>> GENERIC_STATE_MANAGER(attr));
>> + if (ret) {
>> + return ret;
>> + }
>> + attr->shared_bitmap_size = shared_bitmap_size;
>> + attr->shared_bitmap = bitmap_new(shared_bitmap_size);
>> +
>> + return ret;
>> +}
>> +
>> +void ram_block_attribute_unrealize(RamBlockAttribute *attr)
>> +{
>> + g_free(attr->shared_bitmap);
>> + memory_region_set_generic_state_manager(attr->mr, NULL);
>> +}
>> +
>> +static void ram_block_attribute_init(Object *obj)
>> +{
>> + RamBlockAttribute *attr = RAM_BLOCK_ATTRIBUTE(obj);
>> +
>> + QLIST_INIT(&attr->psl_list);
>> +}
>> +
>> +static void ram_block_attribute_finalize(Object *obj)
>> +{
>> +}
>> +
>> +static void ram_block_attribute_class_init(ObjectClass *oc, void *data)
>> +{
>> + GenericStateManagerClass *gsmc = GENERIC_STATE_MANAGER_CLASS(oc);
>> +
>> + gsmc->get_min_granularity =
>> ram_block_attribute_psm_get_min_granularity;
>> + gsmc->register_listener = ram_block_attribute_psm_register_listener;
>> + gsmc->unregister_listener =
>> ram_block_attribute_psm_unregister_listener;
>> + gsmc->is_state_set = ram_block_attribute_psm_is_shared;
>> + gsmc->replay_on_state_set =
>> ram_block_attribute_psm_replay_on_shared;
>> + gsmc->replay_on_state_clear =
>> ram_block_attribute_psm_replay_on_private;
>> +}
>