On 2/21/2025 3:39 AM, David Hildenbrand wrote:
> On 20.02.25 17:13, Jean-Philippe Brucker wrote:
>> For Arm CCA we'd like the guest_memfd discard notifier to call the IOMMU
>> notifiers and create e.g. VFIO mappings. The default VFIO discard
>> notifier isn't sufficient for CCA because the DMA addresses need a
>> translation (even without vIOMMU).
>>
>> At the moment:
>> * guest_memfd_state_change() calls the populate() notifier
>> * the populate() notifier calls IOMMU notifiers
>> * the IOMMU notifier handler calls memory_get_xlat_addr() to get a VA
>> * it calls ram_discard_manager_is_populated() which fails.
>>
>> guest_memfd_state_change() only changes the section's state after
>> calling the populate() notifier. We can't easily invert the order of
>> operation because it uses the old state bitmap to know which pages need
>> the populate() notifier.
>
> I assume we talk about this code: [1]
>
> [1] https://lkml.kernel.org/r/20250217081833.21568-1-chenyi.qi...@intel.com
>
>
> +static int memory_attribute_state_change(MemoryAttributeManager *mgr, uint64_t offset,
> +                                         uint64_t size, bool shared_to_private)
> +{
> +    int block_size = memory_attribute_manager_get_block_size(mgr);
> +    int ret = 0;
> +
> +    if (!memory_attribute_is_valid_range(mgr, offset, size)) {
> +        error_report("%s, invalid range: offset 0x%lx, size 0x%lx",
> +                     __func__, offset, size);
> +        return -1;
> +    }
> +
> +    if ((shared_to_private && memory_attribute_is_range_discarded(mgr, offset, size)) ||
> +        (!shared_to_private && memory_attribute_is_range_populated(mgr, offset, size))) {
> +        return 0;
> +    }
> +
> +    if (shared_to_private) {
> +        memory_attribute_notify_discard(mgr, offset, size);
> +    } else {
> +        ret = memory_attribute_notify_populate(mgr, offset, size);
> +    }
> +
> +    if (!ret) {
> +        unsigned long first_bit = offset / block_size;
> +        unsigned long nbits = size / block_size;
> +
> +        g_assert((first_bit + nbits) <= mgr->bitmap_size);
> +
> +        if (shared_to_private) {
> +            bitmap_clear(mgr->shared_bitmap, first_bit, nbits);
> +        } else {
> +            bitmap_set(mgr->shared_bitmap, first_bit, nbits);
> +        }
> +
> +        return 0;
> +    }
> +
> +    return ret;
> +}
>
> Then, in memory_attribute_notify_populate(), we walk the bitmap again.
>
> Why?
>
> We just checked that it's all in the expected state, no?
>
>
> virtio-mem doesn't handle it that way, so I'm curious why we would have
> to do it here?
I was concerned about the case where the guest issues a request for a
range that is only partially in the desired state.
I think the main problem is the policy for handling the guest conversion
request. My current handling is:
1. When a conversion request is made for a range already in the desired
state, the helper simply returns success.
2. For a request covering a range only partially in the desired state,
only the segments not yet in the desired state are converted, so that
the entire range ends up complying with the request (sketched below).
3. If the conversion is declined by another component, e.g. VFIO fails
in notify_populate(), the helper rolls back the request to keep the
state consistent.
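To make item 2 concrete, here is a rough sketch of what I have in mind
for the populate (private-to-shared) direction; populate_one() and
rollback_populated() are placeholders for invoking the populate/discard
notifiers on a sub-range, not actual helpers from this series:

static int convert_to_shared_partial(MemoryAttributeManager *mgr,
                                     uint64_t offset, uint64_t size)
{
    int block_size = memory_attribute_manager_get_block_size(mgr);
    unsigned long first = offset / block_size;
    unsigned long last = first + size / block_size;
    unsigned long cur, end;
    int ret;

    for (cur = first; cur < last; cur = end) {
        /* Only touch runs of blocks that are still private. */
        cur = find_next_zero_bit(mgr->shared_bitmap, last, cur);
        if (cur >= last) {
            break;
        }
        end = find_next_bit(mgr->shared_bitmap, last, cur);

        /* Placeholder for calling the populate notifiers on this run. */
        ret = populate_one(mgr, cur * block_size, (end - cur) * block_size);
        if (ret) {
            /*
             * The bitmap hasn't been flipped yet, so walking [first, cur)
             * with find_next_zero_bit() again finds exactly the runs this
             * call populated; rollback_populated() would discard them.
             */
            rollback_populated(mgr, first, cur);
            return ret;
        }
    }

    /* Flip the bitmap only after every notifier has succeeded. */
    bitmap_set(mgr->shared_bitmap, first, last - first);
    return 0;
}

That way a failure in the middle only needs to discard what this
particular request populated, and an already-shared range naturally
degenerates into case 1.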
virtio-mem's policy, by contrast, is to refuse the state change if not
all blocks are in the opposite state.
Actually, this part is still uncertain to me.
BTW, regarding the status/bitmap tracking, virtio-mem also updates its
bitmap after the plug/unplug notifiers. That ordering is the same here,
correct?
>
>
>>
>> For now add a flag to the IOMMU notifier to tell memory_get_xlat_addr()
>> that we're aware of the RAM discard manager state.
>>
>> Signed-off-by: Jean-Philippe Brucker <jean-phili...@linaro.org>
>> ---
>>
>> Definitely not the prettiest hack, any idea how to do this cleanly?
>> ---
>> include/exec/memory.h | 5 +++++
>> system/memory.c | 3 ++-
>> 2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 9f73b59867..6fcd98fe58 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -116,6 +116,11 @@ typedef enum {
>>      IOMMU_RO = 1,
>>      IOMMU_WO = 2,
>>      IOMMU_RW = 3,
>> +    /*
>> +     * Allow mapping a discarded page, because we're in the process of
>> +     * populating it.
>> +     */
>> +    IOMMU_POPULATING = 4,
>>  } IOMMUAccessFlags;
>>  #define IOMMU_ACCESS_FLAG(r, w) (((r) ? IOMMU_RO : 0) | ((w) ? IOMMU_WO : 0))
>> diff --git a/system/memory.c b/system/memory.c
>> index 4c829793a0..8e884f9c15 100644
>> --- a/system/memory.c
>> +++ b/system/memory.c
>> @@ -2221,7 +2221,8 @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
>>           * Disallow that. vmstate priorities make sure any RamDiscardManager
>>           * were already restored before IOMMUs are restored.
>>           */
>> -        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
>> +        if (!(iotlb->perm & IOMMU_POPULATING) &&
>> +            !ram_discard_manager_is_populated(rdm, &tmp)) {
>>              error_setg(errp, "iommu map to discarded memory (e.g., unplugged"
>>                         " via virtio-mem): %" HWADDR_PRIx "",
>>                         iotlb->translated_addr);
>
>
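Regarding the IOMMU_POPULATING flag itself: if I read the intent right,
the notifier that maps pages from within the populate() path would tag
the entries it builds, so that memory_get_xlat_addr() skips the
is_populated() check. Roughly (caller-side sketch only, not part of the
patch; iova, translated_addr and page_size are purely illustrative):

    IOMMUTLBEntry entry = {
        .iova            = iova,
        .translated_addr = translated_addr,
        .addr_mask       = page_size - 1,
        .perm            = IOMMU_ACCESS_FLAG(true, true) | IOMMU_POPULATING,
    };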