On 2/21/2025 3:39 AM, David Hildenbrand wrote:
> On 20.02.25 17:13, Jean-Philippe Brucker wrote:
>> For Arm CCA we'd like the guest_memfd discard notifier to call the IOMMU
>> notifiers and create e.g. VFIO mappings. The default VFIO discard
>> notifier isn't sufficient for CCA because the DMA addresses need a
>> translation (even without vIOMMU).
>>
>> At the moment:
>> * guest_memfd_state_change() calls the populate() notifier
>> * the populate notifier() calls IOMMU notifiers
>> * the IOMMU notifier handler calls memory_get_xlat_addr() to get a VA
>> * it calls ram_discard_manager_is_populated() which fails.
>>
>> guest_memfd_state_change() only changes the section's state after
>> calling the populate() notifier. We can't easily invert the order of
>> operation because it uses the old state bitmap to know which pages need
>> the populate() notifier.
> 
> I assume we talk about this code: [1]
> 
> [1] https://lkml.kernel.org/r/20250217081833.21568-1-chenyi.qi...@intel.com
> 
> 
> +static int memory_attribute_state_change(MemoryAttributeManager *mgr, uint64_t offset,
> +                                         uint64_t size, bool shared_to_private)
> +{
> +    int block_size = memory_attribute_manager_get_block_size(mgr);
> +    int ret = 0;
> +
> +    if (!memory_attribute_is_valid_range(mgr, offset, size)) {
> +        error_report("%s, invalid range: offset 0x%lx, size 0x%lx",
> +                     __func__, offset, size);
> +        return -1;
> +    }
> +
> +    if ((shared_to_private && memory_attribute_is_range_discarded(mgr, offset, size)) ||
> +        (!shared_to_private && memory_attribute_is_range_populated(mgr, offset, size))) {
> +        return 0;
> +    }
> +
> +    if (shared_to_private) {
> +        memory_attribute_notify_discard(mgr, offset, size);
> +    } else {
> +        ret = memory_attribute_notify_populate(mgr, offset, size);
> +    }
> +
> +    if (!ret) {
> +        unsigned long first_bit = offset / block_size;
> +        unsigned long nbits = size / block_size;
> +
> +        g_assert((first_bit + nbits) <= mgr->bitmap_size);
> +
> +        if (shared_to_private) {
> +            bitmap_clear(mgr->shared_bitmap, first_bit, nbits);
> +        } else {
> +            bitmap_set(mgr->shared_bitmap, first_bit, nbits);
> +        }
> +
> +        return 0;
> +    }
> +
> +    return ret;
> +}
> 
> Then, in memory_attribute_notify_populate(), we walk the bitmap again.
> 
> Why?
> 
> We just checked that it's all in the expected state, no?
> 
> 
> virtio-mem doesn't handle it that way, so I'm curious why we would have
> to do it here?

I was concerned about the case where the guest issues a request and only
part of the range is in the desired state.
I think the main problem is the policy for handling the guest's conversion
request. My current handling is (see the sketch after this list):

1. When a conversion request is made for a range already in the desired
  state, the helper simply returns success.
2. For requests where only part of the range is in the desired state, only
  the remaining segments are converted, so the whole range ends up in the
  requested state.
3. If a conversion request is declined partway through, e.g. by a failure
  from VFIO during notify_populate(), the helper rolls back what it has
  already converted, maintaining consistency.
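
To make that concrete, here is a rough standalone sketch of points 1-3. It
is simplified C with made-up helpers (notify_populate_block(),
notify_discard_block(), and a plain bool array standing in for the real
shared_bitmap), not the actual MemoryAttributeManager code:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 4096u
#define NUM_BLOCKS 8u

/* Stand-in for shared_bitmap: true means the block is currently shared. */
static bool shared[NUM_BLOCKS];

/* Stubs for the populate/discard notifiers; a real VFIO listener may
 * refuse populate, which is what exercises the rollback path below. */
static int notify_populate_block(unsigned long block)
{
    printf("populate block %lu\n", block);
    return block == 5 ? -1 : 0;              /* pretend block 5 is refused */
}

static void notify_discard_block(unsigned long block)
{
    printf("discard block %lu\n", block);
}

/* Points 2 and 3: convert only the blocks not yet in the desired state,
 * and roll back this request's conversions if a later block fails. */
static int convert_to_shared(uint64_t offset, uint64_t size)
{
    unsigned long first = offset / BLOCK_SIZE;
    unsigned long nbits = size / BLOCK_SIZE;
    bool was_shared[NUM_BLOCKS];

    memcpy(was_shared, shared, sizeof(shared));

    for (unsigned long i = first; i < first + nbits; i++) {
        if (shared[i]) {
            continue;                         /* point 1: already converted */
        }
        if (notify_populate_block(i) != 0) {
            for (unsigned long j = first; j < i; j++) {
                if (!was_shared[j]) {         /* only what this request did */
                    notify_discard_block(j);
                    shared[j] = false;
                }
            }
            return -1;
        }
        shared[i] = true;
    }
    return 0;
}

int main(void)
{
    shared[2] = true;                         /* range starts partially shared */
    return convert_to_shared(0, NUM_BLOCKS * BLOCK_SIZE) == 0 ? 0 : 1;
}

With block 5 refused, blocks 0, 1, 3 and 4 are discarded again, while
block 2, which was already shared before the request, is left alone.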

virtio-mem's policy, by contrast, is to refuse the state change if not all
blocks in the range are in the opposite state.
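Roughly this shape, if I read it right (again just a sketch of the
described policy with a stand-in bitmap, not the actual virtio-mem code):

#include <stdbool.h>

#define NUM_BLOCKS 8u

static bool plugged[NUM_BLOCKS];   /* stand-in for the per-block state */

/* All-or-nothing: accept the plug request only if every block in the
 * range is currently unplugged, otherwise refuse without converting
 * anything (notifier calls elided in this sketch). */
static int request_plug(unsigned long first, unsigned long nbits)
{
    for (unsigned long i = first; i < first + nbits; i++) {
        if (plugged[i]) {
            return -1;             /* partially plugged: refuse outright */
        }
    }
    for (unsigned long i = first; i < first + nbits; i++) {
        plugged[i] = true;
    }
    return 0;
}

int main(void)
{
    plugged[2] = true;                          /* partially plugged range */
    return request_plug(0, NUM_BLOCKS) ? 1 : 0; /* refused here, returns 1 */
}

So a partially converted range is simply refused rather than fixed up.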

Actually, this part is still uncertain to me.

BTW, regarding the status/bitmap tracking, virtio-mem also updates the
bitmap only after the plug/unplug notifier has run. That is the same
ordering as here, correct?
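
For clarity, the common ordering I am referring to (hypothetical helper
names, not code from either implementation):

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stubs standing in for the real notifier and bitmap code. */
static int notify_state_change(uint64_t offset, uint64_t size, bool populate)
{
    (void)offset;
    (void)size;
    (void)populate;
    return 0;                           /* pretend the listeners accepted */
}

static void update_bitmap(uint64_t offset, uint64_t size, bool populated)
{
    (void)offset;
    (void)size;
    (void)populated;
}

/* The ordering in question: run the notifier first, and flip the tracking
 * bitmap only once it has succeeded, so a refused request leaves the
 * recorded state untouched. */
static int state_change(uint64_t offset, uint64_t size, bool populate)
{
    int ret = notify_state_change(offset, size, populate);

    if (ret) {
        return ret;                     /* bitmap not touched on failure */
    }
    update_bitmap(offset, size, populate);
    return 0;
}

int main(void)
{
    return state_change(0, 4096, true);
}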

> 
> 
>>
>> For now add a flag to the IOMMU notifier to tell memory_get_xlat_addr()
>> that we're aware of the RAM discard manager state.
>>
>> Signed-off-by: Jean-Philippe Brucker <jean-phili...@linaro.org>
>> ---
>>
>> Definitely not the prettiest hack, any idea how to do this cleanly?
>> ---
>>   include/exec/memory.h | 5 +++++
>>   system/memory.c       | 3 ++-
>>   2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 9f73b59867..6fcd98fe58 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -116,6 +116,11 @@ typedef enum {
>>       IOMMU_RO   = 1,
>>       IOMMU_WO   = 2,
>>       IOMMU_RW   = 3,
>> +    /*
>> +     * Allow mapping a discarded page, because we're in the process of
>> +     * populating it.
>> +     */
>> +    IOMMU_POPULATING = 4,
>>   } IOMMUAccessFlags;
>>
>>   #define IOMMU_ACCESS_FLAG(r, w) (((r) ? IOMMU_RO : 0) | ((w) ? IOMMU_WO : 0))
>> diff --git a/system/memory.c b/system/memory.c
>> index 4c829793a0..8e884f9c15 100644
>> --- a/system/memory.c
>> +++ b/system/memory.c
>> @@ -2221,7 +2221,8 @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
>>            * Disallow that. vmstate priorities make sure any RamDiscardManager
>>            * were already restored before IOMMUs are restored.
>>            */
>> -        if (!ram_discard_manager_is_populated(rdm, &tmp)) {
>> +        if (!(iotlb->perm & IOMMU_POPULATING) &&
>> +            !ram_discard_manager_is_populated(rdm, &tmp)) {
>>               error_setg(errp, "iommu map to discarded memory (e.g., unplugged"
>>                            " via virtio-mem): %" HWADDR_PRIx "",
>>                            iotlb->translated_addr);
> 
> 

