On 21.02.25 03:25, Chenyi Qiang wrote:
On 2/21/2025 3:39 AM, David Hildenbrand wrote:
On 20.02.25 17:13, Jean-Philippe Brucker wrote:
For Arm CCA we'd like the guest_memfd discard notifier to call the IOMMU
notifiers and create e.g. VFIO mappings. The default VFIO discard
notifier isn't sufficient for CCA because the DMA addresses need a
translation (even without vIOMMU).
At the moment:
* guest_memfd_state_change() calls the populate() notifier
* the populate() notifier calls the IOMMU notifiers
* the IOMMU notifier handler calls memory_get_xlat_addr() to get a VA
* it calls ram_discard_manager_is_populated() which fails.
guest_memfd_state_change() only changes the section's state after
calling the populate() notifier. We can't easily invert the order of
operations because it uses the old state bitmap to know which pages
need the populate() notifier.
I assume we talk about this code: [1]
[1] https://lkml.kernel.org/r/20250217081833.21568-1-chenyi.qi...@intel.com
+static int memory_attribute_state_change(MemoryAttributeManager *mgr,
+                                         uint64_t offset, uint64_t size,
+                                         bool shared_to_private)
+{
+    int block_size = memory_attribute_manager_get_block_size(mgr);
+    int ret = 0;
+
+    if (!memory_attribute_is_valid_range(mgr, offset, size)) {
+        error_report("%s, invalid range: offset 0x%lx, size 0x%lx",
+                     __func__, offset, size);
+        return -1;
+    }
+
+    if ((shared_to_private && memory_attribute_is_range_discarded(mgr, offset, size)) ||
+        (!shared_to_private && memory_attribute_is_range_populated(mgr, offset, size))) {
+        return 0;
+    }
+
+    if (shared_to_private) {
+        memory_attribute_notify_discard(mgr, offset, size);
+    } else {
+        ret = memory_attribute_notify_populate(mgr, offset, size);
+    }
+
+    if (!ret) {
+        unsigned long first_bit = offset / block_size;
+        unsigned long nbits = size / block_size;
+
+        g_assert((first_bit + nbits) <= mgr->bitmap_size);
+
+        if (shared_to_private) {
+            bitmap_clear(mgr->shared_bitmap, first_bit, nbits);
+        } else {
+            bitmap_set(mgr->shared_bitmap, first_bit, nbits);
+        }
+
+        return 0;
+    }
+
+    return ret;
+}
Then, in memory_attribute_notify_populate(), we walk the bitmap again.
Why?
We just checked that it's all in the expected state, no?
virtio-mem doesn't handle it that way, so I'm curious why we would have
to do it here?
I was concerned about the case where the guest issues a request for a
range of which only part is in the desired state.
I think the main problem is the policy for the guest conversion request.
My current handling is:
1. When a conversion request is made for a range already in the desired
state, the helper simply returns success.
Yes.
2. For requests involving a range partially in the desired state, only
the necessary segments are converted, ensuring the entire range
complies with the request efficiently.
Ah, now I get:
+    if ((shared_to_private && memory_attribute_is_range_discarded(mgr, offset, size)) ||
+        (!shared_to_private && memory_attribute_is_range_populated(mgr, offset, size))) {
+        return 0;
+    }
+
We're not failing if the range is already partially in the other state.
3. In scenarios where a conversion request is declined by other systems,
such as a failure from VFIO during notify_populate(), the helper will
roll back the request, maintaining consistency.
And the policy of virtio-mem is to refuse the state change if not all
blocks are in the opposite state.
Yes.
Actually, this part is still uncertain to me.
IIUC, the problem does not exist if we only convert a single page at a time.
Is there a known use case where such partial conversions could happen?
BTW, regarding the status/bitmap tracking, virtio-mem also changes the
bitmap after the plug/unplug notifier. This is the same, correct?
Right. But because we reject these partial requests, we don't have to
traverse the bitmap and could just adjust the bitmap operations.
--
Cheers,
David / dhildenb