On 10.01.25 14:20, Jason Gunthorpe wrote:

Thanks for your reply, I knew CCing you would be very helpful :)

On Fri, Jan 10, 2025 at 09:26:02AM +0100, David Hildenbrand wrote:
One limitation (also discussed in the guest_memfd
meeting) is that VFIO expects the DMA mapping for
a specific IOVA to be mapped and unmapped with the
same granularity.

Not just same granularity, whatever you map you have to unmap in
whole. map/unmap must be perfectly paired by userspace.

Right, that's what virtio-mem ends up doing: it maps each memory block (e.g., 2 MiB) separately, so that each block can later be unmapped separately.

It adds "overhead", but at least you don't run into "no, you cannot split this region because you would be out of memory/slots" or, as in the past, issues with concurrent ongoing DMA.
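
To make the trade-off concrete, here is a toy model in plain Python. It is only a sketch and not the VFIO uAPI: ToyContainer, map_dma, unmap_dma, map_in_blocks and try_partial_unmap are made-up names, and the model is deliberately stricter than the real interface in that it accepts only an unmap that exactly matches one earlier map.

```python
# Toy model of the "unmap exactly what you mapped" rule. Not the VFIO uAPI;
# all names here are hypothetical and exist only for illustration.

MiB = 1024 * 1024


class ToyContainer:
    def __init__(self):
        self.mappings = {}  # iova -> size of the mapping created at that iova

    def map_dma(self, iova, size):
        # Reject any overlap with an existing mapping.
        for start, length in self.mappings.items():
            if iova < start + length and start < iova + size:
                raise ValueError("overlaps an existing mapping")
        self.mappings[iova] = size

    def unmap_dma(self, iova, size):
        # Simplification: only an exact (iova, size) pair that was mapped
        # earlier is accepted, so a request that would split a mapping fails.
        if self.mappings.get(iova) != size:
            raise ValueError("unmap must match a previous map exactly")
        del self.mappings[iova]


def map_in_blocks(container, iova, size, block_size):
    # virtio-mem style: one mapping per block, so each can be removed alone.
    for offset in range(0, size, block_size):
        container.map_dma(iova + offset, block_size)


def try_partial_unmap(container, iova, size):
    # Returns True if the unmap was accepted, False if it was refused.
    try:
        container.unmap_dma(iova, size)
        return True
    except ValueError:
        return False
```

With one 8 MiB mapping, try_partial_unmap(c, 2 * MiB, 2 * MiB) is refused. After map_in_blocks(c, 0, 8 * MiB, 2 * MiB) the same call succeeds, at the cost of four mappings instead of one. At 4 KiB granularity the same region would need 2048 mappings, which is the overhead in question.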


such as converting a small region within a larger
region. To prevent such invalid cases, all
operations are performed with 4K granularity. The
possible solutions we can think of are either to
enable VFIO to support partial unmap

Yes, you can do that, but it is awful for performance everywhere

Absolutely.


In your commit I read:

"Implement the cut operation to be hitless, changes to the page table
during cutting must cause zero disruption to any ongoing DMA. This is the expectation of the VFIO type 1 uAPI. Hitless requires HW support, it is incompatible with HW requiring break-before-make."

So I guess that would mean that, depending on HW support, one could avoid disabling large pages to still allow for atomic cuts / partial unmaps that don't affect concurrent DMA.


What would be your suggestion here to avoid the "map each 4k page individually so we can unmap it individually" approach? I didn't completely grasp that, sorry.

From "IIRC you can only trigger split using the VFIO type 1 legacy API. We would need to formalize split as an IOMMUFD native ioctl.
Nobody should use this stuff through the legacy type 1 API!!!!"

I assume you mean that we can only avoid the 4k map/unmap if we add proper split support as an IOMMUFD native ioctl, rather than trying to make it fly somehow with the legacy type 1 API?

--
Cheers,

David / dhildenb

