Re: [PATCH 1/4] qemu: Implement support for associating iommufd to hostdev

Laine Stump Fri, 21 Nov 2025 07:31:58 -0800

TL;DR of all my rambling below - I think just adding an "iommufd='yes'" attribute (as a tristate like Jano suggested) is fine for now, and I don't think we should do anything with the name attribute. More details below if you're really in for a read :-)


On 11/6/25 7:29 PM, Nathan Chen via Devel wrote:



On 11/6/2025 10:49 AM, Ján Tomko wrote:

Implement a new iommufd attribute under hostdevs' PCI
subsystem driver that can be used to specify associated
iommufd object when launching a qemu VM.


This does not specify which iommufd object it is, just to use the
default one.

It's perfect for now, we might need a different element if using
anything else than iommufd0 starts making sense.

Yeah, I think earlier versions of the patches explicitly gave the iommufd object name used by each device (e.g. literally "iommufd0"), and we deemed that "too much information", recommending to instead just say "use it" or "don't use it", and then later we can add an iommufdIndex or something that would default to 0, and then could contain other values if multiple iommufd objects were needed (and so, e.g., if two devices had "iommufd='yes' iommufdIndex='1'" then they would both be set up to use the same (non-default) iommufd (maybe "iommufd1")).

So for right now while we're just supporting a single iommufd object per domain, the current proposed XML should be fine.
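
To make the index idea a bit more concrete, here is a rough sketch of how an index could map to a QEMU object id. Nothing like this exists in the patches: the "iommufdIndex" attribute is only the idea floated above, and the helper name iommufd_object_id() is made up for illustration. The only things taken from this thread are the "iommufd0" name and the guess that a second object might be "iommufd1".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not libvirt code): build the QEMU object id
 * for a given iommufd index. Index 0 yields "iommufd0", the single
 * object discussed in this thread.
 * Returns 0 on success, -1 if the buffer is too small or on error. */
static int iommufd_object_id(unsigned int index, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "iommufd%u", index);

    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return 0;
}

/* Two devices that carry the same (hypothetical) index resolve to the
 * same object id, so they would share one iommufd object. */
static int same_iommufd_object(unsigned int index_a, unsigned int index_b)
{
    char a[32];
    char b[32];

    if (iommufd_object_id(index_a, a, sizeof(a)) < 0 ||
        iommufd_object_id(index_b, b, sizeof(b)) < 0)
        return 0;
    return strcmp(a, b) == 0;
}
```

With a default of 0, a device that never mentions an index would land on "iommufd0", which is why leaving the index out of the XML today shouldn't paint us into a corner.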


Also, I think it should fine not to expose the object in the XML since
it has configurable attributes now:


I think you mean "*no* configurable attributes"?


# qemu-system-x86_64 -object iommufd,?
iommufd options:
   fd=<string>


Noted, will re-visit if anything else other than iommufd0 makes sense.


Signed-off-by: Nathan Chen <[email protected]>
---
docs/formatdomain.rst           |  8 ++++++++
src/conf/device_conf.c          |  9 +++++++++
src/conf/device_conf.h          |  1 +
src/conf/schemas/basictypes.rng |  5 +++++
src/qemu/qemu_command.c         | 19 +++++++++++++++++++
5 files changed, 42 insertions(+)

diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index 34dc9c3af7..a5c69dbcf4 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst
@@ -4845,6 +4845,7 @@ or:

   device; if PCI ROM loading is disabled through this attribute, attempts to
   tweak the loading process further using the ``bar`` or ``file`` attributes
   will be rejected. :since:`Since 4.3.0 (QEMU and KVM only)`.
+
``address``
   The ``address`` element for USB devices has a ``bus`` and ``device``
   attribute to specify the USB bus and device number the device appears at on

@@ -4885,6 +4886,13 @@ or:
   found is "problematic" in some way, the generic vfio-pci driver
   similarly be forced.

+   The ``<driver>`` element's ``iommufd`` attribute is used to specify
+   using the iommufd interface to propagate DMA mappings to the kernel,
+   instead of legacy VFIO. When the attribute is present, an iommufd
+   object will be created by the resulting qemu command. Libvirt will
+   open the /dev/iommu and VFIO device cdev, passing the associated
+   file descriptor numbers to the qemu command.
+


Should we resurrect the old attribute and use:
   <driver name="iommufd"/>

The idea being that later in time, when it will no longer make sense
to use "legacy" VFIO, we will retire it again.

My understanding is that this is still classified as "VFIO device assignment", but just using an iommufd for communication, so it's not "let's do this instead of VFIO", but "let's do VFIO *this* way instead of the other way".

Meanwhile, you'd asked earlier about memories of the switch from "legacy KVM" device assignment to VFIO. One thing that's really important to know about that change (when thinking about it as a model to follow for this current change) is that during those days the presence of any PCI device assigned from the host to a guest would render the domain unmigratable, and so when thinking about the transition of the default from one to the other we didn't need to consider the possibility of migrating a running guest from legacy KVM to VFIO - in order to switch from one to the other you had to shut down and then restart the domain.

We also kept around the "<driver name='kvm'/>" nearly a decade longer than was likely necessary - we only added the ability to manually select at all because someone "closer to customers/users" had insisted we needed a way to switch back to the "old way" if there was a bug in VFIO. But this was never needed - from the very beginning VFIO worked better than legacy KVM, and there were no "missing" features that would require someone to use legacy KVM assignment.

I recall removing at least part of the supporting code for legacy KVM assignment several years ago (pretty sure someone else removed the final vestiges) and at the time thinking to myself "all this work that made the code and the configuration more complicated, and made maintenance more complicated and time consuming, only to *never* use it, and then finally remove it 10 years later. How depressing :-/".

Also, referring to it as "legacy" is both premature (since iommufd does not
have the feature parity yet) and confusing in the passage of time.
I think it would be better to leave it as-is for now, since there are variant VFIO drivers besides vfio-pci that could be assigned to the driver name attribute in tandem with enabling iommufd.

Actually a vfio variant driver (other than the variant driver that is automatically discovered as "most appropriate" for the device) is configured with <driver model='blah'/>, not name='blah'.

   (Note: :since:`Since 1.0.5`, the ``name`` attribute has been
   described to be used to select the type of PCI device assignment
   ("vfio", "kvm", or "xen"), but those values have been mostly
diff --git a/src/conf/device_conf.c b/src/conf/device_conf.c
index c278b81652..88979ecc39 100644
--- a/src/conf/device_conf.c
+++ b/src/conf/device_conf.c
@@ -60,6 +60,8 @@ int
virDeviceHostdevPCIDriverInfoParseXML(xmlNodePtr node,
                                      virDeviceHostdevPCIDriverInfo *driver)

{
+    virTristateBool iommufd;
+    driver->iommufd = false;
    if (virXMLPropEnum(node, "name",
                       virDeviceHostdevPCIDriverNameTypeFromString,
                       VIR_XML_PROP_NONZERO,

@@ -67,6 +69,10 @@ virDeviceHostdevPCIDriverInfoParseXML(xmlNodePtr node,

        return -1;
    }

+    if (virXMLPropTristateBool(node, "iommufd", VIR_XML_PROP_NONE, &iommufd) < 0)
+        return -1;
+    virTristateBoolToBool(iommufd, &driver->iommufd);


Storing this as 'bool' is losing information. We need to be able to tell
whether iommufd was not used because the user did not specify it or
whether it was not used because the user explicitly said no for future
compatibility reasons.

+1
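
A small standalone sketch of why the bool loses information. The Tristate enum below is only a stand-in for libvirt's virTristateBool (the names and values here are made up, not the real ones), and the "default flips to on some day" scenario is an assumption used purely to show where the distinction starts to matter.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for a tristate type; not libvirt's definition. */
typedef enum {
    TRISTATE_ABSENT = 0, /* attribute not present in the XML */
    TRISTATE_YES,        /* user wrote iommufd='yes' */
    TRISTATE_NO,         /* user wrote iommufd='no' */
} Tristate;

/* Collapse to bool the way the hunk above does: "absent" and "no"
 * both end up as false, so they can no longer be told apart. */
static bool tristate_to_bool(Tristate t)
{
    return t == TRISTATE_YES;
}

/* Keep the tristate and resolve it late. default_on models a
 * hypothetical future where iommufd becomes the default. */
static bool resolve_iommufd(Tristate requested, bool default_on)
{
    if (requested == TRISTATE_ABSENT)
        return default_on;
    return requested == TRISTATE_YES;
}

/* Self-check: with the default flipped to "on", an explicit "no"
 * must still win, while "absent" follows the new default. */
static void check_tristate_example(void)
{
    assert(tristate_to_bool(TRISTATE_ABSENT) == tristate_to_bool(TRISTATE_NO));
    assert(resolve_iommufd(TRISTATE_ABSENT, true) == true);
    assert(resolve_iommufd(TRISTATE_NO, true) == false);
}
```

Once the value has been squashed to a bool, there is no way to write resolve_iommufd() at all, since the explicit "no" is already gone by then.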

That makes sense, I will update it to use virTristateBool instead in the next revision.
-Nathan
