TL;DR of all my rambling below - I think just adding an "iommufd='yes'"
attribute (as a tristate like Jano suggested) is fine for now, and I don't
think we should do anything with the name attribute. More details below
if you're really in for a read :-)
On 11/6/25 7:29 PM, Nathan Chen via Devel wrote:
On 11/6/2025 10:49 AM, Ján Tomko wrote:
Implement a new iommufd attribute under hostdevs' PCI
subsystem driver that can be used to specify associated
iommufd object when launching a qemu VM.
This does not specify which iommufd object it is, just to use the
default one.
It's perfect for now, we might need a different element if using
anything else than iommufd0 starts making sense.
Yeah, I think earlier versions of the patches explicitly gave the
iommufd object name used by each device (e.g. literally "iommufd0"), and
we deemed that "too much information", recommending to instead just say
"use it" or "don't use it". Later we can add an iommufdIndex or
something that would default to 0, and could contain other values
if multiple iommufd objects were needed (so, e.g., if two devices
had "iommufd='yes' iommufdIndex='1'" then they would both be set up to
use the same (non-default) iommufd, maybe "iommufd1").
So for right now while we're just supporting a single iommufd object per
domain, the current proposed XML should be fine.
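To make the "index defaults to 0" idea concrete, here is a minimal sketch of how an index could map to an object name. Everything in it is an assumption for illustration: iommufdIndex does not exist yet, and demo_iommufd_object_id() is a made-up helper, not anything in libvirt.

```c
#include <stdio.h>

/* Hypothetical helper (not libvirt code): build the name of the iommufd
 * object for a given index, e.g. 0 -> "iommufd0", 1 -> "iommufd1".
 * Returns 0 on success, -1 if the buffer is too small or formatting fails. */
int demo_iommufd_object_id(unsigned int index, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "iommufd%u", index);

    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return 0;
}
```

With a scheme like this, two devices carrying the same index would resolve to the same object name, and devices with no index at all would land on "iommufd0".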
Also, I think it should be fine not to expose the object in the XML since
it has configurable attributes now:
I think you mean "*no* configurable attributes"?
# qemu-system-x86_64 -object iommufd,?
iommufd options:
fd=<string>
Noted, will re-visit if anything else other than iommufd0 makes sense.
Signed-off-by: Nathan Chen <[email protected]>
---
docs/formatdomain.rst | 8 ++++++++
src/conf/device_conf.c | 9 +++++++++
src/conf/device_conf.h | 1 +
src/conf/schemas/basictypes.rng | 5 +++++
src/qemu/qemu_command.c | 19 +++++++++++++++++++
5 files changed, 42 insertions(+)
diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index 34dc9c3af7..a5c69dbcf4 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst
@@ -4845,6 +4845,7 @@ or:
device; if PCI ROM loading is disabled through this attribute, attempts to
tweak the loading process further using the ``bar`` or ``file`` attributes
will be rejected. :since:`Since 4.3.0 (QEMU and KVM only)`.
+
``address``
The ``address`` element for USB devices has a ``bus`` and ``device``
attribute to specify the USB bus and device number the device appears at on
@@ -4885,6 +4886,13 @@ or:
found is "problematic" in some way, the generic vfio-pci driver
similarly be forced.
+ The ``<driver>`` element's ``iommufd`` attribute is used to specify
+ using the iommufd interface to propagate DMA mappings to the kernel,
+ instead of legacy VFIO. When the attribute is present, an iommufd
+ object will be created by the resulting qemu command. Libvirt will
+ open the /dev/iommu and VFIO device cdev, passing the associated
+ file descriptor numbers to the qemu command.
+
Should we resurrect the old attribute and use:
<driver name="iommufd"/>
The idea being that later in time, when it will no longer make sense
to use "legacy" VFIO, we will retire it again.
My understanding is that this is still classified as "VFIO device
assignment", but just using an iommufd for communication, so it's not
"let's do this instead of VFIO", but "let's do VFIO *this* way instead
of the other way".
Meanwhile, you'd asked earlier about memories of the switch from "legacy
KVM" device assignment to VFIO. One thing that's really important to
know about that change (when thinking about it as a model to follow for
this current change) is that during those days the presence of any PCI
device assigned from the host to a guest would render the domain
unmigratable, and so when thinking about the transition of the default
from one to the other we didn't need to consider the possibility of
migrating a running guest from legacy KVM to VFIO - in order to switch
from one to the other you had to shut down and then restart the domain.
We also kept around the "<driver name='kvm'/>" option nearly a decade longer
than was likely necessary - we only added the ability to manually
select at all because someone "closer to customers/users" had insisted
we needed a way to switch back to the "old way" if there was a bug in
VFIO. But this was never needed - from the very beginning VFIO worked
better than legacy KVM, and there were no "missing" features that would
require someone to use legacy KVM assignment.
I recall removing at least part of the supporting code for legacy KVM
assignment several years ago (pretty sure someone else removed the final
vestiges) and at the time thinking to myself "all this work that made
the code and the configuration more complicated, and made maintenance
more complicated and time consuming, only to *never* use it, and then
finally remove it 10 years later. How depressing :-/".
Also, referring to it as "legacy" is both premature (since iommufd does not
have the feature parity yet) and confusing in the passage of time.
I think it would be better to leave it as-is for now, since there are
variant VFIO drivers besides vfio-pci that could be assigned to the
driver name attribute in tandem with enabling iommufd.
Actually a vfio variant driver (other than the variant driver that is
automatically discovered as "most appropriate" for the device) is
configured with <driver model='blah'/>, not name='blah'.
(Note: :since:`Since 1.0.5`, the ``name`` attribute has been
described to be used to select the type of PCI device assignment
("vfio", "kvm", or "xen"), but those values have been mostly
diff --git a/src/conf/device_conf.c b/src/conf/device_conf.c
index c278b81652..88979ecc39 100644
--- a/src/conf/device_conf.c
+++ b/src/conf/device_conf.c
@@ -60,6 +60,8 @@ int
virDeviceHostdevPCIDriverInfoParseXML(xmlNodePtr node,
virDeviceHostdevPCIDriverInfo *driver)
{
+ virTristateBool iommufd;
+ driver->iommufd = false;
if (virXMLPropEnum(node, "name",
virDeviceHostdevPCIDriverNameTypeFromString,
VIR_XML_PROP_NONZERO,
@@ -67,6 +69,10 @@ virDeviceHostdevPCIDriverInfoParseXML(xmlNodePtr node,
return -1;
}
+ if (virXMLPropTristateBool(node, "iommufd", VIR_XML_PROP_NONE, &iommufd) < 0)
+ return -1;
+ virTristateBoolToBool(iommufd, &driver->iommufd);
Storing this as 'bool' is losing information. We need to be able to tell
whether iommufd was not used because the user did not specify it or
whether it was not used because the user explicitly said no for future
compatibility reasons.
+1
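To illustrate the information loss, here is a small standalone sketch. The DemoTristate type and both helpers are hypothetical stand-ins written for this example, not libvirt's actual definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a tristate value (illustrative names only). */
typedef enum {
    DEMO_TRISTATE_ABSENT = 0, /* attribute not present in the XML */
    DEMO_TRISTATE_YES,        /* iommufd='yes' */
    DEMO_TRISTATE_NO          /* iommufd='no' */
} DemoTristate;

/* Collapsing to bool: ABSENT and NO both become false, so the two
 * cases can no longer be told apart afterwards. */
bool demo_tristate_to_bool(DemoTristate t)
{
    return t == DEMO_TRISTATE_YES;
}

/* Formatting from the tristate can reproduce exactly what the user
 * wrote, including an explicit 'no'. */
const char *demo_format_attr(DemoTristate t)
{
    switch (t) {
    case DEMO_TRISTATE_YES:
        return " iommufd='yes'";
    case DEMO_TRISTATE_NO:
        return " iommufd='no'";
    default:
        return "";
    }
}
```

Once the value has gone through demo_tristate_to_bool(), "not specified" and "explicitly no" look identical, so a later change of the default could not respect an explicit 'no'. Keeping the tristate in the struct avoids that.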
That makes sense, I will update it to use virTristateBool instead in the
next revision.
-Nathan