On 5/7/25 4:50 AM, Shameerali Kolothum Thodi wrote:
-----Original Message-----
From: Markus Armbruster <arm...@redhat.com>
Sent: Wednesday, May 7, 2025 8:17 AM
To: Donald Dutile <ddut...@redhat.com>
Cc: Shameer Kolothum via <qemu-devel@nongnu.org>; qemu-
a...@nongnu.org; Shameerali Kolothum Thodi
<shameerali.kolothum.th...@huawei.com>; eric.au...@redhat.com;
peter.mayd...@linaro.org; j...@nvidia.com; nicol...@nvidia.com;
berra...@redhat.com; nath...@nvidia.com; mo...@nvidia.com;
smost...@google.com; Linuxarm <linux...@huawei.com>; Wangzhou (B)
<wangzh...@hisilicon.com>; jiangkunkun <jiangkun...@huawei.com>;
Jonathan Cameron <jonathan.came...@huawei.com>;
zhangfei....@linaro.org
Subject: Re: [PATCH v2 1/6] hw/arm/smmuv3: Add support to associate a
PCIe RC
Donald Dutile <ddut...@redhat.com> writes:
[...]
In this series, an iommu/smmu needs to be placed -BETWEEN- a sysbus
and a PCIe-tree,
or step-wise, plug an smmuv3 into a sysbus, and a pcie tree/domain/RC
into an SMMUv3.
RC = root complex?
Yes.
+1.
So, an smmu needs to be associated with a bus (tree), i.e., pcie.0, pcie.1...
One could model it as a PCIe device, attached at the pcie-RC ... but that's
not how it's modelled in ARM hw.
Physical ARM hardware?
yes, physical hw.
Assuming the virtual devices and buses we're discussing model physical
devices and buses:
* What are the physical devices of interest?
* How are they wired together? Which of the wires are buses, in
particular PCI buses?
The SMMUv3 is a platform device; for PCI devices, its placement in a system is
typically as below:
+------------------+
|   PCIe Devices   |
+------------------+
         |
         v
+-------------+     +---------------+
|  PCIe RC A  |<--->| Interconnect  |
+-------------+     +---------------+
       |
       v
  +----------+
  | SMMUv3.A |
  |  (IOMMU) |
  +----------+
       |
       v
+----------------+
|   System RAM   |
+----------------+
This patch attempts to establish that association between the PCIe RC and
the SMMUv3 device, so that QEMU can build the ACPI tables/DT iommu mappings
for the SMMUv3 device.
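(On the DT side, this association is what the standard `iommu-map` property
on the PCI host-bridge node expresses -- a sketch only, where the node names,
addresses and phandle are illustrative:)

```dts
/* RIDs 0x0000-0xffff on this host bridge map 1:1 onto SMMUv3.A stream IDs.
 * Format: iommu-map = <rid-base iommu-phandle iommu-base length>;
 */
pcie@40000000 {
        /* ... */
        iommu-map = <0x0 &smmu_a 0x0 0x10000>;
};

smmu_a: iommu@50000000 {
        compatible = "arm,smmu-v3";
        #iommu-cells = <1>;
        /* ... */
};
```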
I would refer to the ARM SMMU spec, Figure 2.3 in the G.a version, where
it's slightly different; more like:
+------------------+
|   PCIe Devices   |   (one device, unless a PCIe switch is btwn the RC &
+------------------+    'Devices'; or, see more typical expansion below)
         |
   +-------------+
   |  PCIe RC A  |
   +-------------+
         |
   +----------+     +-----------------------------------+
   | SMMUv3.A |     | Wide assortment of other platform |
   |  (IOMMU) |     | devices not using SMMU            |
   +----------+     +-----------------------------------+
        |              |        |         |
+-------+--------------+--------+---------+--+
|             System Interconnect            |
+--------------------------------------------+
        |
+----------------+     +-------------------+
|   System RAM   |<--->| CPU (NUMA socket) |
+----------------+     +-------------------+
In fact, the PCIe topology can be quite complex, with PCIe bridges, multiple
Root Ports (RPs), and multiple SMMUs:
+--------------+   +--------------+   +--------------+
| PCIe Device  |   | PCIe Device  |   | PCIe Device  |
+--------------+   +--------------+   +--------------+
       |                  |                  |          <- PCIe bus
  +----------+       +----------+       +----------+
  | PCIe RP  |       | PCIe RP  |       | PCIe RP  |    <- may be a PCI
  +----------+       +----------+       +----------+       bridge, may not
       |                  |                  |
  +----------+       +----------+       +----------+
  |   SMMU   |       |   SMMU   |       |   SMMU   |
  +----------+       +----------+       +----------+
       |                  |                  |          <- may be a bus, may
       +------------------+------------------+             not (hidden from OS)
                          |
               +--------------------------+
               |         PCIe RC A        |
               +--------------------------+
where the PCIe RPs could be represented (even virtually) in -hw-
as a PCIe bridge, each downstream port being a different PCIe bus under
a single PCIe RC (A, in the above pic) -domain-.
... or the RPs don't have to have a PCIe bridge, and can look like
'just an RP' that provides a PCIe (pt-to-pt, serial) bus, provided
by a PCIe RC. The PCIe architecture allows both, and I've seen
both implementations in hw (at least from an lspci perspective).
You can see the above hw implementation by doing an lspci on most
PCIe systems (definitely common on x86), where the RPs are represented
by 'PCIe bridge' elements -- and lots of them.
In real hw, these RPs effectively become (multiple) upstream & downstream
transaction queues (which implement PCI ordering and deadlock avoidance).
SMMUs are effectively 'inserted' in the (upstream) queue path(s).
The important takeaway: the SMMU can be RP- &/or device-specific -- it does
not have to be bound to an entire PCIe domain, i.e. the domain part of an
lspci address for a PCIe device: Domain:Bus:Device.Function.
This is the case for the Intel & AMD IOMMUs, and it is why the ACPI tables
have to describe which devices (often whole busses) are associated with which
SMMU (in the IORT) or IOMMU (in DMAR/IVRS for the Intel/AMD IOMMUs).
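(The IORT association above boils down to ID mappings: each root-complex node
carries ranges that translate PCI requester IDs (RIDs) into SMMU stream IDs.
A minimal sketch of that translation logic -- illustrative only, not QEMU code;
the tuple layout follows the IORT ID-mapping fields of input base, ID count and
output base, but the helper and the example values are hypothetical:)

```python
# Sketch of IORT-style ID mapping: (input_base, id_count, output_base)
# translates a PCI requester ID (RID) into an SMMU stream ID.

def map_rid_to_streamid(rid, id_mappings):
    """Return the stream ID for a RID, or None if no mapping covers it."""
    for input_base, id_count, output_base in id_mappings:
        if input_base <= rid < input_base + id_count:
            return output_base + (rid - input_base)
    return None  # RID not behind any SMMU (excluded from translation)

# Hypothetical example: RIDs on bus 0 (0x0000-0x00ff) go to SMMUv3.A,
# bus 1 (0x0100-0x01ff) goes to SMMUv3.B with a different stream-ID base.
mappings = [
    (0x0000, 0x100, 0x0000),  # pcie.0 -> SMMUv3.A
    (0x0100, 0x100, 0x8000),  # pcie.1 -> SMMUv3.B
]

print(hex(map_rid_to_streamid(0x0010, mappings)))  # -> 0x10
print(hex(map_rid_to_streamid(0x0110, mappings)))  # -> 0x8010
```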
The final takeaway: the (QEMU) SMMU/IOMMU must be associated with a PCIe bus,
OR the format has to be something like:
    -device smmuv3,id=smmuv3.1
    -device <blah>,smmu=smmuv3.1
where the device <-> SMMU (or, if extended to x86, IOMMU) associativity is set
w/o bus associativity.
It'd be far easier to tag an entire bus with an SMMU/IOMMU than to use a
per-device format, esp. if one has lots of PCIe devices in their model;
actually, even with only one bus and 8 devices (common), it'd be nice if a
single iommu/smmu <-> bus-num associativity could be set.
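(Concretely, such a per-bus association might look like the following on the
command line -- a sketch only: the bus= property on arm-smmuv3 is exactly
what is being proposed in this thread, not an existing option, and the ids
are illustrative:)

```
# Hypothetical: one SMMUv3 tagged per PCIe root bus, set once per bus
qemu-system-aarch64 -M virt \
    -device pxb-pcie,id=pcie.1,bus_nr=2 \
    -device arm-smmuv3,id=smmuv3.0,bus=pcie.0 \
    -device arm-smmuv3,id=smmuv3.1,bus=pcie.1 \
    -device virtio-net-pci,bus=pcie.1
```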
Oh, one final note: it is possible, although I haven't seen it done this way
yet, that an SMMU could be -in- a PCIe switch (further distributing SMMU
functionality across a large PCIe subsystem), and it -could- be a PCIe device
in the switch, btwn the upstream and downstream bridges -- actually doing the
SMMU xlations at that layer... For QEMU & IORT, it's associated with a PCIe bus.
But, if done correctly, that shouldn't matter -- in the example you gave wrt
serial, it would be a new device, using the common smmu core: smmuv3-pcie.
[Note: AMD actually identifies its IOMMU as a PCIe device in an RC ... but
still uses the ACPI tables to configure it to the OS... so the PCIe device is
basically a device w/o a PCIe driver. AMD just went through hoops dealing
with MS and AMD-IOMMU identification via PCIe.]
So, stepping back, and looking at a broad(er) SMMU -or- IOMMU QEMU perspective,
I would think this type of format would be best:
    -bus pcie,id=pcie.<num>
    -device iommu=[intel_iommu|smmuv3|amd_iommu],bus=[sysbus|pcie.<num>],id=iommu.<num>
[Yes, I'm sticking with 'iommu' as the generic naming... everyone thinks of
device SMMUs as IOMMUs, and QEMU should have a more arch-agnostic naming of
these system functions.]
The bus that devices are attached to in the system would then define the
IOMMU/SMMU devices that manage/translate them (for simpler IORT/DMAR/IVRS
generation).
An iommu=none option could be applied to any device on any bus (pcie or
sysbus) to logically exclude it from an IOMMU (effectively creating a virtual
RP not managed by an IOMMU, and a simple IORT/DMAR/IVRS exclusion).
If/when intel_iommu (& an eventual amd_iommu) get multi-instance support, the
above format would work for them as well.
... and I would expect someone from libvirt-land to chime in with an even
better format that makes it more common/generic, but allows for more robust
per-arch or per-IOMMU/SMMU-arch variants/parametrization.
If any of the above seems murky, please ask for clarification(s).
Hopefully I haven't mis-typed any of the above, causing conflict or confusion,
as the concepts above are shared to show the array of hw architectures,
yet, try to dissolve them into common IOMMU config formats for QEMU
(for multi-instance-iommu and multi-bus).
- Don
SMMU's are discovered via ACPI tables.
That leaves us back to the 'how to associate an SMMUv3 to a PCIe
tree(RC)',
and that leads me to the other discussion & format I saw btwn Eric &
Shameer:
-device arm-smmuv3,id=smmuv3.3
-device xxxx,smmuv3=smmuv3.3
where one tags a (PCIe) device to an smmuv3(id), which is needed to build
the (proper) IORT for (pcie-)device <-> SMMUv3 associativity in a
multi-SMMUv3 configuration.
We could keep the bus=pcie.X option for the -device arm-smmuv3 to
indicate that all PCIe devices connected to the pcie.0 tree go through that
smmuv3; qdev would model/config it as the smmuv3 being 'attached to pcie.0'...
which it sorta is... and I think the IORT build could then associate all
devices on pcie.0 with the proper smmuv3.
Device property "bus" is strictly for specifying into which bus the device is
to be plugged. The device's type must match the bus: only a PCI device can
plug into a PCI bus, and so forth.
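(For instance, in its conventional use the property names the bus a device
plugs into -- a sketch with illustrative ids:)

```
-device pcie-root-port,id=rp1,chassis=1,bus=pcie.0 \
-device virtio-blk-pci,drive=disk0,bus=rp1
```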
The whole idea of reusing the "bus" property for the SMMUv3 device was to make
it easier for libvirt. As I mentioned earlier, we could go back and use a
different property name like "primary-bus" or "pci-bus" for the SMMUv3 device
here.
Thanks,
Shameer