On Fri, May 09, 2025 at 07:29:28AM +0000, Shameerali Kolothum Thodi wrote:
> 
> 
> > -----Original Message-----
> > From: Donald Dutile <ddut...@redhat.com>
> > Sent: Thursday, May 8, 2025 2:45 PM
> > To: Shameerali Kolothum Thodi
> > <shameerali.kolothum.th...@huawei.com>; Markus Armbruster
> > <arm...@redhat.com>
> > Cc: Shameer Kolothum via <qemu-devel@nongnu.org>; qemu-
> > a...@nongnu.org; eric.au...@redhat.com; peter.mayd...@linaro.org;
> > j...@nvidia.com; nicol...@nvidia.com; berra...@redhat.com;
> > nath...@nvidia.com; mo...@nvidia.com; smost...@google.com; Linuxarm
> > <linux...@huawei.com>; Wangzhou (B) <wangzh...@hisilicon.com>;
> > jiangkunkun <jiangkun...@huawei.com>; Jonathan Cameron
> > <jonathan.came...@huawei.com>; zhangfei....@linaro.org
> > Subject: Re: [PATCH v2 1/6] hw/arm/smmuv3: Add support to associate a
> > PCIe RC
> > 
> > 
> > 
> > On 5/7/25 4:50 AM, Shameerali Kolothum Thodi wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Markus Armbruster <arm...@redhat.com>
> > >> Sent: Wednesday, May 7, 2025 8:17 AM
> > >> To: Donald Dutile <ddut...@redhat.com>
> > >> Cc: Shameer Kolothum via <qemu-devel@nongnu.org>; qemu-
> > >> a...@nongnu.org; Shameerali Kolothum Thodi
> > >> <shameerali.kolothum.th...@huawei.com>; eric.au...@redhat.com;
> > >> peter.mayd...@linaro.org; j...@nvidia.com; nicol...@nvidia.com;
> > >> berra...@redhat.com; nath...@nvidia.com; mo...@nvidia.com;
> > >> smost...@google.com; Linuxarm <linux...@huawei.com>; Wangzhou
> > (B)
> > >> <wangzh...@hisilicon.com>; jiangkunkun <jiangkun...@huawei.com>;
> > >> Jonathan Cameron <jonathan.came...@huawei.com>;
> > >> zhangfei....@linaro.org
> > >> Subject: Re: [PATCH v2 1/6] hw/arm/smmuv3: Add support to associate a
> > >> PCIe RC
> > >>
> > >> Donald Dutile <ddut...@redhat.com> writes:
> > >>
> > >> [...]
> > >>
> > >>> In this series, an iommu/smmu needs to be placed -BETWEEN- a sysbus
> > >> and a PCIe-tree,
> > >>> or step-wise, plug an smmuv3 into a sysbus, and a pcie tree/domain/RC
> > >> into an SMMUv3.
> > >>
> > >> RC = root complex?
> > >
> > > Yes.
> > >
> > +1.
> > 
> > >>
> > >>> So, an smmu needs to be associated with a bus (tree), i.e., pcie.0,
> > pcie.1...
> > >>> One could model it as a PCIe device, attached at the pcie-RC ... but
> > that's
> > >> not how it's modelled in ARM hw.
> > >>
> > >> Physical ARM hardware?
> > >>
> > yes, physical hw.
> > 
> > >> Assuming the virtual devices and buses we're discussing model physical
> > >> devices and buses:
> > >>
> > >> * What are the physical devices of interest?
> > >>
> > >> * How are they wired together?  Which of the wires are buses, in
> > >>    particular PCI buses?
> > >
> > > SMMUv3 is a platform device and its placement in a system is typically as
> > below
> > > for PCI devices,
> > >
> > > +------------------+
> > > |   PCIe Devices   |
> > > +------------------+
> > >           |
> > >           v
> > >    +-------------+      +---------------+
> > >    |  PCIe RC A  |<---->| Interconnect  |
> > >    +-------------+      +---------------+
> > >                                 |
> > >                                 |
> > >                          +------v---+
> > >                          | SMMUv3.A |
> > >                          | (IOMMU)  |
> > >                          +----+-----+
> > >                               |
> > >                               v
> > >                       +-------+--------+
> > >                       |   System RAM   |
> > >                       +----------------+
> > >
> > > This patch is attempting to establish that association between the PCIe
> > RC and
> > > the SMMUv3 device so that Qemu can build the ACPI tables/DT iommu
> > mappings
> > > for the SMMUv3 device.
> > >
> > I would refer to the ARM SMMU spec, Figure 2.3 in the G.a version, where
> > it's slightly different; more like:
> 
> That's right. The spec does indeed cover all possible scenarios, whereas my 
> earlier
> comments were focused more specifically on the common case of systems using
> SMMUv3 with PCIe devices.
> 
> Currently, QEMU doesn't support non-PCI devices with SMMUv3, neither the
> more complex distributed SMMU cases you have described below. And this series
> doesn't aim to add those supports either. If needed, we can treat those as a 
> separate
> efforts—similar to what was attempted in [1]. That said, agree that the design
> choices we make now should not hinder adding such support in the future.
> 
> And as far as I can see, nothing in this series would prevent that and if 
> anything,
> the new device type SMMUv3 model actually makes it easier to support those.
> 
> >   +------------------+
> >   |   PCIe Devices   | (one device, unless a PCIe switch is btwn the RC &
> > 'Devices';
> >   +------------------+   or, see more typical expansion below)
> >             |
> >      +-------------+
> >      |  PCIe RC A  |
> >      +-------------+
> >             |
> >      +------v---+    +-----------------------------------+
> >      | SMMUv3.A |    | Wide assortment of other platform |
> >      | (IOMMU)  |    |   devices not using SMMU          |
> >      +----+-----+    +-----------------------------------+
> >           |                      |   |   |
> >    +------+----------------------+---+---+-+
> >    |         System Interconnect           |
> >    +---------------------------------------+
> >                                 |
> >    +-------+--------+     +-----+-------------+
> >    |   System RAM   |<--->| CPU (NUMA socket) |
> >    +----------------+     +-------------------+
> > 
> > In fact, the PCIe can be quite complex with PCIe bridges, and multiple Root
> > Ports (RP's),
> > and multiple SMMU's:
> > 
> >      +--------------+   +--------------+   +--------------+
> >      | PCIe Device  |   | PCIe Device  |   | PCIe Device  |
> >      +--------------+   +--------------+   +--------------+
> >            |                  |                  |        <--- PCIe bus
> >       +----------+       +----------+      +----------+
> >       | PCIe RP  |       | PCIe RP  |      | PCIe RP  |  <- may be PCI 
> > Bridge, may
> > not
> >       +----------+       +----------+      +----------+
> >           |                  |                  |
> >       +----------+       +----------+       +----------+
> >       |  SMMU    |       |  SMMU    |       |  SMMU    |
> >       +----------+       +----------+       +----------+
> >           |                  |                  |   <- may be a bus, may 
> > not(hidden from OS)
> >           +------------------+------------------+
> >                              |
> >              +--------------------------+
> >              |          PCI RC A        |
> >              +--------------------------+
> > 
> > where PCIe RP's could be represented (even virtually) in -hw-
> > as a PCIe bridge, each downstream being a different PCIe bus under
> > a single PCIe RC (A, in above pic) -domain-.
> > ... or the RPs don't have to have a PCIe bridge, and look like
> > 'just an RP' that provides a PCIe (pt-to-pt, serial) bus, provided
> > by a PCIe RC. ... the PCIe architecture allows both, and I've seen
> > both implementations in hw (at least from an lspci perspective).
> > 
> > You can see the above hw implementation by doing an lspci on most
> > PCIe systems (definitely common on x86), where the RP's are represented
> > by 'PCIe bridge' elements -- and lots of them.
> > In real hw, these RP's effectively become (multiple) up & downstream
> > transaction queues
> > (which implement PCI ordering, and deadlock avoidance).
> > SMMUs are effectively 'inserted' in the (upstream) queue path(s).
> > 
> > The important take away above: the SMMU can be RP &/or device-specific -
> > - they
> > do not have to be bound to an entire PCIe domain ... the *fourth* part of
> > an lspci output for a PCIe device: Domain:Bus:Device.Function.
> > This is the case for INTEL & AMD IOMMUs ... and why the ACPI tables have
> > to describe which devices (busses often) are associated with which
> > SMMU(in IORT)
> > or IOMMU(DMAR/IVRS's for INTEL/AMD IOMMU).
> > 
> > The final take away: the (QEMU) SMMU/IOMMU must be associated with a
> > PCIe bus
> > OR, the format has to be something like:
> >    -device smmuv3, id=smmuv3.1
> >    -device <blah>, smmu=smmuv3.1
> 
> Agree. For PCie devices with SMMUv3 we need to associate it with a PCIe bus
> and for non-pci cases probably needs a per device association.
> 
> > where the device <-> SMMU (or if extended to x86, iommu) associativity is
> > set w/o bus associativity.
> > It'd be far easier to tag an entire bus with an SMMU/IOMMU, than a per-
> > device format, esp. if
> > one has lots of PCIe devices in their model; actually, even if they only 
> > have
> > one bus and 8 devices
> > (common), it'd be nice if a single iommu/smmu<->bus-num associativity
> > could be set.
> > 
> > Oh, one final note: it is possible, although I haven't seen it done this way
> > yet,
> > that an SMMU could be -in- a PCIe switch (further distributing SMMU
> > functionality
> > across a large PCIe subsystem) and it -could- be a PCIe device in the 
> > switch,
> > btwn the upstream and downstream bridges -- actually doing the SMMU
> > xlations
> > at that layer..... for QEMU & IORT, it's associated with a PCIe bus.
> > But, if done correctly, that shouldn't matter -- in the example you gave wrt
> > serial,
> > it would be a new device, using common smmu core: smmuv3-pcie.
> > [Note: AMD actually identifies it's IOMMU as a PCIe device in an RC ... but
> > still uses
> >         the ACPI tables to configure it to the OS.. so the PCIe-device is 
> > basically
> > a
> >         device w/o a PCIe driver. AMD just went through hoops dealing with
> > MS
> >         and AMD-IOMMU-identification via PCIe.]
> > 
> > So, stepping back, and looking at a broad(er) SMMU -or- IOMMU QEMU
> > perspective,
> > I would think this type of format would be best:
> > 
> > - bus pcie, id=pcie.<num>
> > - device iommu=[intel_iommu|smmuv3|amd_iommu], bus=[sysbus |
> > pcie.<num>], id=iommu.<num>
> > [Yes, I'm sticking with 'iommu' as the generic naming... everyone thinks of
> > device SMMUs as IOMMUs,
> >   and QEMU should have a more arch-agnostic naming of these system
> > functions. ]
> 
> Ok. But to circle back to what originally started this discussion—how 
> important
> is it to rely on the default "bus" in this case? As Markus pointed out, SMMUv3
> is a platform device on the sysbus, so its default bus type can’t point to 
> something
> like PCIe. QEMU doesn’t currently support that.
> 
> The main motivation for using the default "bus" so far has been to have better
> compatibility with libvirt. Would libvirt be flexible enough if we switched 
> to using
> something like a "primary-bus" property instead?

Sorry if my previous comments misled you, when I previously talked about
linking via a "bus" property I was not considering the fact that "bus"
is a special property inside QEMU. From a libvirt POV we don't care what
the property is call - it was just intended to be a general illustration
of cross-referencing the iommu with the PCI bus it needed to be associated
with.

> -device arm-smmuv3,primary-bus=pcie.0
> -device virtio-net-pci,bus=pcie.0
> -device pxb-pcie,id=pcie.1,bus_nr=2
> -device arm-smmuv3,primary-bus=pcie.1
> ...


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Reply via email to