Hi Daniel,

> -----Original Message-----
> From: Daniel P. Berrangé <berra...@redhat.com>
> Sent: Friday, January 31, 2025 9:42 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com>
> Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org;
> eric.au...@redhat.com; peter.mayd...@linaro.org; j...@nvidia.com;
> nicol...@nvidia.com; ddut...@redhat.com; Linuxarm
> <linux...@huawei.com>; Wangzhou (B) <wangzh...@hisilicon.com>;
> jiangkunkun <jiangkun...@huawei.com>; Jonathan Cameron
> <jonathan.came...@huawei.com>; zhangfei....@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
> 
> On Thu, Jan 30, 2025 at 06:09:24PM +0000, Shameerali Kolothum Thodi
> wrote:
> >
> > Each "arm-smmuv3-nested" instance, when the first device gets attached
> > to it, will create a S2 HWPT and a corresponding SMMUv3 domain in kernel
> > SMMUv3 driver. This domain will have a pointer representing the physical
> > SMMUv3 that the device belongs. And any other device which belongs to
> > the same physical SMMUv3 can share this S2 domain.
> 
> Ok, so given two guest SMMUv3s, A and B, and two host SMMUv3s,
> C and D, we could end up with A&C and B&D paired, or we could
> end up with A&D and B&C paired, depending on whether we plug
> the first VFIO device into guest SMMUv3 A or B.
> 
> This is bad.  Behaviour must not vary depending on the order
> in which we create devices.
> 
> A guest SMMUv3 is paired to a guest PXB. A guest PXB is liable
> to be paired to a guest NUMA node. A guest NUMA node is liable
> to be paired to a host NUMA node. The guest/host SMMU pairing
> must be chosen such that it makes conceptual sense wrt the
> guest PXB NUMA to host NUMA pairing.
> 
> If the kernel picks guest<->host SMMU pairings on a first-device
> first-paired basis, this can end up with incorrect guest NUMA
> configurations.

Ok. I am trying to understand how this can happen, as I assume the
guest PXB NUMA node is chosen based on the device we attach to it,
i.e. on the NUMA node that device belongs to on the physical host.

And the physical SMMUv3's NUMA node will be the same as that of the
devices associated with it, won't it?

For example, I have a system here that has 8 physical SMMUv3s, with
NUMA assignments something like below:

Phys SMMUv3.0 --> node 0
  \..dev1 --> node 0
Phys SMMUv3.1 --> node 0
  \..dev2 --> node 0
Phys SMMUv3.2 --> node 0
Phys SMMUv3.3 --> node 0

Phys SMMUv3.4 --> node 1
Phys SMMUv3.5 --> node 1
  \..dev5 --> node 1
Phys SMMUv3.6 --> node 1
Phys SMMUv3.7 --> node 1


If I have to assign, say, dev1, dev2 and dev5 to a guest, we need to
specify 3 "arm-smmuv3-accel" instances, as they belong to different
physical SMMUv3s.

-device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0,numa_id=0 \
-device pxb-pcie,id=pcie.2,bus_nr=2,bus=pcie.0,numa_id=0 \
-device pxb-pcie,id=pcie.3,bus_nr=3,bus=pcie.0,numa_id=1 \
-device arm-smmuv3-accel,id=smmuv1,bus=pcie.1 \
-device arm-smmuv3-accel,id=smmuv2,bus=pcie.2 \
-device arm-smmuv3-accel,id=smmuv3,bus=pcie.3 \
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
-device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
-device pcie-root-port,id=pcie.port3,bus=pcie.3,chassis=3 \
-device vfio-pci,host=0000:dev1,bus=pcie.port1,iommufd=iommufd0 \
-device vfio-pci,host=0000:dev2,bus=pcie.port2,iommufd=iommufd0 \
-device vfio-pci,host=0000:dev5,bus=pcie.port3,iommufd=iommufd0
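
To spell out the rule used above (one arm-smmuv3-accel instance per
physical SMMUv3, with the PXB numa_id taken from the host NUMA node of
the devices behind it), here is a rough Python sketch that derives the
-device options from a device -> (physical SMMUv3, NUMA node) table.
The table contents, the "0000:devN" placeholders and the helper name
build_qemu_args() are all hypothetical; this is only an illustration,
not something QEMU or libvirt does today.

```python
from collections import OrderedDict

# Hypothetical host topology: device -> (physical SMMUv3 index, host NUMA node).
# The "0000:devN" addresses are placeholders, as in the command line above.
HOST_TOPOLOGY = {
    "0000:dev1": (0, 0),
    "0000:dev2": (1, 0),
    "0000:dev5": (5, 1),
}


def build_qemu_args(devices, topology):
    """Emit one pxb-pcie + arm-smmuv3-accel pair per physical SMMUv3.

    Devices behind the same physical SMMUv3 share one guest SMMUv3, and
    the pxb-pcie numa_id is copied from that SMMUv3's host NUMA node.
    """
    # Group the assigned devices by physical SMMUv3, keeping first-seen order.
    groups = OrderedDict()
    for dev in devices:
        phys_smmu, node = topology[dev]
        groups.setdefault(phys_smmu, (node, []))[1].append(dev)

    args = []
    port = 0
    for idx, (node, devs) in enumerate(groups.values(), start=1):
        bus = f"pcie.{idx}"
        # bus_nr values simply follow the example command line above.
        args.append(f"-device pxb-pcie,id={bus},bus_nr={idx},bus=pcie.0,numa_id={node}")
        args.append(f"-device arm-smmuv3-accel,id=smmuv{idx},bus={bus}")
        for dev in devs:
            port += 1
            args.append(f"-device pcie-root-port,id=pcie.port{port},bus={bus},chassis={port}")
            args.append(f"-device vfio-pci,host={dev},bus=pcie.port{port},iommufd=iommufd0")
    return args


if __name__ == "__main__":
    print(" \\\n".join(build_qemu_args(["0000:dev1", "0000:dev2", "0000:dev5"], HOST_TOPOLOGY)))
```

For dev1, dev2 and dev5 this yields three guest SMMUv3s, with dev5 ending
up behind the numa_id=1 PXB; two devices behind the same physical SMMUv3
would share a single arm-smmuv3-accel instance.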

So I guess that, even if we don't specify the physical SMMUv3
association explicitly, the kernel will check it based on the devices
attached to the guest SMMUv3 (and hence the NUMA association), right?
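
A toy Python model of the behaviour described in the cover letter may
make the question more concrete: a guest SMMUv3 is bound to the physical
SMMUv3 of the first device attached to it, and a later device behind a
different physical SMMUv3 is rejected (like the "Invalid argument"
attach failure). The class and function names and the placement table
are hypothetical; this is a sketch under those assumptions, not kernel
or QEMU code.

```python
import itertools


class AttachError(Exception):
    """Stands in for the attach failure QEMU reports (not a real API)."""


def attach_all(order, placement, phys_of):
    """Attach devices in the given order; return guest SMMU -> phys SMMU."""
    binding = {}
    for dev in order:
        guest = placement[dev]
        phys = phys_of[dev]
        # The first device attached to a guest SMMUv3 decides its binding.
        bound = binding.setdefault(guest, phys)
        if bound != phys:
            raise AttachError(f"{dev}: {guest} is already bound to {bound}")
    return binding


# Hypothetical placement matching the example above.
PHYS_OF = {"dev1": "SMMUv3.0", "dev2": "SMMUv3.1", "dev5": "SMMUv3.5"}
PLACEMENT = {"dev1": "smmuv1", "dev2": "smmuv2", "dev5": "smmuv3"}

# Try every attach order and collect the distinct resulting pairings.
results = {
    tuple(sorted(attach_all(order, PLACEMENT, PHYS_OF).items()))
    for order in itertools.permutations(PLACEMENT)
}
print(len(results))  # 1: the pairing does not depend on attach order
```

In this model, as long as each guest SMMUv3 only receives devices behind
one physical SMMUv3, every attach order gives the same pairing.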

In other words, how does an explicit association help us here?

Or is it that the guest PXB numa_id allocation is not always based
on the device's numa_id?

(Maybe I am missing something here. Sorry.)

Thanks,
Shameer 

> The mgmt apps need to be able to tell QEMU exactly which
> host SMMU to pair with each guest SMMU, and QEMU needs to
> then tell the kernel.
> 
> > And as I mentioned in cover letter, Qemu will report,
> >
> > "
> > Attempt to add the HNS VF to a different SMMUv3 will result in,
> >
> > -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: Unable to attach viommu
> > -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: vfio 0000:7d:02.2:
> >    Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2 (38) to id=11: Invalid argument
> >
> > At present Qemu is not doing any extra validation other than the above
> > failure to make sure the user configuration is correct or not. The
> > assumption is libvirt will take care of this.
> > "
> > So in summary, if the libvirt gets it wrong, Qemu will fail with error.
> 
> That's good error checking, and required, but also insufficient
> as illustrated above IMHO.
> 
> > If a more explicit association is required, some help from kernel is required
> > to identify the physical SMMUv3 associated with the device.
> 
> Yep, I think SMMUv3 info for devices needs to be exposed to userspace,
> as well as a mechanism for QEMU to tell the kernel the SMMU mapping.
> 
> 
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 
