On Wed, Apr 27, 2022 at 10:05:54AM +0000, Andrew Cooper wrote:
> On 27/04/2022 07:59, Jan Beulich wrote:
> > On 26.04.2022 19:51, Andrew Cooper wrote:
> >> Hello,
> >>
> >> Edvin has found a machine with some very weird properties.  It is an HP
> >> ProLiant BL460c Gen8 with:
> >>
> >>  \-[0000:00]-+-00.0  Intel Corporation Xeon E5/Core i7 DMI2
> >>              +-01.0-[11]--
> >>              +-01.1-[02]--
> >>              +-02.0-[04]--+-00.0  Emulex Corporation OneConnect 10Gb NIC (be3)
> >>              |            +-00.1  Emulex Corporation OneConnect 10Gb NIC (be3)
> >>              |            +-00.2  Emulex Corporation OneConnect 10Gb iSCSI Initiator (be3)
> >>              |            \-00.3  Emulex Corporation OneConnect 10Gb iSCSI Initiator (be3)
> >>
> >> yet all 4 other functions on the device periodically hit IOMMU faults
> >> (~once every 5 mins, so definitely stats).
> >>
> >> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.4] fault addr bdf80000
> >> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.5] fault addr bdf80000
> >> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.6] fault addr bdf80000
> >> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.7] fault addr bdf80000
> >>
> >> There are several RMRRs covering these devices, with:
> >>
> >> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> >> (XEN) [VT-D] endpoint: 0000:03:00.0
> >> (XEN) [VT-D] endpoint: 0000:01:00.0
> >> (XEN) [VT-D] endpoint: 0000:01:00.2
> >> (XEN) [VT-D] endpoint: 0000:04:00.0
> >> (XEN) [VT-D] endpoint: 0000:04:00.1
> >> (XEN) [VT-D] endpoint: 0000:04:00.2
> >> (XEN) [VT-D] endpoint: 0000:04:00.3
> >> (XEN) [VT-D]dmar.c:608:   RMRR region: base_addr bdf8f000 end_addr bdf92fff
> >>
> >> being the one relevant to these faults.  I've not manually decoded the
> >> DMAR table because device paths are horrible to follow but there are at
> >> least the correct number of endpoints.  The functions all have SR-IOV
> >> (disabled) and ARI (enabled).  None have any Phantom functions described.
> >>
> >> Specifying pci-phantom=04:00,1 does appear to work around the faults,
> >> but it's not right, because functions 1 thru 3 aren't actually phantom.
> > Indeed, and I think you really mean "pci-phantom=04:00,4".
> 
> As a quick tangent, the cmdline docs for pci-phantom= are in desperate
> need of an example and a description of how stride works.  I've got some
> ideas and notes jotted down.
> 
> Do we really mean ,4 here?  What happens for function 1?
> 
> > I guess we
> > should actually refuse "pci-phantom=04:00,1" in a case like this one.
> > The problem is that at the point we set pdev->phantom_stride we may
> > not know of the other devices, yet. But I guess we could attempt a
> > config space read of the supposed phantom function's device/vendor
> > and do <whatever> if these aren't both 0xffff.
> 
> At a minimum, we ought to warn when it looks like something is wonky,
> but I wouldn't go as far as rejecting.
> 
> All of these options to work around firmware/system screwups are applied
> to an already-non-working system, and there is absolutely no guarantee
> that necessary fixes make any kind of logical sense.

AFAICT with stride = 1 Xen will treat functions 1-7 as phantom
functions dependent on function 0, which means the pdev struct won't
get updated when those phantom functions are assigned to a domain as
part of assigning function 0.  That would imply that functions 1 to 3
will be considered phantom but would also have a matching pdev that
allows them to be independently assigned to a domain.  Nothing good
will come out of it.
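
To make the stride arithmetic concrete, here is a minimal standalone
sketch of how I understand the phantom functions to be derived from a
base function and a stride.  phantom_mask() is a hypothetical helper
written for this mail, not an existing Xen function, and it assumes
phantom functions are found by repeatedly adding the stride while
staying within the same slot (functions 0-7):

```c
#include <stdint.h>

/*
 * Hypothetical helper (not Xen code): return a bitmask of the function
 * numbers within one slot that would be treated as phantom functions of
 * base function 'func' for the given stride.  Bit N set means function N.
 */
static unsigned int phantom_mask(unsigned int func, unsigned int stride)
{
    unsigned int mask = 0;
    unsigned int f;

    /* A stride of 0 means "no phantom functions". */
    if ( stride == 0 )
        return 0;

    /* Step by 'stride' until we would leave the slot (functions 0-7). */
    for ( f = func + stride; f < 8; f += stride )
        mask |= 1u << f;

    return mask;
}
```

Under that assumption phantom_mask(0, 1) is 0xfe, i.e. functions 1-7,
which overlaps the real functions 1-3.  phantom_mask(0, 4) is 0x10, i.e.
only function 4, and if the same stride were applied to function 1 it
would pair it with function 5.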

I agree with Jan that we need to explicitly reject strides that cover
functions that would otherwise be considered devices (ie: have valid
config space entries).  Or alternatively we need to remove the pdevs
for those functions now considered phantom.
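
As a rough illustration of the first option, the check could look
something like the sketch below.  Everything here is hypothetical:
stride_is_sane(), the read_id_fn callback and the fake config space are
made up for this mail and do not correspond to existing Xen interfaces.
The fake table models the machine above (functions 0-3 respond,
functions 4-7 read back as all ones) with a placeholder ID value:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback: read the 32-bit vendor/device ID of a function. */
typedef uint32_t (*read_id_fn)(unsigned int func);

/*
 * Hypothetical check (not Xen code): a stride is only acceptable if none
 * of the functions it would mark as phantom respond in config space,
 * i.e. vendor and device ID both read back as 0xffff.
 */
static bool stride_is_sane(unsigned int func, unsigned int stride,
                           read_id_fn read_id)
{
    unsigned int f;

    if ( stride == 0 )
        return true;

    for ( f = func + stride; f < 8; f += stride )
        if ( read_id(f) != 0xffffffffu )
            return false; /* A real function lives here. */

    return true;
}

/* Fake config space: functions 0-3 present (placeholder ID), 4-7 absent. */
static uint32_t fake_read_id(unsigned int func)
{
    return func < 4 ? 0x12345678u : 0xffffffffu;
}
```

With this fake device, stride_is_sane(0, 1, fake_read_id) is false
because functions 1-3 respond, while stride_is_sane(0, 4, fake_read_id)
is true.  Whether a failed check should reject the option or only warn
is the open question from above.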

Thanks, Roger.
