On 2025-03-04 05:23, Roger Pau Monné wrote:
On Fri, Feb 28, 2025 at 03:25:52PM -0500, Jason Andryuk wrote:
On 2025-02-28 04:36, Roger Pau Monné wrote:
On Thu, Feb 27, 2025 at 01:28:11PM -0500, Jason Andryuk wrote:
On 2025-02-27 05:23, Roger Pau Monné wrote:
On Wed, Feb 26, 2025 at 04:11:25PM -0500, Jason Andryuk wrote:
To work around this, we can, for per-device IRTs, program the hardware
to use the guest data & associated IRTE.  The address doesn't matter
since the IRTE handles that, and the Xen address & vector can be used as
expected.
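A rough sketch of the idea (illustrative only: the helper names are made up,
not Xen's actual interfaces, and it assumes AMD-Vi indexes the per-device IRT
by the low bits of the MSI data):

/*
 * Illustrative only: with the DTE set to remap interrupts, the low bits of
 * the MSI data select the IRTE in the device's table.  So the device can
 * keep the guest's data (which it also hands to its firmware out-of-band),
 * while Xen's vector/destination live in the IRTE.
 */
static void perdev_intremap_program(struct pci_dev *pdev,
                                    uint32_t guest_data, uint8_t host_vector)
{
    /* Guest data bits [10:0] index the per-device IRT. */
    unsigned int idx = guest_data & 0x7ff;

    /* Hypothetical helpers: install Xen's vector in that IRTE slot... */
    perdev_irte_update(pdev, idx, host_vector);

    /* ...and leave the guest's value in the MSI data register, since the
     * IRTE, not the address/data pair, determines delivery. */
    msi_data_write(pdev, guest_data);
}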

All this works on AMD because when interrupt remapping is enabled all
MSIs are handled by the remapping table, while on Intel there's still
a bit in the MSI address field to signal whether the MSI is using a
remapping entry, or is using the "compatibility" format (iow: no
remapping).
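For reference, my understanding of the VT-d encoding (not taken from this
patch; address bit 4 selects remappable vs. compatibility format):

#define MSI_ADDR_BASE          0xfee00000U
#define MSI_ADDR_IF_REMAP      (1U << 4)   /* interrupt format: 1 = remappable */
#define MSI_ADDR_SHV           (1U << 3)   /* subhandle valid */

/* Compose a remappable-format address for a given IRTE handle. */
static inline uint32_t vtd_remap_msi_addr(unsigned int handle)
{
    return MSI_ADDR_BASE |
           ((handle & 0x7fffU) << 5) |        /* handle[14:0] */
           (((handle >> 15) & 0x1U) << 2) |   /* handle[15] */
           MSI_ADDR_IF_REMAP | MSI_ADDR_SHV;
}

/* A guest-controlled address with bit 4 clear falls back to the
 * compatibility (non-remapped) format. */
static inline bool vtd_msi_addr_is_remappable(uint32_t addr)
{
    return addr & MSI_ADDR_IF_REMAP;
}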

So, on Intel, if the guest hands the device the MSI address, it can decide
to bypass remapping?

Thanks for providing insight into the Intel inner workings.  That's why I am
asking.

Yes, sorry, I'm afraid I don't have any good solution for Intel, at
least not anything similar to what you propose to do on AMD-Vi.  I
guess we could take a partial solution for AMD-Vi only, but it's
sub-optimal from Xen's perspective to have a piece of hardware working
fine on AMD and not on Intel.

I only need AMD to work ;)

But yeah, I thought I should make an effort to get both working.

Kind of tangential to this approach.  Do you know which register(s)
are used to store the non-architectural MSI address and data fields?

I'm wondering if it simply would be easier to introduce a quirk for
this device in vPCI (and possibly QEMU?) that intercepts writes to the
out of band MSI registers.  That should work for both Intel and AMD,
but would have the side effect that Xen would need to intercept
accesses to at least a full page, and possibly forward accesses to
adjacent registers.

From the QEMU part of the vfio hack:
* We therefore come up with a really crude quirk that looks for values
* written to the ATH11K_PCI_WINDOW (defined in Linux driver as starting
* at 0x80000 with an 18-bit mask, ie. 256k) that match the guest MSI
* address.  When found we replace the data with the host physical
* address and set a cookie to match the MSI data write, again replacing
* with the host value and clearing the cookie.

https://lore.kernel.org/ath11k/20240812170045.1584000-1-alex.william...@redhat.com/

This is inside BAR0, AIUI. I'm guessing, but I think the driver puts them into a command ring, so it's not a fixed location. The size of the area to trap, plus the fact that we don't normally intercept BAR accesses, made me not want to pursue this.
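If someone did pursue it, I'd expect the Xen side to mirror the QEMU quirk,
something like the sketch below (the handler and context types are made up,
not the real vPCI/MMIO interfaces):

/*
 * Illustrative only: match values written anywhere in the trapped window
 * against the guest's MSI address/data and substitute the host values,
 * mirroring the QEMU quirk quoted above.
 */
struct ath11k_quirk {
    void __iomem *bar0;
    uint64_t guest_msi_addr, host_msi_addr;
    uint32_t guest_msi_data, host_msi_data;
    bool data_pending;      /* "cookie": next matching write is the data */
};

static void ath11k_window_write(struct ath11k_quirk *q, unsigned long offset,
                                uint32_t val)
{
    if (val == (uint32_t)q->guest_msi_addr) {
        val = (uint32_t)q->host_msi_addr;
        q->data_pending = true;
    } else if (q->data_pending && val == q->guest_msi_data) {
        val = q->host_msi_data;
        q->data_pending = false;
    }

    /* Forward the (possibly rewritten) write to the real BAR. */
    writel(val, q->bar0 + offset);
}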

e.g. Replace amd_iommu_perdev_intremap with something generic.

The ath11k device supports and tries to enable 32 MSIs.  Linux in PVH
dom0 and HVM domU fails to enable 32 and falls back to just 1, so that is
all that has been tested.

DYK why it fails to enable 32?

Not exactly - someone else had the card.  msi_capability_init() failed.  If
it ends up in arch_setup_msi_irqs(), only 1 MSI is supported.  But precisely
where the multiple-nvecs request was denied was not tracked down.

Does it also fail on native?  I'm mostly asking because it would be
good to get to the bottom of this, so that we don't come up with a
partial solution that will break if multi-msi is used later in Linux.

My understanding is that native and PV dom0 work with 32, and it's Linux
deciding not to use multiple MSIs.

It might be this:
static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
{
         int irq, pirq;
         struct msi_desc *msidesc;
         struct msi_msg msg;

         if (type == PCI_CAP_ID_MSI && nvec > 1)
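                 /* AIUI, a positive return here makes the MSI core retry
                  * with a single vector, i.e. multi-MSI is effectively
                  * denied if this path is taken. */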
                 return 1;

I'll have to look into this more.

That shouldn't apply to PVH because it never exposes
XENFEAT_hvm_pirqs, and I would expect xen_hvm_setup_msi_irqs() to not
get used (otherwise we have a bug somewhere).
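For reference, the gating in Linux's arch/x86/pci/xen.c looks roughly like
this (paraphrased from memory; the exact code varies by kernel version):

int __init pci_xen_hvm_init(void)
{
        /* The pirq-based HVM MSI ops are only wired up when the platform
         * advertises XENFEAT_hvm_pirqs, which PVH never does. */
        if (!xen_have_vector_callback || !xen_feature(XENFEAT_hvm_pirqs))
                return 0;

#ifdef CONFIG_PCI_MSI
        /* Installs xen_hvm_setup_msi_irqs() as the MSI setup hook. */
        xen_setup_pci_msi();
#endif
        return 0;
}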

Okay, yeah, this doesn't seem to get called. I asked internally, and no one has tracked down precisely why multi-MSI is denied. I still need to get around to that.

Thanks,
Jason
