On 1/1/26 17:56, Marek Marczykowski-Górecki wrote:
> Hi,
> 
> I've got yet another report[1] of device failing because (I assume) the
> drivers reads MSI/MSI-X values (thinking it sees values actually set in
> the HW) and then pass them to the device via some alternative means.
> IIUC this is what IMS does.
> 
> I'm interested in two things:
> 1. Some plan for a long term solution - it was briefly discussed on
> XenDevel matrix room in September, Roger said:
> 
>> urg, that's the spec that also defines IMS IIRC?  I think the only way
>> to support anything like that is using vfio/mdev and re-using the
>> drivers from Linux.  There's too much device-specific magic to
>> implement any of this in Xen, or do our own Xen-specific drivers.
> 
> 2. A short term workaround for few specific devices. If you look at the
> linked threads, users resort to patching the domU driver and then
> copying MSI values from dom0's lspci output manually... I think we can
> do better than this short term, via some quirks in QEMU. Either let the
> domU see the real HW values, or translate IMS writes at QEMU level
> (assuming they can be identified). Disclaimer - I haven't looked yet at
> this specific driver, nor the SIOV/IMS spec, so I'm not sure if that's
> viable approach...
> 
> [1] 
> https://forum.qubes-os.org/t/solved-qualcomm-qcnfa765-ath11k-wcn6855-wifi-working-on-thinkpad-p14s-gen4-amd/38192

Disclaimer: I have not read the Intel or AMD IOMMU specs, do not have
access to the PCI spec, and know very little about ath11k.  All of
this is based on various mailing list threads and Matrix messages.
It might be wrong.  Please correct me if it is.

First, some background on IMS.  All of this comes from [2] and its thread.

IMS is a result of wanting to store interrupts in host memory.  This
avoids needing to have them in expensive on-die SRAM or including DRAM
in the card.  However, on-die SRAM is used to cache the interrupts.
This means that interrupts must be managed via command queues.

This causes problems for Linux and for any other OS that expects
to be able to change IRQs without a command/response operation.
The only workarounds I saw in that thread are:

1. Redesign the OS so it never needs to change interrupts from a
   context where device commands are impossible.
2. Modify the command queue code so it can run in interrupt context.
3. Rely on the IOMMU to remap interrupts.

ath10k and friends use an even worse hack, which is to pin everything
to a fixed CPU so that the problems mentioned above (which relate to
moving interrupts between CPUs) don't arise.

Now, the part that is relevant to Xen:

IMS *also* causes problems for hypervisors.  Hypervisors present guests
with a virtualized MSI range rather than exposing the actual one.
My understanding is that virtualization serves two purposes:

4. It turns non-remappable MSIs into remappable ones, so that they
   are translated by the IOMMU instead of being rejected.
5. It fixes some information in the MSI (target CPU?) so that the
   interrupt correctly reaches the guest.
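
To make item 5 concrete, here is a minimal sketch of the x86
compatibility-format MSI message, where the destination APIC ID sits in
address bits 19:12 and the vector in data bits 7:0.  The helper names
and the guest/host values are made up for illustration; this is not Xen
or QEMU code.  It shows why a driver that reads the guest-visible
message and hands it to the device through some other channel ends up
programming an interrupt the host never set up.

```python
# Illustrative only: x86 compatibility-format MSI encoding.
# Helper names and the example values are hypothetical.
MSI_ADDR_BASE = 0xFEE00000  # fixed upper bits of an x86 MSI address


def encode_msi(dest_apic_id, vector):
    """Return (address, data) for a fixed-delivery, edge-triggered MSI."""
    if not 0 <= dest_apic_id <= 0xFF:
        raise ValueError("destination APIC ID must fit in 8 bits")
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    address = MSI_ADDR_BASE | (dest_apic_id << 12)  # bits 19:12
    data = vector                                   # bits 7:0
    return address, data


def decode_msi(address, data):
    """Return (dest_apic_id, vector) from an MSI address/data pair."""
    return (address >> 12) & 0xFF, data & 0xFF


# What the guest wrote into its virtualized MSI capability (assumed).
guest_view = encode_msi(dest_apic_id=0, vector=0x41)
# What the host actually programmed into the hardware (assumed).
host_view = encode_msi(dest_apic_id=6, vector=0xA3)

# A driver that copies guest_view into a device-specific register makes
# the device emit a message that differs from the one the host expects.
message_matches_host = guest_view == host_view
```

Here message_matches_host ends up False, which is the failure mode in
the linked reports: the value the guest can read is not the value the
hardware needs.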

With IMS, virtualizing the guest's interrupts is no longer possible.
That would require virtualizing the command queues, which are
device-specific and in any case (probably) too complex for the
hypervisor to handle.

The only way I know of to make IMS work under Xen is option (3) above:
give the guest access to the real MSI configuration space, and rely on
the IOMMU to translate the guest's interrupts to whatever it needs.
This is possible on AMD but not on Intel.  See [4] and the related
Matrix messages.
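
As a purely conceptual sketch of option (3), assume the IOMMU keeps a
per-device table that maps whatever message the device emits to the
interrupt the host wants delivered.  The class and method names below
are hypothetical, and real IOMMU table formats are not modeled.  The
point is only that the guest can program the device directly, because
translation happens after the device sends the message.

```python
# Toy model only: not the AMD or Intel interrupt remapping table format.
class ToyRemapTable:
    """Maps (device, message data emitted by the device) to a host target."""

    def __init__(self):
        self._entries = {}

    def program(self, device, guest_data, host_dest, host_vector):
        # The hypervisor installs a translation for a value the guest chose.
        self._entries[(device, guest_data)] = (host_dest, host_vector)

    def translate(self, device, data):
        # Returns the host (destination, vector), or None if the message
        # has no entry and would therefore be blocked.
        return self._entries.get((device, data))


table = ToyRemapTable()
# Hypothetical device address and values, chosen for illustration.
table.program("0000:03:00.0", guest_data=0x41, host_dest=6, host_vector=0xA3)

delivered = table.translate("0000:03:00.0", 0x41)
blocked = table.translate("0000:03:00.0", 0x99)
```

In this model the driver never needs to know the host's values, so it
does not matter whether it programs the device through the MSI
capability or through a device-specific mechanism.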

For Intel, the only solution I know of is a driver patch: ath11k and
friends would need to get the real interrupt from Xen and/or QEMU so
they can program the hardware accordingly.  I *think*
ath11k and friends are the only IMS devices consumers are likely
to run into.  I suspect the others are likely enterprise devices
with VFIO/MDEV support.  Supporting all devices could be done via a
paravirtualized interface, perhaps as part of paravirtualized IOMMU
support.  On Intel, the IOMMU must play a role in MSI assignment
anyway, so a PV IOMMU could coordinate with the hypervisor to avoid
this kind of problem.

A lot of this information is taken from the thread in [3].

[2]: https://lore.kernel.org/lkml/[email protected]/
[3]: https://lore.kernel.org/xen-devel/[email protected]/t/#m590f8a0de6fecde893345a6836828dc84eaccd5d
[4]: https://matrix.to/#/!XcEgmbCouiNWHlGdHk:matrix.org/$laXuwPmDLINXAYnwoDsVCUvByPS6-5IjB_1OCAl9zgQ
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
