> From: Raj, Ashok <ashok....@intel.com>
> Sent: Tuesday, November 10, 2020 10:13 PM
>
> Thomas,
>
> With all these interrupt message storms ;-), I'm missing how to move towards
> an end goal.
>
> On Tue, Nov 10, 2020 at 11:27:29AM +0100, Thomas Gleixner wrote:
> > Ashok,
> >
> > On Mon, Nov 09 2020 at 21:14, Ashok Raj wrote:
> > > On Mon, Nov 09, 2020 at 11:42:29PM +0100, Thomas Gleixner wrote:
> > >> On Mon, Nov 09 2020 at 13:30, Jason Gunthorpe wrote:
> > > Approach to IMS is more of a phased approach.
> > >
> > > #1 Allow physical device to scale beyond limits of PCIe MSIx
> > >    Follows current methodology for guest interrupt programming and
> > >    evolutionary changes rather than drastic.
> >
> > Trapping MSI[X] writes is there because it allows to hand a device to an
> > unmodified guest OS and to handle the case where the MSI[X] entries
> > storage cannot be mapped exclusively to the guest.
> >
> > But aside of this, it's not required if the storage can be mapped
> > exclusively, the guest is hypervisor aware and can get a host composed
> > message via a hypercall. That works for physical functions and SRIOV,
> > but not for SIOV.
>
> It would greatly help if you can put down what you see is blocking
> to move forward in the following areas.
>

Agree. We really need some guidance on how to move forward. I think
everyone in this thread is now aligned that this is not an Intel- or
IDXD-specific thing, e.g. we need an architectural solution, enabling
IMS on PF/VF is important, etc. What we are not sure about is whether
we need to complete all requirements in one batch, or whether we could
evolve step by step as long as the growth path is clearly defined.

IMHO finding a way to disable IMS in the guest is more important than
supporting IMS on PF/VF, since the latter requires a hypercall, which
is not available in all scenarios. Even if Linux includes hypercall
support for all existing archs and hypervisors, it could still run as
an unmodified guest on a new hypervisor before that hypervisor gets its
enlightenments into Linux. So it is more urgent to find a way to force
the use of MSI/MSI-X inside the guest, as that keeps such PFs/VFs
functional, though without the full scalability benefits of IMS.

If such a two-step plan can be agreed on, then the next open question
is how to disable IMS in the guest. We need a sane solution when
checking in the initial host-only IMS support. There are several
options discussed in this thread:

1. Industry standard (e.g. a vendor-agnostic ACPI flag) followed by all
   platforms, hypervisors and OSes. It will require collaboration
   beyond the Linux community;

2. IOMMU-vendor specific standards (DMAR, IORT, etc.) to report whether
   IMS is allowed, implying that IMS is tied to the IOMMU. This
   tradeoff is acceptable since IMS alone cannot make SIOV work, which
   relies on the IOMMU anyway. This might also be an easier path
   forward, and it does not even require waiting for all vendors to
   extend their tables together. On a physical platform the FW always
   reports IMS as 'allowed' and there is time to change it. On a
   virtual platform the hypervisor can choose to hide IMS in three
   ways:

   a) do not expose the IOMMU
   b) expose the IOMMU, but using the old format
   c) expose the IOMMU, using the new format with IMS reported as
      'disallowed'

   a/b work well with a legacy software stack.

   However, there is one potential issue with options 1/2. The virtual
   ACPI table is constructed at VM creation time, likely based on
   whether a PV interrupt controller is exposed to this guest. However,
   in most cases the hypervisor doesn't know, when the VM is being
   created, which guest OS will run and whether it will use the PV
   controller. If IMS is marked as 'allowed' in the virtual DMAR table,
   an unmodified guest might just go ahead and enable it as if it were
   on the native platform. Maybe what we really need is a flag to tell
   the guest that although IMS is available, it cannot be used with
   traditional interrupt controllers?

3. Use IOMMU 'caching mode' as the hint of running as a guest, and
   disable IMS by default as long as 'caching mode' is detected. iirc
   all IOMMU vendors provide such a capability for constructing shadow
   IOMMU page tables. Later, when hypercall support is detected for a
   specific hypervisor/arch, that path can override the IOMMU hint to
   enable IMS. Unlike the first two options, this will be a
   Linux-specific policy, but self-contained. Other guest OSes may not
   follow this approach though.

4. Use CPUID to detect running as a guest. But as Thomas pointed out,
   this approach is less reliable, as not all hypervisors do it this
   way.

Thoughts?

Thanks
Kevin
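
P.S. To make option 3 concrete, here is a rough sketch of the policy
only, not a proposed patch. The function and parameter names are
hypothetical and do not exist in the kernel; how 'caching mode' and
hypercall support would actually be detected is left out and assumed
to be supplied by arch/IOMMU/hypervisor code.

```c
#include <stdbool.h>

/*
 * Hypothetical policy helper for option 3 (all names are illustrative):
 *
 *   caching_mode      - the IOMMU reports 'caching mode', taken as a
 *                       hint that we are running as a guest
 *   has_ims_hypercall - a hypervisor/arch specific path for obtaining
 *                       a host-composed message has been detected
 *
 * Returns true if device drivers may use IMS, false if they should
 * fall back to MSI/MSI-X.
 */
static bool ims_allowed(bool caching_mode, bool has_ims_hypercall)
{
	/* No guest hint: treat as native platform, IMS is fine. */
	if (!caching_mode)
		return true;

	/*
	 * Guest hint present: IMS is off by default, unless hypercall
	 * support overrides the IOMMU hint.
	 */
	return has_ims_hypercall;
}
```

With this shape an unmodified guest on a new hypervisor (caching mode
set, no hypercall) ends up on MSI/MSI-X, which is the fallback argued
for above.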