On Tue, Mar 11, 2025 at 08:35:35AM +0100, Jan Beulich wrote: > On 10.03.2025 16:42, Roger Pau Monné wrote: > > On Mon, Mar 10, 2025 at 11:51:09AM +0100, Jan Beulich wrote: > >> On 10.03.2025 10:55, Roger Pau Monne wrote: > >>> Attempt to reduce the MSI entry writes, and the associated checking > >>> whether > >>> memory decoding and MSI-X is enabled for the PCI device, when the MSI data > >>> hasn't changed. > >>> > >>> When using Interrupt Remapping the MSI entry will contain an index into > >>> the remapping table, and it's in such remapping table where the MSI vector > >>> and destination CPU is stored. As such, when using interrupt remapping, > >>> changes to the interrupt affinity shouldn't result in changes to the MSI > >>> entry, and the MSI entry update can be avoided. > >>> > >>> Signal from the IOMMU update_ire_from_msi hook whether the MSI data or > >>> address fields have changed, and thus need writing to the device > >>> registers. > >>> Such signaling is done by returning 1 from the function. Otherwise > >>> returning 0 means no update of the MSI fields, and thus no write > >>> required. > >>> > >>> Signed-off-by: Roger Pau Monné <roger....@citrix.com> > >> > >> Reviewed-by: Jan Beulich <jbeul...@suse.com> > >> with two purely cosmetic suggestions and an only loosely related question > >> below. > >> > >>> --- a/xen/arch/x86/hvm/vmx/vmx.c > >>> +++ b/xen/arch/x86/hvm/vmx/vmx.c > >>> @@ -415,7 +415,9 @@ static int cf_check vmx_pi_update_irte(const struct > >>> vcpu *v, > >>> > >>> ASSERT_PDEV_LIST_IS_READ_LOCKED(msi_desc->dev->domain); > >>> > >>> - return iommu_update_ire_from_msi(msi_desc, &msi_desc->msg); > >>> + rc = iommu_update_ire_from_msi(msi_desc, &msi_desc->msg); > >>> + > >>> + return rc < 0 ? rc : 0; > >> > >> Only tangential here, but: Why does this function have a return type of > >> non-void, when neither caller cares? > > > > I'm afraid there's more wrong in vmx_pi_update_irte() that I've just > > spotted afterwards. > > > > vmx_pi_update_irte() passes to iommu_update_ire_from_msi() the > > msi_desc->msg field, but that field is supposed to always contain the > > non-translated MSI data, as you correctly pointed out in v1 it's > > consumed by dump_msi(). vmx_pi_update_irte() using msi_desc->msg to > > store the translated MSI effectively breaks dump_msi(). > > Oh, indeed - it violates what write_msi_msg() specifically checks by > an assertion.
Indeed. I have a pre-patch to fix this. > > Also vmx_pi_update_irte() relies on the IRT index never changing, as > > otherwise it's missing any logic to update the MSI registers. > > Isn't this a valid assumption here? msi_msg_to_remap_entry() will only > ever allocate a new index if none was previously allocated. Yes, but I think it needs to be accounted for, I've now switched this to: rc = iommu_update_ire_from_msi(msi_desc, &msg); if ( rc > 0 ) { /* * Callers of vmx_pi_update_irte() won't propagate the updated MSI * fields to the hardware, must assert there are no changes. */ ASSERT_UNREACHABLE(); rc = -EILSEQ; } return rc; Which I think better reflects the expectations of the function. > >>> --- a/xen/drivers/passthrough/amd/iommu_intr.c > >>> +++ b/xen/drivers/passthrough/amd/iommu_intr.c > >>> @@ -492,7 +492,7 @@ static int update_intremap_entry_from_msi_msg( > >>> get_ivrs_mappings(iommu->seg)[alias_id].intremap_table); > >>> } > >>> > >>> - return 0; > >>> + return !fresh ? 0 : 1; > >>> } > >> > >> Simply > >> > >> return fresh; > >> > >> ? > >> > >>> @@ -546,7 +546,7 @@ int cf_check amd_iommu_msi_msg_update_ire( > >>> rc = update_intremap_entry_from_msi_msg(iommu, bdf, nr, > >>> &msi_desc->remap_index, > >>> msg, &data); > >>> - if ( !rc ) > >>> + if ( rc > 0 ) > >>> { > >>> for ( i = 1; i < nr; ++i ) > >>> msi_desc[i].remap_index = msi_desc->remap_index + i; > >>> --- a/xen/drivers/passthrough/vtd/intremap.c > >>> +++ b/xen/drivers/passthrough/vtd/intremap.c > >>> @@ -506,6 +506,7 @@ static int msi_msg_to_remap_entry( > >>> unsigned int index, i, nr = 1; > >>> unsigned long flags; > >>> const struct pi_desc *pi_desc = msi_desc->pi_desc; > >>> + bool alloc = false; > >>> > >>> if ( msi_desc->msi_attrib.type == PCI_CAP_ID_MSI ) > >>> nr = msi_desc->msi.nvec; > >>> @@ -529,6 +530,7 @@ static int msi_msg_to_remap_entry( > >>> index = alloc_remap_entry(iommu, nr); > >>> for ( i = 0; i < nr; ++i ) > >>> msi_desc[i].remap_index = index + i; > >>> + alloc = true; > >>> } > >>> else > >>> index = msi_desc->remap_index; > >>> @@ -601,7 +603,7 @@ static int msi_msg_to_remap_entry( > >>> unmap_vtd_domain_page(iremap_entries); > >>> spin_unlock_irqrestore(&iommu->intremap.lock, flags); > >>> > >>> - return 0; > >>> + return alloc ? 1 : 0; > >>> } > >> > >> Like above, simply > >> > >> return alloc; > >> > >> ? > > > > I wasn't sure whether this was overloading the boolean type and > > possibly breaking some MISRA rule. I can adjust. > > What we are to do with Misra's essential type system is entirely unclear at > this point, aiui. Thanks, given the findings above I will post v4 with the extra adjustments. Roger.