On Tue, Mar 11, 2025 at 08:35:35AM +0100, Jan Beulich wrote:
> On 10.03.2025 16:42, Roger Pau Monné wrote:
> > On Mon, Mar 10, 2025 at 11:51:09AM +0100, Jan Beulich wrote:
> >> On 10.03.2025 10:55, Roger Pau Monne wrote:
> >>> Attempt to reduce the MSI entry writes, and the associated checking 
> >>> whether
> >>> memory decoding and MSI-X is enabled for the PCI device, when the MSI data
> >>> hasn't changed.
> >>>
> >>> When using Interrupt Remapping the MSI entry will contain an index into
> >>> the remapping table, and it's in such remapping table where the MSI vector
> >>> and destination CPU is stored.  As such, when using interrupt remapping,
> >>> changes to the interrupt affinity shouldn't result in changes to the MSI
> >>> entry, and the MSI entry update can be avoided.
> >>>
> >>> Signal from the IOMMU update_ire_from_msi hook whether the MSI data or
> >>> address fields have changed, and thus need writing to the device 
> >>> registers.
> >>> Such signaling is done by returning 1 from the function.  Otherwise
> >>> returning 0 means no update of the MSI fields, and thus no write
> >>> required.
> >>>
> >>> Signed-off-by: Roger Pau Monné <roger....@citrix.com>
> >>
> >> Reviewed-by: Jan Beulich <jbeul...@suse.com>
> >> with two purely cosmetic suggestions and an only loosely related question 
> >> below.
> >>
> >>> --- a/xen/arch/x86/hvm/vmx/vmx.c
> >>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >>> @@ -415,7 +415,9 @@ static int cf_check vmx_pi_update_irte(const struct 
> >>> vcpu *v,
> >>>  
> >>>      ASSERT_PDEV_LIST_IS_READ_LOCKED(msi_desc->dev->domain);
> >>>  
> >>> -    return iommu_update_ire_from_msi(msi_desc, &msi_desc->msg);
> >>> +    rc = iommu_update_ire_from_msi(msi_desc, &msi_desc->msg);
> >>> +
> >>> +    return rc < 0 ? rc : 0;
> >>
> >> Only tangential here, but: Why does this function have a return type of
> >> non-void, when neither caller cares?
> > 
> > I'm afraid there's more wrong in vmx_pi_update_irte() that I've just
> > spotted afterwards.
> > 
> > vmx_pi_update_irte() passes to iommu_update_ire_from_msi() the
> > msi_desc->msg field, but that field is supposed to always contain the
> > non-translated MSI data, as you correctly pointed out in v1 it's
> > consumed by dump_msi().  vmx_pi_update_irte() using msi_desc->msg to
> > store the translated MSI effectively breaks dump_msi().
> 
> Oh, indeed - it violates what write_msi_msg() specifically checks by
> an assertion.

Indeed.  I have a pre-patch to fix this.

> > Also vmx_pi_update_irte() relies on the IRT index never changing, as
> > otherwise it's missing any logic to update the MSI registers.
> 
> Isn't this a valid assumption here? msi_msg_to_remap_entry() will only
> ever allocate a new index if none was previously allocated.

Yes, but I think it needs to be accounted for, I've now switched this
to:

    rc = iommu_update_ire_from_msi(msi_desc, &msg);
    if ( rc > 0 )
    {
        /*
         * Callers of vmx_pi_update_irte() won't propagate the updated MSI
         * fields to the hardware, must assert there are no changes.
         */
        ASSERT_UNREACHABLE();
        rc = -EILSEQ;
    }

    return rc;

Which I think better reflects the expectations of the function.

> >>> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> >>> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> >>> @@ -492,7 +492,7 @@ static int update_intremap_entry_from_msi_msg(
> >>>                 get_ivrs_mappings(iommu->seg)[alias_id].intremap_table);
> >>>      }
> >>>  
> >>> -    return 0;
> >>> +    return !fresh ? 0 : 1;
> >>>  }
> >>
> >> Simply
> >>
> >>     return fresh;
> >>
> >> ?
> >>
> >>> @@ -546,7 +546,7 @@ int cf_check amd_iommu_msi_msg_update_ire(
> >>>      rc = update_intremap_entry_from_msi_msg(iommu, bdf, nr,
> >>>                                              &msi_desc->remap_index,
> >>>                                              msg, &data);
> >>> -    if ( !rc )
> >>> +    if ( rc > 0 )
> >>>      {
> >>>          for ( i = 1; i < nr; ++i )
> >>>              msi_desc[i].remap_index = msi_desc->remap_index + i;
> >>> --- a/xen/drivers/passthrough/vtd/intremap.c
> >>> +++ b/xen/drivers/passthrough/vtd/intremap.c
> >>> @@ -506,6 +506,7 @@ static int msi_msg_to_remap_entry(
> >>>      unsigned int index, i, nr = 1;
> >>>      unsigned long flags;
> >>>      const struct pi_desc *pi_desc = msi_desc->pi_desc;
> >>> +    bool alloc = false;
> >>>  
> >>>      if ( msi_desc->msi_attrib.type == PCI_CAP_ID_MSI )
> >>>          nr = msi_desc->msi.nvec;
> >>> @@ -529,6 +530,7 @@ static int msi_msg_to_remap_entry(
> >>>          index = alloc_remap_entry(iommu, nr);
> >>>          for ( i = 0; i < nr; ++i )
> >>>              msi_desc[i].remap_index = index + i;
> >>> +        alloc = true;
> >>>      }
> >>>      else
> >>>          index = msi_desc->remap_index;
> >>> @@ -601,7 +603,7 @@ static int msi_msg_to_remap_entry(
> >>>      unmap_vtd_domain_page(iremap_entries);
> >>>      spin_unlock_irqrestore(&iommu->intremap.lock, flags);
> >>>  
> >>> -    return 0;
> >>> +    return alloc ? 1 : 0;
> >>>  }
> >>
> >> Like above, simply
> >>
> >>     return alloc;
> >>
> >> ?
> > 
> > I wasn't sure whether this was overloading the boolean type and
> > possibly breaking some MISRA rule.  I can adjust.
> 
> What we are to do with Misra's essential type system is entirely unclear at
> this point, aiui.

Thanks, given the findings above I will post v4 with the extra
adjustments.

Roger.

Reply via email to