On 13/03/2025 6:57 pm, Andrii Sultanov wrote:
> Following on from 250d87dc, struct amd_iommu has its seg and bdf fields
> backwards with relation to pci_sbdf_t. Swap them around, and simplify the
> expressions regenerating an sbdf_t from seg+bdf.
>
> Simplify ioapic_sbdf and bpet_sbdf along the way. Adjust functions
> taking seg and bdf fields of these structs to take pci_sbdf_t instead.
> Simplify comparisons similarly.
>
> Bloat-o-meter reports:
>
> ```
> add/remove: 0/0 grow/shrink: 13/21 up/down: 352/-576 (-224)
> Function                                     old     new   delta
> _einittext                                 22028   22220    +192
> amd_iommu_prepare                            853     897     +44
> _invalidate_all_devices                      133     154     +21
> amd_iommu_domain_destroy                      46      63     +17
> disable_fmt                                12336   12352     +16
> _acpi_module_name                            416     432     +16
> amd_iommu_get_reserved_device_memory         521     536     +15
> parse_ivrs_table                            3955    3966     +11
> amd_iommu_assign_device                      271     282     +11
> find_iommu_for_device                        242     246      +4
> get_intremap_requestor_id                     49      52      +3
> amd_iommu_free_intremap_table                360     361      +1
> allocate_domain_resources                     82      83      +1
> reassign_device                              843     838      -5
> amd_iommu_remove_device                      352     347      -5
> amd_iommu_flush_iotlb                        359     354      -5
> iov_supports_xt                              270     264      -6
> amd_iommu_setup_domain_device               1478    1472      -6
> amd_setup_hpet_msi                           232     224      -8
> amd_iommu_ioapic_update_ire                  572     564      -8
> _hvm_dpci_msi_eoi                            157     149      -8
> amd_iommu_msi_enable                          33      20     -13
> register_range_for_device                    297     281     -16
> amd_iommu_add_device                         856     839     -17
> update_intremap_entry_from_msi_msg           879     861     -18
> amd_iommu_read_ioapic_from_ire               347     323     -24
> amd_iommu_msi_msg_update_ire                 472     431     -41
> flush_command_buffer                         460     417     -43
> set_iommu_interrupt_handler                  421     377     -44
> amd_iommu_detect_one_acpi                    918     868     -50
> amd_iommu_get_supported_ivhd_type             86      31     -55
> iterate_ivrs_mappings                        169     113     -56
> parse_ivmd_block                            1339    1271     -68
> enable_iommu                                1745    1665     -80
> ```
>
> Resolves: https://gitlab.com/xen-project/xen/-/issues/198
>
> Reported-by: Andrew Cooper <andrew.coop...@citrix.com>
> Signed-off-by: Andrii Sultanov <sultanovand...@gmail.com>

Well, this is awkward.  This is the task I'd put together for Cody to
try.  I guess I have to find another one.

In commit messages, we always want the subject alongside a hash.  I have
this local alias to help:

> xen.git/xen$ git commit-str 250d87dc
> commit 250d87dc3ff9 ("x86/msi: Change __msi_set_enable() to take
> pci_sbdf_t")
> xen.git/xen$ git help commit-str
> 'commit-str' is aliased to 'log -1 --abbrev=12 --pretty=format:'commit
> %h ("%s")''

(The name is not imaginative, and could probably be better.)

> @@ -239,17 +239,17 @@ static int __init register_range_for_device(
>      unsigned int bdf, paddr_t base, paddr_t limit,
>      bool iw, bool ir, bool exclusion)
>  {
> -    int seg = 0; /* XXX */
> -    struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(seg);
> +    pci_sbdf_t sbdf = { .seg = 0, .bdf = bdf };

The /* XXX */ wants to stay.  It's highlighting that this code isn't
muti-segment aware (yet).

> +    struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(sbdf.seg);
>      struct amd_iommu *iommu;
>      u16 req;
>      int rc = 0;
>  
> -    iommu = find_iommu_for_device(seg, bdf);
> +    iommu = find_iommu_for_device(sbdf);
>      if ( !iommu )
>      {
>          AMD_IOMMU_WARN("IVMD: no IOMMU for device %pp - ignoring 
> constrain\n",
> -                       &PCI_SBDF(seg, bdf));
> +                       &(sbdf));

The brackets should be dropped now.  This should be just &sbdf.

> @@ -1543,14 +1540,14 @@ static void invalidate_all_domain_pages(void)
>  static int cf_check _invalidate_all_devices(
>      u16 seg, struct ivrs_mappings *ivrs_mappings)
>  {
> -    unsigned int bdf; 
> +    pci_sbdf_t sbdf = { .seg = seg, .bdf = 0 };
>      u16 req_id;
>      struct amd_iommu *iommu;
>  
> -    for ( bdf = 0; bdf < ivrs_bdf_entries; bdf++ )
> +    for ( /* sbdf.bdf = 0 */ ; sbdf.bdf < ivrs_bdf_entries; sbdf.bdf++ )
>      {
> -        iommu = find_iommu_for_device(seg, bdf);
> -        req_id = ivrs_mappings[bdf].dte_requestor_id;
> +        iommu = find_iommu_for_device(sbdf);
> +        req_id = ivrs_mappings[sbdf.bdf].dte_requestor_id;

See how bloat-o-meter reports this as the 3rd most increased function. 
This is an example where merging to a pci_sbdf_t local variable is
making things worse.

Keep the bdf local variable, and use PCI_SBDF() for the call to
find_iommu_for_device().

The reason is that you're now modifying the low uint16_t half of a
uint32_t.  This requires emitting 16-bit logic (requires the Operand
Size Override prefix, contributing to your code size), it also suffers
register merge penalty


> diff --git a/xen/drivers/passthrough/amd/iommu_intr.c 
> b/xen/drivers/passthrough/amd/iommu_intr.c
> index 9abdc38053..0c91125ec0 100644
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -123,10 +123,10 @@ static spinlock_t* get_intremap_lock(int seg, int 
> req_id)
>             &shared_intremap_lock);
>  }
>  
> -static int get_intremap_requestor_id(int seg, int bdf)
> +static int get_intremap_requestor_id(pci_sbdf_t sbdf)
>  {
> -    ASSERT( bdf < ivrs_bdf_entries );
> -    return get_ivrs_mappings(seg)[bdf].dte_requestor_id;
> +    ASSERT( sbdf.bdf < ivrs_bdf_entries );
> +    return get_ivrs_mappings(sbdf.seg)[sbdf.bdf].dte_requestor_id;

This is also an example where merging the parameter is not necessarily
wise.  The segment and bdf parts are used differently, so this function
now has to split the one parameter in two, and shift segment by 16 just
to use it.

> @@ -335,20 +336,19 @@ void cf_check amd_iommu_ioapic_update_ire(
>      new_rte.raw = rte;
>  
>      /* get device id of ioapic devices */
> -    bdf = ioapic_sbdf[idx].bdf;
> -    seg = ioapic_sbdf[idx].seg;
> -    iommu = find_iommu_for_device(seg, bdf);
> +    sbdf.sbdf = ioapic_sbdf[idx].sbdf.sbdf;

sbdf = ioapic_sbdf[idx].sbdf;

> +    iommu = find_iommu_for_device(sbdf);
>      if ( !iommu )
>      {
>          AMD_IOMMU_WARN("failed to find IOMMU for IO-APIC @ %04x:%04x\n",
> -                       seg, bdf);
> +                       sbdf.seg, sbdf.bdf);

This should be converted to %pp, which has a side effect of correcting
the rendering of bdf.

> @@ -515,15 +515,15 @@ int cf_check amd_iommu_msi_msg_update_ire(
>      struct msi_desc *msi_desc, struct msi_msg *msg)
>  {
>      struct pci_dev *pdev = msi_desc->dev;
> -    int bdf, seg, rc;
> +    pci_sbdf_t sbdf;
> +    int rc;
>      struct amd_iommu *iommu;
>      unsigned int i, nr = 1;
>      u32 data;
>  
> -    bdf = pdev ? pdev->sbdf.bdf : hpet_sbdf.bdf;
> -    seg = pdev ? pdev->seg : hpet_sbdf.seg;
> +    sbdf.sbdf = pdev ? pdev->sbdf.sbdf : hpet_sbdf.sbdf.sbdf;

This is a better example where

sbdf = pdev ? pdev->sbdf : hpet_sbdf.sbdf;

is equivalent.

> diff --git a/xen/drivers/passthrough/amd/iommu_map.c 
> b/xen/drivers/passthrough/amd/iommu_map.c
> index dde393645a..17070904fa 100644
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -694,17 +694,16 @@ int amd_iommu_reserve_domain_unity_unmap(struct domain 
> *d,
>  int cf_check amd_iommu_get_reserved_device_memory(
>      iommu_grdm_t *func, void *ctxt)
>  {
> -    unsigned int seg = 0 /* XXX */, bdf;
> -    const struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(seg);
> +    pci_sbdf_t sbdf = {0};

"= {}" please.

GCC has just introduced a nasty bug (they claim its a feature) where {0}
on unions now zeros less than it used to do.  pci_sbdf_t doesn't tickle
this corner case, but we need to be proactive about removing examples of
"= {0}".

> +    const struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(sbdf.seg);
>      /* At least for global entries, avoid reporting them multiple times. */
>      enum { pending, processing, done } global = pending;
>  
> -    for ( bdf = 0; bdf < ivrs_bdf_entries; ++bdf )
> +    for ( /* sbdf.bdf = 0 */ ; sbdf.bdf < ivrs_bdf_entries; ++sbdf.bdf )
>      {
> -        pci_sbdf_t sbdf = PCI_SBDF(seg, bdf);
> -        const struct ivrs_unity_map *um = ivrs_mappings[bdf].unity_map;
> -        unsigned int req = ivrs_mappings[bdf].dte_requestor_id;
> -        const struct amd_iommu *iommu = ivrs_mappings[bdf].iommu;
> +        const struct ivrs_unity_map *um = ivrs_mappings[sbdf.bdf].unity_map;
> +        unsigned int req = ivrs_mappings[sbdf.bdf].dte_requestor_id;
> +        const struct amd_iommu *iommu = ivrs_mappings[sbdf.bdf].iommu;

Again, this will be better staying as two split variables.

~Andrew

Reply via email to