On Mon, Jan 19, 2026 at 03:46:55PM +0100, Jan Beulich wrote:
> Legacy PCI devices don't have any extended config space. Reading any part
> thereof may return all ones or other arbitrary data, e.g. in some cases
> base config space contents repeatedly.
> 
> Logic follows Linux 6.19-rc's pci_cfg_space_size(), albeit leveraging our
> determination of device type; in particular some comments are taken
> verbatim from there.
> 
> Signed-off-by: Jan Beulich <[email protected]>
> ---
> Should we skip re-evaluation when pci_mmcfg_arch_enable() takes its early
> exit path?

Possibly - we expect no change in that case.  However it would need
to propagate some extra information into the callers.  I could see
that as a followup optimization.

> The warning near the bottom of pci_check_extcfg() may be issued multiple
> times for a single device now. Should we try to avoid that?

Yeah, I've made some comments about that below.  Not sure how common
those broken bridges are, the comment just mentions two specific
models.  Adding yet another boolean to track that is cumbersome, and
what we would like to do is mark the bridge as broken, instead of
every device behind it.

> Note that no vPCI adjustments are done here, but they're going to be
> needed: Whatever requires extended capabilities will need re-
> evaluating / newly establishing / tearing down in case an invocation of
> PHYSDEVOP_pci_mmcfg_reserved alters global state.

Hm, you probably want to do something similar to re-scanning the
capability list, but avoid tearing down and re-setting the vPCI header
logic to prevent unneeded p2m manipulations.  We have no easy way to
preempt this rescanning from the context of a
PHYSDEVOP_pci_mmcfg_reserved call.

> Linux also has CONFIG_PCI_QUIRKS, allowing to compile out the slightly
> risky code (as reads may in principle have side effects). Should we gain
> such, too?

I would be fine with just a command line to disable the newly added
behavior in case it causes issues.

> ---
> v2: Major re-work to also check upon PHYSDEVOP_pci_mmcfg_reserved
>     invocation.
> 
> --- a/xen/arch/x86/physdev.c
> +++ b/xen/arch/x86/physdev.c
> @@ -22,6 +22,8 @@ int physdev_map_pirq(struct domain *d, i
>                       struct msi_info *msi);
>  int physdev_unmap_pirq(struct domain *d, int pirq);
>  
> +int cf_check physdev_check_pci_extcfg(struct pci_dev *pdev, void *arg);

I'm not sure why you need the forward declaration here, the function
(in this patch) is just used after it's already defined.

> +
>  #include "x86_64/mmconfig.h"
>  
>  #ifndef COMPAT
> @@ -160,6 +162,17 @@ int physdev_unmap_pirq(struct domain *d,
>  
>      return ret;
>  }
> +
> +int cf_check physdev_check_pci_extcfg(struct pci_dev *pdev, void *arg)

You can make this static I think?

> +{
> +    const struct physdev_pci_mmcfg_reserved *info = arg;
> +
> +    ASSERT(pdev->seg == info->segment);
> +    if ( pdev->bus >= info->start_bus && pdev->bus <= info->end_bus )
> +        pci_check_extcfg(pdev);
> +
> +    return 0;
> +}
>  #endif /* COMPAT */
>  
>  ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> @@ -511,6 +524,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>  
>          ret = pci_mmcfg_reserved(info.address, info.segment,
>                                   info.start_bus, info.end_bus, info.flags);
> +
> +        if ( !ret )
> +            ret = pci_segment_iterate(info.segment, physdev_check_pci_extcfg,
> +                                      &info);
> +
>          if ( !ret && has_vpci(currd) && (info.flags & 
> XEN_PCI_MMCFG_RESERVED) )
>          {
>              /*
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -422,6 +422,9 @@ static struct pci_dev *alloc_pdev(struct
>      }
>  
>      apply_quirks(pdev);
> +
> +    pci_check_extcfg(pdev);
> +
>      check_pdev(pdev);
>  
>      return pdev;
> @@ -718,6 +721,11 @@ int pci_add_device(u16 seg, u8 bus, u8 d
>  
>                  list_add(&pdev->vf_list, &pf_pdev->vf_list);
>              }
> +
> +            if ( !pdev->ext_cfg )
> +                printk(XENLOG_WARNING
> +                       "%pp: VF without extended config space?\n",
> +                       &pdev->sbdf);

You possibly also want to check that the PF (pf_pdev in this context I
think) also has ext_cfg == true.

>          }
>      }
>  
> @@ -1041,6 +1049,75 @@ enum pdev_type pdev_type(u16 seg, u8 bus
>      return pos ? DEV_TYPE_PCIe_ENDPOINT : DEV_TYPE_PCI;
>  }
>  
> +void pci_check_extcfg(struct pci_dev *pdev)
> +{
> +    unsigned int pos, sig;
> +
> +    pdev->ext_cfg = false;

I think I would prefer if the ext_cfg field is only modified once Xen
know the correct value to put there.  It would also be nice to detect
cases where the device has pdev->ext_cfg == true but a new scan makes
it switch to false.  Which would signal something has likely gone very
wrong, and we should print a warning.

> +
> +    switch ( pdev->type )
> +    {
> +    case DEV_TYPE_PCIe_ENDPOINT:
> +    case DEV_TYPE_PCIe_BRIDGE:
> +    case DEV_TYPE_PCI_HOST_BRIDGE:
> +    case DEV_TYPE_PCIe2PCI_BRIDGE:
> +    case DEV_TYPE_PCI2PCIe_BRIDGE:
> +        break;
> +
> +    case DEV_TYPE_LEGACY_PCI_BRIDGE:
> +    case DEV_TYPE_PCI:
> +        pos = pci_find_cap_offset(pdev->sbdf, PCI_CAP_ID_PCIX);
> +        if ( !pos ||
> +             !(pci_conf_read32(pdev->sbdf, pos + PCI_X_STATUS) &
> +               (PCI_X_STATUS_266MHZ | PCI_X_STATUS_533MHZ)) )
> +            return;
> +        break;
> +
> +    default:
> +        return;
> +    }
> +
> +    /*
> +     * Regular PCI devices have 256 bytes, but PCI-X 2 and PCI Express 
> devices
> +     * have 4096 bytes.  Even if the device is capable, that doesn't mean we
> +     * can access it.  Maybe we don't have a way to generate extended config
> +     * space accesses, or the device is behind a reverse Express bridge.  So
> +     * we try reading the dword at PCI_CFG_SPACE_SIZE which must either be 0
> +     * or a valid extended capability header.
> +     */
> +    if ( pci_conf_read32(pdev->sbdf, PCI_CFG_SPACE_SIZE) == 0xffffffffU )
> +        return;
> +
> +    /*
> +     * PCI Express to PCI/PCI-X Bridge Specification, rev 1.0, 4.1.4 says 
> that
> +     * when forwarding a type1 configuration request the bridge must check
> +     * that the extended register address field is zero.  The bridge is not
> +     * permitted to forward the transactions and must handle it as an
> +     * Unsupported Request.  Some bridges do not follow this rule and simply
> +     * drop the extended register bits, resulting in the standard config 
> space
> +     * being aliased, every 256 bytes across the entire configuration space.
> +     * Test for this condition by comparing the first dword of each potential
> +     * alias to the vendor/device ID.
> +     * Known offenders:
> +     *   ASM1083/1085 PCIe-to-PCI Reversible Bridge (1b21:1080, rev 01 & 03)
> +     *   AMD/ATI SBx00 PCI to PCI Bridge (1002:4384, rev 40)
> +     */
> +    sig = pci_conf_read32(pdev->sbdf, PCI_VENDOR_ID);
> +    for ( pos = PCI_CFG_SPACE_SIZE;
> +          pos < PCI_CFG_SPACE_EXP_SIZE; pos += PCI_CFG_SPACE_SIZE )
> +        if ( pci_conf_read32(pdev->sbdf, pos) != sig )
> +            break;
> +
> +    if ( pos >= PCI_CFG_SPACE_EXP_SIZE )
> +    {
> +        printk(XENLOG_WARNING "%pp: extended config space aliases base 
> one\n",
> +               &pdev->sbdf);

Hm, I think this shouldn't be very common as it seems limited to a
short list of bridges.  However every device under such bridge would
be affected and repeatedly print the message.  I wonder whether we
should make this XENLOG_DEBUG instead, there isn't much the user can
do to fix it.  More a rant than a request though.

Thanks, Roger.

Reply via email to