On Thu Feb 5, 2026 at 2:59 PM GMT, Dariusz Sosnowski wrote: > Thank you for reporting the issue and the patch. > Please see comments below.
Responses inline > > On Sat, Jan 10, 2026 at 11:15:10PM +0000, Max Tottenham wrote: > > When probing an SF as a vDPA device, mlx5_roce_disable() targets the > > parent PF address (via mlx5_dev_to_pci_str). This incorrectly attempts > > to disable ROCE on the parent PF rather than the SF itself. > > > > This causes vDPA probe failures when the parent PF already has an open > > IB context (e.g., probed for uplink ports or SF representors). > > > > For SubFunctions, ROCE is configured via devlink parameters > > (enable_roce) before device creation. Skip the runtime ROCE disable > > for auxiliary devices since the devlink configuration is already in > > effect and targeting the parent PF is incorrect. > > > > Signed-off-by: Max Tottenham <[email protected]> > > --- > > drivers/common/mlx5/linux/mlx5_common_os.c | 13 +++++++++++++ > > 1 file changed, 13 insertions(+) > > > > diff --git a/drivers/common/mlx5/linux/mlx5_common_os.c > > b/drivers/common/mlx5/linux/mlx5_common_os.c > > index 2867e21618..6fa12e06eb 100644 > > --- a/drivers/common/mlx5/linux/mlx5_common_os.c > > +++ b/drivers/common/mlx5/linux/mlx5_common_os.c > > @@ -690,6 +690,19 @@ mlx5_roce_disable(const struct rte_device *dev) > > { > > char pci_addr[PCI_PRI_STR_SIZE] = { 0 }; > > > > + /* > > + * For auxiliary devices (SFs), ROCE is configured via devlink > > + * parameters (enable_roce) before device creation. Skip runtime > > + * ROCE disable since mlx5_dev_to_pci_str() returns the parent PF > > + * address, not the SF - disabling ROCE on the parent PF is both > > + * incorrect and may fail if the PF already has an active IB context. > > + */ > > + if (!mlx5_dev_is_pci(dev)) { > > + DRV_LOG(INFO, "Skipping ROCE disable for auxiliary device > > \"%s\"", > > + dev->name); > > + return 0; > > + } > > The logic has a bug as you mentioned, but I don't think > it would be a good idea to not disable ROCE automatically for SFs. > Especially since, IIUC, enable_roce option value is not inherited > from PF when new SF is created and probed. > It'll move more responsibility to the user regarding > port configuration. Ok. I can see arguments for/against here - as I think that for SFs, my recollection is that you are already required to disable ROCE if not using the hotplug API. But it's a small enough change to attempt a disable here. > > In my opinion mlx5_roce_disable() should have split logic like so: > > if mlx5_dev_is_pci(dev) > // for PCI devices continue as usual: > // try disabling ROCE through netlink or sysfs > else > // for SFs: try disabling ROCE through netlink > > This would require some adjustments in mlx5_nl_roce_disable() > and related code. > Specifically, devlink attributes should be adjusted when disabling ROCE > for auxiliary devices: > > - DEVLINK_ATTR_BUS_NAME = "auxiliary" > - DEVLINK_ATTR_DEV_NAME = device name from rte_device->name > > Would you be able to make the necessary changes? > Sure. I also found a few other SF hotplug related bugs I'll send along with the V2. > > + > > if (mlx5_dev_to_pci_str(dev, pci_addr, sizeof(pci_addr)) < 0) > > return -rte_errno; > > /* Firstly try to disable ROCE by Netlink and fallback to sysfs. */ > > -- > > 2.51.2 > > > > Best regards, > Dariusz Sosnowski

