>
> On Tue, 5 May 2026 13:44:50 -0700
> Long Li <[email protected]> wrote:
>
> > After PCI rescan on Azure, the MANA kernel driver can take over 100
> > seconds to probe and create the /sys/bus/pci/devices/<dev>/net directory.
> > The previous fixed retry limit (NETVSC_MAX_HOTADD_RETRY=10, ~12
> > seconds) was insufficient, causing VF re-attach to fail with 'Failed
> > to parse PCI device' on systems with slow MANA driver initialization.
> >
> > Replace the fixed retry limit with an indefinite retry that only gives
> > up when the PCI device itself disappears from sysfs. This is safe because:
> >
> > - The retry uses rte_eal_alarm callbacks which are serialized on the EAL
> > interrupt thread, preventing races with VF remove or device close paths.
> > - Device close (eth_hn_dev_uninit) explicitly cancels all pending hotplug
> > alarms via rte_eal_alarm_cancel and frees the context.
> > - If the PCI device is removed while retrying, access() detects the
> > missing sysfs path and stops immediately.
> >
> > A periodic NOTICE log every 30 retries (~30s) provides visibility into
> > long waits without flooding the log at DEBUG level.
> >
> > Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
> > Cc: [email protected]
> > Signed-off-by: Long Li <[email protected]>
> > ---
>
> Since this series touches lots of stuff, decided to do a more through than
> normal
> AI review
I have sent v2, with all the comments addressed.
Thanks,
Long
>
> Review of patch series: net/netvsc VF hotplug improvements (7 patches)
>
> Series targets stale interface names, slow MANA driver probe, multi-VF PCI
> devices, and adds service-reset event handling. Reviewed against upstream main
> as of clone date.
>
> =================================================================
> ========
> [PATCH 3/7] net/netvsc: retry full probe when IB device not ready during
> hotplug
> =================================================================
> ========
>
> Warning:
>
> drivers/net/netvsc/hn_ethdev.c (around the new ENODEV/EEXIST handling)
>
> The original code logged failures from rte_eal_hotplug_add at
> PMD_DRV_LOG(ERR). The patch lowers all non-ENODEV, non-EEXIST returns
> to PMD_DRV_LOG(NOTICE):
>
> if (ret && ret != -EEXIST)
> PMD_DRV_LOG(NOTICE,
> "Failed to add PCI device %s (ret=%d)",
> d->name, ret);
>
> -ENOMEM, -EBUSY, -EIO and similar errors from rte_eal_hotplug_add are
> legitimate failures that operators need to see. Recommend keeping ERR
> for unrecognized error codes; only -ENODEV and -EEXIST are now expected
> outcomes worth a quieter log level.
>
>
> =================================================================
> ========
> [PATCH 4/7] net/netvsc: add NOTICE-level debug logging for VF hotplug
> retry
> =================================================================
> ========
>
> Warning:
>
> drivers/net/netvsc/hn_ethdev.c, multiple sites in netvsc_hotplug_retry
>
> The new log calls inside the readdir() loop fire at NOTICE for every
> directory entry on every retry pass:
>
> PMD_DRV_LOG(NOTICE, "%s: checking interface %s in %s (retry %d)", ...)
> PMD_DRV_LOG(NOTICE, "%s: device %s sa_family=%d not
> ARPHRD_ETHER, ...")
> PMD_DRV_LOG(NOTICE, "%s: MAC mismatch for %s: got ... expected ...")
>
> Combined with patch 1 (indefinite retries) and patch 5 (up to 120
> mac_retry passes), a multi-NIC VM can produce hundreds of NOTICE lines
> per VF re-attach. Per-iteration trace belongs at DEBUG; reserve NOTICE
> for one-shot state transitions (loop start, match found, give-up).
>
> Info:
>
> drivers/net/netvsc/hn_ethdev.c, MAC mismatch log
>
> The format string and byte arguments expand the MAC inline:
>
> "%02x:%02x:%02x:%02x:%02x:%02x ..."
> eth_addr.addr_bytes[0], ..., eth_addr.addr_bytes[5],
> dev->data->mac_addrs->addr_bytes[0], ...
>
> rte_ether.h provides RTE_ETHER_ADDR_PRT_FMT and
> RTE_ETHER_ADDR_BYTES()
> for exactly this; using them is shorter and consistent with the rest
> of DPDK.
>
> Info:
>
> drivers/net/netvsc/hn_ethdev.c, MAC match block
>
> The new "} else {" follows an if-branch that already ends in break, so
> the else is redundant — the log can sit unindented after the closing
> brace of the if.
>
>
> =================================================================
> ========
> [PATCH 5/7] net/netvsc: retry when no matching MAC found in net directory
> =================================================================
> ========
>
> Error:
>
> Commit message vs. code disagree on the retry budget.
>
> Commit message:
> "This uses a separate mac_retry counter (limit 30, ~30 seconds) so
> the main retry loop remains unlimited."
>
> Code:
> /* Max retries when net/ directory exists but no matching MAC found.
> * On multi-NIC PCI devices, a second VF may register later.
> * 120 retries = ~2 minutes.
> */
> #define NETVSC_MAX_MAC_RETRY 120
>
> Either the commit message or the macro value needs to be corrected so
> they agree. With NETVSC_HOTADD_RETRY_INTERVAL = 1s, 120 retries is
> ~2 min, not ~30 s.
>
> Info:
>
> drivers/net/netvsc/hn_ethdev.c, in the new "no MAC match" block
>
> if (di) {
> closedir(di);
> di = NULL;
>
> DPDK style requires explicit pointer comparison: "if (di != NULL)".
>
>
> =================================================================
> ========
> [PATCH 6/7] net/netvsc: forward per-queue stats from VF device
> =================================================================
> ========
>
> Warning:
>
> drivers/net/netvsc/hn_vf.c, hn_vf_stats_get
>
> if (vf_dev->dev_ops->stats_get)
> ret = vf_dev->dev_ops->stats_get(vf_dev, stats, qstats);
> else
> ret = rte_eth_stats_get(vf_dev->data->port_id, stats);
>
> Reaching directly into another ethdev's dev_ops->stats_get is not a
> pattern used anywhere else in DPDK (bonding, failsafe, etc. all go
> through rte_eth_stats_get). The shortcut bypasses three things the
> public wrapper does:
>
> - RTE_ETH_VALID_PORTID_OR_ERR_RET on the VF port_id
> - memset(stats, 0) and stats->rx_nombuf = data->rx_mbuf_alloc_failed
> - eth_err() return-code translation
>
> The memset is harmless here because the outer ethdev wrapper has
> already zeroed stats/qstats for the netvsc port. The rx_nombuf
> attribution, however, changes: previously stats->rx_nombuf was
> pre-loaded with the VF's data->rx_mbuf_alloc_failed before the VF's
> callback ran; now it's whatever was set for the netvsc port. Worth
> documenting in the commit message if intentional.
>
> Also, the dev_ops pointer test uses an implicit comparison:
>
> if (vf_dev->dev_ops->stats_get)
>
> Per DPDK style this should be "!= NULL".
>
> A cleaner long-term fix would be to extend the ethdev API so per-queue
> stats can be forwarded through the public path, or wrap this dispatch
> in a small helper with a comment explaining why the public API is
> insufficient.
>
> Info:
>
> drivers/net/netvsc/hn_vf.c
>
> Some VF drivers may return -ENOTSUP or partial data when called with a
> non-NULL qstats. The patch unconditionally propagates that return,
> which could turn a previously-successful stats_get into a failure for
> netvsc users on certain VF drivers. A graceful fallback to
> rte_eth_stats_get on -ENOTSUP would preserve backward compatibility.
>
>
> =================================================================
> ========
> [PATCH 7/7] net/netvsc: handle VF recovery events for service reset
> =================================================================
> ========
>
> Warning:
>
> drivers/net/netvsc/hn_vf.c, hn_eth_recovery_success_callback
>
> rte_rwlock_write_lock(&hv->vf_lock);
> if (hv->vf_ctx.vf_attached && !hv->vf_ctx.vf_vsc_switched) {
> ret = hn_nvs_set_datapath(hv, NVS_DATAPATH_VF);
> ...
> }
>
> hn_vf_add_unlocked guards the equivalent NVS_DATAPATH_VF switch with
> "if (dev->data->dev_started)" precisely to avoid routing traffic to
> the VF before queues are configured. The new callback omits that
> check. If the user calls rte_eth_dev_stop() on the netvsc port during
> the ERR_RECOVERING -> RECOVERY_SUCCESS window, this callback will
> switch the host data path to a VF whose DPDK queues may have been
> torn down. Recommend mirroring the dev_started check.
>
> Warning:
>
> drivers/net/netvsc/hn_vf.c, hn_eth_recovering_callback and
> hn_eth_recovery_success_callback
>
> Both new callbacks acquire hv->vf_lock as a writer directly in the
> event-callback context. The existing hn_eth_rmv_event_callback
> intentionally does not take vf_lock — it defers via:
>
> rte_eal_alarm_set(1, hn_remove_delayed, hv);
>
> to break a possible lock-order coupling with the calling driver (the
> PMD that fires rte_eth_dev_callback_process may itself be holding an
> internal lock). Taking vf_lock directly inside the recovery callbacks
> introduces a different lock-ordering invariant from the rest of the
> file. Consider either deferring through rte_eal_alarm_set as well, or
> adding a comment explaining why direct acquisition is safe in the
> MANA -> netvsc invocation path.
>
> Info:
>
> drivers/net/netvsc/hn_vf.c, hn_eth_recovery_failed_callback
>
> rte_eal_alarm_set(1, hn_remove_delayed, hv);
>
> The state of hv->vf_ctx.vf_attached is not checked. If RECOVERY_FAILED
> arrives after a concurrent INTR_RMV has already detached the VF, the
> alarm will fire hn_remove_delayed against an already-removed port,
> generating a spurious "Start to remove port 0" log and downstream
> unregister failures. The damage is limited because hn_remove_delayed
> takes the lock and re-checks state, but the noise is avoidable with a
> vf_attached guard before scheduling the alarm.
>
>
> =================================================================
> ========
> General notes on the series
> =================================================================
> ========
>
> Patches 1, 2, and 5 together turn netvsc_hotplug_retry into a genuinely-
> unbounded loop in two dimensions:
>
> - opendir failures retry indefinitely (patch 1, capped only by PCI
> device disappearance)
> - SIOCGIFHWADDR failures and -ENODEV from rte_eal_hotplug_add reschedule
> without incrementing any counter (patches 2, 3)
> - "no matching MAC" retries are bounded by mac_retry / 120 (patch 5)
>
> Only patch 5's path has a hard upper bound. The other paths rely on the PCI
> device eventually disappearing if the VF never comes up. On a permanently
> broken VF that stays present in sysfs but never registers a netdev, the EAL
> alarm
> thread will spin forever. Worth confirming that this is the intended
> behaviour, or
> adding an overall ceiling (e.g., a much higher cap than 30s but still finite)
> for the
> opendir / ioctl / ENODEV paths.