On Mon, May 11, 2026 at 07:21:33PM -0700, Jakub Kicinski wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> ---
> net: mana: force full-page RX buffers via ethtool private flag
>
> This patch adds an ethtool private flag "full-page-rx" that forces one
> RX buffer per page, bypassing the page_pool fragment path, to address
> a 15-20% throughput regression seen on some 4K PAGE_SIZE ARM64
> platforms under high connection counts. Toggling the flag triggers
> mana_detach() + mana_attach() to apply the new RX buffer layout.
>
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> > b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> > index a28ca461c135..dfd222de11d0 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
>
> [ ... ]
>
> > @@ -590,6 +609,88 @@ static int mana_get_link_ksettings(struct net_device
> > *ndev,
> > return 0;
> > }
> >
>
> [ ... ]
>
> > +static int mana_set_priv_flags(struct net_device *ndev, u32 priv_flags)
> > +{
> > + struct mana_port_context *apc = netdev_priv(ndev);
> > + u32 changed = apc->priv_flags ^ priv_flags;
> > + u32 old_priv_flags = apc->priv_flags;
> > + bool schedule_port_reset = false;
> > + int err = 0;
> > +
> > + if (!changed)
> > + return 0;
> > +
> > + /* Reject unknown bits */
> > + if (priv_flags & ~GENMASK(MANA_PRIV_FLAG_MAX - 1, 0))
> > + return -EINVAL;
> > +
> > + if (changed & BIT(MANA_PRIV_FLAG_USE_FULL_PAGE_RXBUF)) {
> > + apc->priv_flags = priv_flags;
> > +
> > + if (!apc->port_is_up) {
> > + /* Port is down, flag updated to apply on next up
> > + * so just return.
> > + */
> > + return 0;
> > + }
> > +
> > + /* Pre-allocate buffers to prevent failure in mana_attach
> > + * later
> > + */
> > + err = mana_pre_alloc_rxbufs(apc, ndev->mtu, apc->num_queues);
> > + if (err) {
> > + netdev_err(ndev,
> > + "Insufficient memory for new allocations\n");
> > + apc->priv_flags = old_priv_flags;
> > + return err;
> > + }
> > +
> > + err = mana_detach(ndev, false);
> > + if (err) {
> > + netdev_err(ndev, "mana_detach failed: %d\n", err);
> > + apc->priv_flags = old_priv_flags;
> > +
> > + /* Port is in an inconsistent state. Restore
> > + * 'port_is_up' so that queue reset work handler
> > + * can properly detach and re-attach.
> > + */
> > + apc->port_is_up = true;
> > + schedule_port_reset = true;
> > + goto out;
> > + }
> > +
> > + err = mana_attach(ndev);
> > + if (err) {
> > + netdev_err(ndev, "mana_attach failed: %d\n", err);
> > + apc->priv_flags = old_priv_flags;
> > +
> > + /* Restore 'port_is_up' so the reset work handler
> > + * can properly detach/attach. Without this,
> > + * the handler sees port_is_up=false and skips
> > + * queue allocation, leaving the port dead.
> > + */
> > + apc->port_is_up = true;
> > + schedule_port_reset = true;
> > + }
> > + }
> > +
> > +out:
> > + mana_pre_dealloc_rxbufs(apc);
> > +
> > + if (schedule_port_reset)
> > + queue_work(apc->ac->per_port_queue_reset_wq,
> > + &apc->queue_reset_work);
> > +
> > + return err;
> > +}
>
> [High]
> Can this failure path oops inside the reset work handler?
>
> When mana_attach() fails after mana_detach() has already torn the vport
> down, apc->tx_qp has been set to NULL by mana_destroy_txq() and apc->rxqs
> has been cleared by mana_cleanup_port_context(). This code then forces
> apc->port_is_up back to true and queues apc->queue_reset_work:
>
> apc->port_is_up = true;
> schedule_port_reset = true;
>
> When mana_per_port_queue_reset_work_handler runs it calls
> mana_detach(ndev, false), which latches port_st_save = port_is_up = true
> and, because port_st_save is true, invokes mana_dealloc_queues(). The TX
> drain loop there does:
>
> txq = &apc->tx_qp[i].txq;
> atomic_read(&txq->pending_sends);
>
> with no NULL check on apc->tx_qp, and mana_destroy_vport further down
> iterates apc->rxqs[rxq_idx] while rxqs is NULL. Would that produce a
> NULL pointer dereference in the workqueue worker whenever mana_attach()
> fails here (for example on an allocation failure in
> mana_init_port_context or mana_create_txq, or a transient HW command
> failure in mana_query_vport_cfg / mana_cfg_vport)?
>
> The mana_detach() failure path above (goto out) reaches
> mana_dealloc_queues() through the same chain and looks to have the same
> exposure.
Hi Jakub,
Thanks for your comments, I have sent a separate fixes series to the
net tree that fixes the possible NULL pointer derefernce issue and also
makes the reset handler safe for the case where it runs after a failed
attach:
https://lore.kernel.org/all/[email protected]/
>
> For comparison, mana_change_mtu() handles a mana_attach() failure by
> returning the error without scheduling a reset. Would a similar
> treatment here avoid the asynchronous oops, or is there a reason the
> reset must be scheduled in this specific failure case?
> --
> pw-bot: cr
The full-page-rx private flag is intended to be driven by a udev rule
that fires automatically during VM provisioning on affected platforms.
If there is a transient failure, the VM fails to provision, requiring manual
intervention.The reset handler retries the attach, giving the port a
chance to recover to default config autonomously without intervention.
Regards
Dipayaan Roy