On Sun, 23 Nov 2025 10:08:18 -0800 Dipayaan Roy wrote:
> Implement .ndo_tx_timeout for MANA so any stalled TX queue can be detected
> and a device-controlled port reset for all queues can be scheduled to an
> ordered workqueue. The reset for all queues on stall detection is
> recommended by the hardware team.
> 
> The change introduces a single ordered workqueue
> "mana_per_port_queue_reset_wq" queuing one work_struct per port,
> using WQ_UNBOUND | WQ_MEM_RECLAIM so stalled queue reset work can
> run on any CPU and still make forward progress under memory
> pressure.

And why do we need to be able to reset the NIC queues under memory
pressure? I could be wrong, but this still looks like unusual / defensive
programming to me. If you could point me at some existing drivers that do
this, that'd help.

> @@ -3287,6 +3341,7 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
>       ndev->min_mtu = ETH_MIN_MTU;
>       ndev->needed_headroom = MANA_HEADROOM;
>       ndev->dev_port = port_idx;
> +     ndev->watchdog_timeo = 15 * HZ;

5 sec is typical, off the top of my head

> @@ -3647,6 +3717,11 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
>               free_netdev(ndev);
>       }
>  
> +     if (ac->per_port_queue_reset_wq) {
> +             destroy_workqueue(ac->per_port_queue_reset_wq);
> +             ac->per_port_queue_reset_wq = NULL;
> +     }

I think you're missing this cleanup in the failure path of mana_probe().
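
To illustrate the pattern I mean, here is a rough userspace sketch, untested
against the driver. Every fake_* name is made up for illustration and is not
MANA code. It only shows one teardown helper shared by the remove path and
the probe error path, so the workqueue cannot leak when probe fails after
allocating it:

```c
/* Userspace model only: all fake_* names are hypothetical stand-ins,
 * not the MANA driver's actual code. */
#include <stdlib.h>

struct fake_wq { int unused; };

struct fake_ctx {
	struct fake_wq *reset_wq;	/* stands in for per_port_queue_reset_wq */
	int ports_ready;
};

static int live_wqs;	/* number of workqueues currently allocated */

static struct fake_wq *fake_alloc_wq(void)
{
	struct fake_wq *wq = calloc(1, sizeof(*wq));

	if (wq)
		live_wqs++;
	return wq;
}

static void fake_destroy_wq(struct fake_wq *wq)
{
	free(wq);
	live_wqs--;
}

/* Shared teardown, called from both remove and the probe error path. */
static void fake_destroy_reset_wq(struct fake_ctx *ctx)
{
	if (ctx->reset_wq) {
		fake_destroy_wq(ctx->reset_wq);
		ctx->reset_wq = NULL;
	}
}

/* fail_ports simulates a failure that happens after the wq was created. */
static int fake_probe(struct fake_ctx *ctx, int fail_ports)
{
	int err;

	ctx->reset_wq = fake_alloc_wq();
	if (!ctx->reset_wq)
		return -1;

	err = fail_ports ? -1 : 0;
	if (err)
		goto err_destroy_wq;

	ctx->ports_ready = 1;
	return 0;

err_destroy_wq:
	fake_destroy_reset_wq(ctx);	/* the cleanup the error path must not skip */
	return err;
}

static void fake_remove(struct fake_ctx *ctx)
{
	ctx->ports_ready = 0;
	fake_destroy_reset_wq(ctx);
}
```

Whether the real fix is a goto label or a call into the existing remove
path depends on how the probe function is structured in your tree.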
-- 
pw-bot: cr
