Thank you for your contribution! Sashiko AI review found 2 potential issues
to consider:

Pre-existing issues:
- [High] State Machine Violation / NULL Pointer Dereference on interface open
  after a failed MTU change.
- [High] Uninitialized Resource Teardown / Invalid Hardware Request during TX
  queue creation failure.
--

commit effecc83d343cb89cadf93984f0a62ac828d0166
Author: Dipayaan Roy <[email protected]>

net: mana: Add NULL guards in teardown path to prevent panic on attach failure

When queue allocation fails partway through, the error cleanup frees
and NULLs apc->tx_qp and apc->rxqs. Multiple teardown paths such as
mana_remove(), mana_change_mtu() recovery, and internal error handling
in mana_alloc_queues() can subsequently call into functions that
dereference these pointers without NULL checks.

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index 9afc786b297a8..9e7e4bf526bf6 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -1727,6 +1727,9 @@ static void mana_fence_rqs(struct mana_port_context *apc)
>       struct mana_rxq *rxq;
>       int err;
>  
> +     if (!apc->rxqs)
> +             return;
> +

[Severity: High]
This isn't a bug introduced by this patch, but if mana_change_mtu() fails,
does it leave the device in a state where a subsequent mana_open() will crash?

When mana_change_mtu() calls mana_detach(ndev, false), the device is detached
and mana_cleanup_port_context() is explicitly called, freeing apc->rxqs and
setting it to NULL.

It then calls mana_attach(). If mana_attach() fails internally (e.g., inside
its call to mana_alloc_queues()), its error path calls
mana_cleanup_port_context() again, which frees apc->rxqs and sets it to NULL.

mana_change_mtu() then returns the error.

The interface is now in the DOWN state (apc->port_is_up == false) with
apc->rxqs == NULL.

If the user subsequently attempts to bring the interface up (for example,
ip link set <dev> up), mana_open() is executed. mana_open() calls
mana_alloc_queues() but does not call mana_init_port() to reallocate apc->rxqs.

Inside mana_alloc_queues(), mana_add_rx_queues() unconditionally dereferences
apc->rxqs:

    apc->rxqs[i] = rxq;

Could this result in a kernel panic?
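
A minimal userspace sketch of the sequence above, not driver code. The names
port_model, model_init_port(), model_cleanup() and model_add_rx_queues() are
hypothetical stand-ins for the MANA structures and functions. It only
illustrates how a NULL check at the top of RX queue setup would turn the
dereference into an error return. Whether failing the open or re-running the
port initialization is the better fix for the driver is left open here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; names do not match the driver. */
struct port_model {
	void **rxqs;              /* stands in for apc->rxqs */
	unsigned int num_queues;
	int port_is_up;
};

/* Models the cleanup step: free the array and set the pointer to NULL. */
static void model_cleanup(struct port_model *p)
{
	free(p->rxqs);
	p->rxqs = NULL;
}

/* Models port initialization: allocate the zeroed pointer array. */
static int model_init_port(struct port_model *p)
{
	p->rxqs = calloc(p->num_queues, sizeof(*p->rxqs));
	return p->rxqs ? 0 : -ENOMEM;
}

/* Models RX queue setup with a guard in front of the array writes. */
static int model_add_rx_queues(struct port_model *p)
{
	static int dummy_rxq;     /* placeholder object for every slot */
	unsigned int i;

	if (!p->rxqs)
		return -EINVAL;   /* assumption: fail instead of dereferencing NULL */

	for (i = 0; i < p->num_queues; i++)
		p->rxqs[i] = &dummy_rxq;

	return 0;
}
```

In this model, calling model_add_rx_queues() after model_cleanup() returns
-EINVAL rather than writing through a NULL pointer.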

[ ... ]

> @@ -3287,33 +3294,38 @@ static int mana_dealloc_queues(struct net_device *ndev)
>        * number of queues.
>        */
>  
> -     for (i = 0; i < apc->num_queues; i++) {
> -             txq = &apc->tx_qp[i].txq;
> -             tsleep = 1000;
> -             while (atomic_read(&txq->pending_sends) > 0 &&
> -                    time_before(jiffies, timeout)) {
> -                     usleep_range(tsleep, tsleep + 1000);
> -                     tsleep <<= 1;
> -             }
> -             if (atomic_read(&txq->pending_sends)) {
> -                     err = pcie_flr(to_pci_dev(gd->gdma_context->dev));
> -                     if (err) {
> -                             netdev_err(ndev, "flr failed %d with %d pkts pending in txq %u\n",
> -                                        err, atomic_read(&txq->pending_sends),
> -                                        txq->gdma_txq_id);
> +     if (apc->tx_qp) {
> +             for (i = 0; i < apc->num_queues; i++) {
> +                     txq = &apc->tx_qp[i].txq;
> +                     tsleep = 1000;
> +                     while (atomic_read(&txq->pending_sends) > 0 &&
> +                            time_before(jiffies, timeout)) {
> +                             usleep_range(tsleep, tsleep + 1000);
> +                             tsleep <<= 1;
> +                     }

[Severity: High]
This is a pre-existing issue, but does mana_destroy_txq() send invalid handles
to the hardware when TX queue creation fails?

In mana_create_txq(), the apc->tx_qp array is allocated via kzalloc_objs(),
meaning all elements (including tx_object) are initialized to 0.

The function then loops over num_queues. For the current queue i, it sets
apc->tx_qp[i].tx_object = INVALID_MANA_HANDLE; before attempting to create the
hardware object via mana_create_wq_obj().

If creation fails, it jumps to the out label and calls mana_destroy_txq(apc).
mana_destroy_txq() unconditionally loops over all num_queues and calls:

    mana_destroy_wq_obj(apc, GDMA_SQ, apc->tx_qp[i].tx_object);

Unlike mana_destroy_rxq(), which checks if (rxq->rxobj != INVALID_MANA_HANDLE),
mana_destroy_txq() passes the handle directly to mana_destroy_wq_obj().

For the failed queue, the handle is INVALID_MANA_HANDLE. For all unreached
queues (j > i), the handle is 0.

Could this cause mana_destroy_wq_obj() to send invalid handles (including 0,
which might be a valid handle for an unrelated object) to the hardware?
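
A minimal userspace sketch of one way to avoid this, not driver code. The
names txq_model, MODEL_INVALID_HANDLE, model_init_handles() and
model_destroy_txq() are hypothetical, and the sentinel value is an assumption
rather than the driver's INVALID_MANA_HANDLE definition. The idea is to mark
every slot invalid before the first creation attempt and to skip invalid
slots on teardown, mirroring the check described for mana_destroy_rxq().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel; the driver's actual value is not assumed here. */
#define MODEL_INVALID_HANDLE UINT64_MAX

struct txq_model {
	uint64_t tx_object;       /* stands in for apc->tx_qp[i].tx_object */
};

/* Mark every slot invalid before any creation attempt. */
static void model_init_handles(struct txq_model *qp, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		qp[i].tx_object = MODEL_INVALID_HANDLE;
}

/*
 * Teardown: skip slots that never received a handle. Returns the number
 * of slots for which a destroy request would have been sent.
 */
static size_t model_destroy_txq(struct txq_model *qp, size_t n)
{
	size_t i, destroyed = 0;

	for (i = 0; i < n; i++) {
		if (qp[i].tx_object == MODEL_INVALID_HANDLE)
			continue;
		/* a real driver would issue the destroy request here */
		qp[i].tx_object = MODEL_INVALID_HANDLE;
		destroyed++;
	}

	return destroyed;
}
```

With this layout, a handle value of 0 is treated as a real object, and only
slots that were actually populated reach the destroy step.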

-- 
Sashiko AI review ·
https://sashiko.dev/#/patchset/[email protected]?part=1
