On Mon, Mar 23, 2026 at 11:36:18AM +0100, Marc Kleine-Budde wrote:
> Hello,
> 

Hello, and thanks. I am struggling to come up with a solution for this
comment.

> The nipa/ai-review bot found these problems:
> 
> | 
> https://netdev-ai.bots.linux.dev/ai-review.html?id=b046d75a-2de1-4063-bb96-8f64abc65e7d
> 
> commit 0f3cb3b3bc94eaff9b9709b528e498fe6150f9c1
> Author: Matias Ezequiel Vara Larsen <[email protected]>
> 
> can: virtio: Add virtio CAN driver
> 
> This commit adds a virtio CAN driver implementing a complete CAN bus
> interface over Virtio transport, supporting both CAN Classic and CAN-FD
> frames.
> 
> > diff --git a/drivers/net/can/virtio_can.c b/drivers/net/can/virtio_can.c
> > new file mode 100644
> > index 000000000000..57641c3ed6d0
> > --- /dev/null
> > +++ b/drivers/net/can/virtio_can.c
> 
> [ ... ]
> 
> > +static int __maybe_unused virtio_can_restore(struct virtio_device *vdev)
> > +{
> > +   struct virtio_can_priv *priv = vdev->priv;
> > +   struct net_device *ndev = priv->dev;
> > +   size_t size;
> > +   int err;
> > +
> > +   napi_enable(&priv->napi);
> > +   napi_enable(&priv->napi_tx);
> > +
> > +   err = virtio_can_find_vqs(priv);
> > +   if (err != 0)
> > +           return err;
> 
> What happens to the enabled NAPI instances when this error path is taken?
> virtio_can_restore() enables both NAPI instances at the start, but if
> virtio_can_find_vqs() fails here, the function returns without calling
> napi_disable().
> 
My expectation was that, if virtio_can_restore() fails, a subsequent
virtio_can_remove() will not hang in napi_disable(), since the NAPIs
were already enabled.

> The corresponding virtio_can_freeze() function disables both NAPIs before
> returning, so the restore function leaves the NAPI subsystem in an
> inconsistent state on failure.
> 
> If the PM framework retries restore after this failure, the second
> napi_enable() call on an already-enabled NAPI will hit the BUG_ON in
> napi_enable_locked():
>

Does the PM framework really retry if virtio_can_restore() fails? If
not, that BUG_ON is never reached.

> net/core/dev.c:napi_enable_locked() {
>       BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
>       ...
> }
> 
> In contrast, virtio_can_probe() enables NAPI only after all setup
> succeeds, avoiding this issue.
> 
> > +
> > +   size = priv->rpkt_len * priv->vqs[VIRTIO_CAN_QUEUE_RX]->num_free;
> > +   priv->rpkt = devm_krealloc(&vdev->dev, priv->rpkt, size, GFP_KERNEL | 
> > __GFP_ZERO);
> > +   if (!priv->rpkt) {
> > +           virtio_can_del_vq(vdev);
> > +           return -ENOMEM;
> > +   }
> 
> The same NAPI state issue occurs on this error path.
> 
> > +   virtio_can_populate_rx_vq(vdev);
> > +
> > +   if (netif_running(ndev)) {
> > +           err = virtio_can_start(ndev);
> > +           if (err) {
> > +                   virtio_can_del_vq(vdev);
> > +                   return err;
> > +           }
> 
> And again on this error path.
> 
> Should all three error paths call napi_disable(&priv->napi_tx) and
> napi_disable(&priv->napi) before returning the error code?
> 
> -- 
> Pengutronix e.K.                 | Marc Kleine-Budde          |
> Embedded Linux                   | https://www.pengutronix.de |
> Vertretung Nürnberg              | Phone: +49-5121-206917-129 |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-9   |


