Hi Stephen,

Thursday, August 2, 2018 1:00 AM, Stephen Hemminger:
> Subject: [RFC] mlx5: fix error unwind in device start
>
> The error handling in start of the mlx5 driver is buggy.
> For example, if setting up the flows fails the device driver will then get
> stuck in mlx5_flow_rxq_flags_clear waiting for something that will never
> happen.

Looking at the code, I cannot see why mlx5_flow_rxq_flags_clear would get stuck, nor what it would be waiting for. The function has a few finite loops, none of which depend on anything that happened earlier during device start.
Moreover, I tried forcing either mlx5_traffic_enable or mlx5_flow_start to fail; the result was that the port failed to start, but nothing got stuck.

Can you provide more details about the issue you saw there?

>
> The problem is that the code jumps to a common error label and does
> unwind for portions of the driver which have not been setup.
>
> This suggested patch breaks it into different labels with each failure path
> only unwinding what was done.
>
> Also, the ethdev driver should not be manipulating the dev_started flag
> directly. That is handled by the common ethdev layer.
>

I agree that this part of the code could probably be written better, but my first question is whether there is an actual bug that this change would fix.

> The patch works for the success case, but further testing is needed to
> actually exercise all the error paths.
> This is left as exercise for the maintainers.
>
> Signed-off-by: Stephen Hemminger <sthem...@microsoft.com>
> ---
>  drivers/net/mlx5/mlx5_trigger.c | 26 +++++++++++++-------------
>  1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> index e2a9bb703261..79a7b233986a 100644
> --- a/drivers/net/mlx5/mlx5_trigger.c
> +++ b/drivers/net/mlx5/mlx5_trigger.c
> @@ -171,42 +171,42 @@ mlx5_dev_start(struct rte_eth_dev *dev)
>  	if (ret) {
>  		DRV_LOG(ERR, "port %u Rx queue allocation failed: %s",
>  			dev->data->port_id, strerror(rte_errno));
> -		mlx5_txq_stop(dev);
> -		return -rte_errno;
> +		goto error_txq_stop;
>  	}
> -	dev->data->dev_started = 1;
> +
>  	ret = mlx5_rx_intr_vec_enable(dev);
>  	if (ret) {
>  		DRV_LOG(ERR, "port %u Rx interrupt vector creation failed",
>  			dev->data->port_id);
> -		goto error;
> +		goto error_rxq_stop;
>  	}
>  	mlx5_xstats_init(dev);
>  	ret = mlx5_traffic_enable(dev);
>  	if (ret) {
>  		DRV_LOG(DEBUG, "port %u failed to set defaults flows",
>  			dev->data->port_id);
> -		goto error;
> +		goto error_intr_vec_disable;
>  	}
>  	ret = mlx5_flow_start(dev, &priv->flows);
>  	if (ret) {
>  		DRV_LOG(DEBUG, "port %u failed to set flows",
>  			dev->data->port_id);
> -		goto error;
> +		goto error_traffic_disable;
>  	}
> +
>  	dev->tx_pkt_burst = mlx5_select_tx_function(dev);
>  	dev->rx_pkt_burst = mlx5_select_rx_function(dev);
>  	mlx5_dev_interrupt_handler_install(dev);
>  	return 0;
> -error:
> -	ret = rte_errno; /* Save rte_errno before cleanup. */
> -	/* Rollback. */
> -	dev->data->dev_started = 0;
> -	mlx5_flow_stop(dev, &priv->flows);
> +
> +error_traffic_disable:
>  	mlx5_traffic_disable(dev);
> -	mlx5_txq_stop(dev);
> +error_intr_vec_disable:
> +	mlx5_rx_intr_vec_disable(dev);
> +error_rxq_stop:
>  	mlx5_rxq_stop(dev);
> -	rte_errno = ret; /* Restore rte_errno. */
> +error_txq_stop:
> +	mlx5_txq_stop(dev);
>  	return -rte_errno;
>  }
>
> --
> 2.18.0
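
To make the discussion concrete, here is a minimal standalone sketch of the staged unwind that the patch proposes, with one label per completed step. It is not mlx5 code: the stage names, stage_start(), stage_stop(), demo_start() and demo_trace() are all hypothetical stand-ins, and the fail_at parameter exists only so that each error path can be forced in turn. The sketch carries the error code in ret rather than re-reading a global after cleanup, which is an assumption on my side about how the error should be preserved (the removed code saved and restored rte_errno around the rollback).

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stages, loosely named after the steps in the patch. */
enum stage { TXQ, RXQ, INTR_VEC, TRAFFIC, FLOWS, NB_STAGES };

static const char *const stage_name[NB_STAGES] = {
	"txq", "rxq", "intr_vec", "traffic", "flows"
};

/* Records which cleanups ran, in order, so the unwind can be inspected. */
static char trace[128];

/* Pretend to set up stage s; fail with -EIO when s == fail_at. */
static int stage_start(enum stage s, int fail_at)
{
	return ((int)s == fail_at) ? -EIO : 0;
}

/* Pretend to tear down stage s and append its name to the trace. */
static void stage_stop(enum stage s)
{
	strcat(trace, stage_name[s]);
	strcat(trace, " ");
}

const char *demo_trace(void)
{
	return trace;
}

/* Start all stages; on failure, undo only the stages that succeeded. */
int demo_start(int fail_at)
{
	int ret;

	trace[0] = '\0';
	ret = stage_start(TXQ, fail_at);
	if (ret)
		return ret;	/* nothing set up yet, nothing to unwind */
	ret = stage_start(RXQ, fail_at);
	if (ret)
		goto error_txq_stop;
	ret = stage_start(INTR_VEC, fail_at);
	if (ret)
		goto error_rxq_stop;
	ret = stage_start(TRAFFIC, fail_at);
	if (ret)
		goto error_intr_vec_disable;
	ret = stage_start(FLOWS, fail_at);
	if (ret)
		goto error_traffic_disable;
	return 0;

	/* Labels fall through, so cleanup runs in reverse setup order. */
error_traffic_disable:
	stage_stop(TRAFFIC);
error_intr_vec_disable:
	stage_stop(INTR_VEC);
error_rxq_stop:
	stage_stop(RXQ);
error_txq_stop:
	stage_stop(TXQ);
	return ret;	/* error code was captured before any cleanup ran */
}
```

With this layout, demo_start(RXQ) only stops the txq stage, while demo_start(FLOWS) stops traffic, intr_vec, rxq and txq in that order. Forcing each fail_at value in turn is one way the error paths could be exercised.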