> On May 10, 2019, at 7:36 AM, Viacheslav Ovsiienko <viachesl...@mellanox.com> > wrote: > > IBV_EVENT_DEVICE_FATAL event is generated by the driver once for > the entire multiport Infiniband device, not for each existing ports. > The port index is zero and it causes dropping the device removal > event. We should invoke the removal event processing routine > for each port we have installed handler for. > > Fixes: 028b2a28c3cb ("net/mlx5: update event handler for multiport IB > devices") > > Signed-off-by: Viacheslav Ovsiienko <viachesl...@mellanox.com> > --- > drivers/net/mlx5/mlx5_ethdev.c | 43 +++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 42 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c > index 80ee98f..3962d41 100644 > --- a/drivers/net/mlx5/mlx5_ethdev.c > +++ b/drivers/net/mlx5/mlx5_ethdev.c > @@ -1116,6 +1116,35 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char > *fw_ver, size_t fw_size) > } > > /** > + * Handle asynchronous removal event for entire multiport device. > + * > + * @param sh > + * Infiniband device shared context. > + */ > +static void > +mlx5_dev_interrupt_device_fatal(struct mlx5_ibv_shared *sh) > +{ > + uint32_t i; > + > + for (i = 0; i < sh->max_port; ++i) { > + struct rte_eth_dev *dev; > + > + if (sh->port[i].ih_port_id >= RTE_MAX_ETHPORTS) { > + /* > + * Or not existing port either no > + * handler installed for this port. > + */ > + continue; > + } > + dev = &rte_eth_devices[sh->port[i].ih_port_id]; > + assert(dev); > + if (dev->data->dev_conf.intr_conf.rmv) > + _rte_eth_dev_callback_process > + (dev, RTE_ETH_EVENT_INTR_RMV, NULL); > + } > +} > + > +/** > * Handle shared asynchronous events the NIC (removal event > * and link status change). Supports multiport IB device. > * > @@ -1137,6 +1166,16 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char > *fw_ver, size_t fw_size) > break; > /* Retrieve and check IB port index. */ > tmp = (uint32_t)event.element.port_num; > + if (!tmp && event.event_type == IBV_EVENT_DEVICE_FATAL) { > + /* > + * The DEVICE_FATAL event is called once for > + * entire device without port specifying. > + * We should notify all existing ports. > + */ > + mlx5_glue->ack_async_event(&event); > + mlx5_dev_interrupt_device_fatal(sh); > + continue; > + }
Then, you should clear the previous handle below, right? I mean this if-clause. if (event.event_type == IBV_EVENT_DEVICE_FATAL && > assert(tmp && (tmp <= sh->max_port)); > if (!tmp || > tmp > sh->max_port || > @@ -1146,12 +1185,14 @@ int mlx5_fw_version_get(struct rte_eth_dev *dev, char > *fw_ver, size_t fw_size) > * installed for this port. > */ > mlx5_glue->ack_async_event(&event); > + DRV_LOG(DEBUG, > + "port %u event type %d on not handled", Needs to fix the sentence. Be more specific. For example, port %u cannot handle an event (type %d) due to invalid IB port index One more below. That should be something like port %u cannot handle an unknown event (type %d) Thanks, Yongseok > + tmp, event.event_type); > continue; > } > /* Retrieve ethernet device descriptor. */ > tmp = sh->port[tmp - 1].ih_port_id; > dev = &rte_eth_devices[tmp]; > - tmp = 0; > assert(dev); > if ((event.event_type == IBV_EVENT_PORT_ACTIVE || > event.event_type == IBV_EVENT_PORT_ERR) && > -- > 1.8.3.1 >