Hi,

> -----Original Message-----
> From: dev <dev-boun...@dpdk.org> On Behalf Of Michael Baum
> Sent: Tuesday, February 4, 2020 3:36 PM
> To: dev@dpdk.org
> Cc: Matan Azrad <ma...@mellanox.com>; Slava Ovsiienko
> <viachesl...@mellanox.com>; sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH v2] net/mlx5: fix memory regions release
> deadlock
>
> The mlx5 PMD maintains the list of devices for which the memory
> operation callback routines must be invoked to keep the device MRs (MR
> is the entity backing the hardware DMA transactions) consistent with the
> mapped memory.
> Each device context in the list is protected with a dedicated per-device
> lock, which might be taken inside the callback routine.
>
> When a device is closing, the PMD frees all MRs by calling
> mlx5_mr_release(), which might call rte_free() while the device lock
> is held. If this rte_free() call triggers freeing of the entire memory
> segment, it in turn invokes the callback routine, and the attempt to
> take the same lock inside the callback causes the deadlock.
>
> The patch proposes to remove the device from the callback list first
> and then call mlx5_mr_release() and free the remaining device MRs
> explicitly.
>
> Fixes: 0e3d0525b2f2 ("net/mlx5: fix memory event callback list")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Michael Baum <michae...@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viachesl...@mellanox.com>
> Acked-by: Matan Azrad <ma...@mellanox.com>
> ---
>
> v2:
> rephrase commit message.
>
>
>  drivers/net/mlx5/mlx5.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index f80e403..759491f 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -679,12 +679,12 @@ struct mlx5_flow_id_pool *
>  	MLX5_ASSERT(rte_eal_process_type() == RTE_PROC_PRIMARY);
>  	if (--sh->refcnt)
>  		goto exit;
> -	/* Release created Memory Regions. */
> -	mlx5_mr_release(sh);
>  	/* Remove from memory callback device list. */
>  	rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
>  	LIST_REMOVE(sh, mem_event_cb);
>  	rte_rwlock_write_unlock(&mlx5_shared_data->mem_event_rwlock);
> +	/* Release created Memory Regions. */
> +	mlx5_mr_release(sh);
>  	/* Remove context from the global device list. */
>  	LIST_REMOVE(sh, next);
>  	/*
> --
> 1.8.3.1
Fixed typo in commit msg,
Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh