Hi,

> -----Original Message-----
> From: dev <dev-boun...@dpdk.org> On Behalf Of Michael Baum
> Sent: Tuesday, February 4, 2020 3:36 PM
> To: dev@dpdk.org
> Cc: Matan Azrad <ma...@mellanox.com>; Slava Ovsiienko
> <viachesl...@mellanox.com>; sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH v2] net/mlx5: fix memory regions release
> deadlock
> 
> The mlx5 PMD maintains the list of devices for which the memory
> operation callback routines must be invoked to keep the device MRs (MR
> is the entity backing the hardware DMA transactions) consistent with the
> mapped memory.
> Each device context in the list is protected by a dedicated per-device
> lock, which might be taken inside the callback routine.
> 
> When a device is closing, the PMD frees all MRs by calling
> mlx5_mr_release(), which might call rte_free() while holding the device
> lock. If this rte_free() call frees an entire memory segment, it in
> turn invokes the callback routine, and the attempt to take the same
> lock inside the callback causes a deadlock.
> 
> This patch removes the device from the callback list first, and only
> then calls mlx5_mr_release() to free the remaining device MRs
> explicitly.
> 
> Fixes: 0e3d0525b2f2 ("net/mlx5: fix memory event callback list")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Michael Baum <michae...@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viachesl...@mellanox.com>
> Acked-by: Matan Azrad <ma...@mellanox.com>
> ---
> 
> v2:
> rephrase commit message.
> 
> 
>  drivers/net/mlx5/mlx5.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index f80e403..759491f 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -679,12 +679,12 @@ struct mlx5_flow_id_pool *
>       MLX5_ASSERT(rte_eal_process_type() == RTE_PROC_PRIMARY);
>       if (--sh->refcnt)
>               goto exit;
> -     /* Release created Memory Regions. */
> -     mlx5_mr_release(sh);
>       /* Remove from memory callback device list. */
>       rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
>       LIST_REMOVE(sh, mem_event_cb);
>       rte_rwlock_write_unlock(&mlx5_shared_data->mem_event_rwlock);
> +     /* Release created Memory Regions. */
> +     mlx5_mr_release(sh);
>       /* Remove context from the global device list. */
>       LIST_REMOVE(sh, next);
>       /*
> --
> 1.8.3.1

Fixed typo in commit msg,

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh
