Hi, Feifei

Thanks you for the patch.
Please, see my notes below about typos and minor commit message rewording.

> -----Original Message-----
> From: Feifei Wang <feifei.wa...@arm.com>
> Sent: Monday, May 17, 2021 13:00
> To: Matan Azrad <ma...@nvidia.com>; Shahaf Shuler
> <shah...@nvidia.com>; Slava Ovsiienko <viachesl...@nvidia.com>
> Cc: dev@dpdk.org; n...@arm.com; Feifei Wang <feifei.wa...@arm.com>;
> Ruifeng Wang <ruifeng.w...@arm.com>
> Subject: [PATCH v2 2/2] net/mlx5: remove unnecessary wmb for Memory
> Region cache
> 
> 'dev_gen' is a variable to inform other cores to flush their local cache when
> global cache is rebuilt. It is unnecessary to add write memory barrier (wmb)
> before or after its updating for synchronization.
> 
Would it be better "to trigger all cores to flush their local caches once the 
global
MR cache has been rebuilt"  ? 

> This is due to MR cache's R/W lock can maintain synchronization between
> threads:
I would add empty line here.
> 1. dev_gen and global cache update ordering inside the lock protected
> section does not matter. Because other threads cannot take the lock until
> global cache has been updated. Thus, in out of order platform, even if other
> agents firstly observed updated dev_gen but global does not update, they
> also needs to wait the lock. As a result, it is unnecessary to add a wmb
Type: "need" (no S) -> "have to" would be better ? 

> between rebuiling global cache and updating dev_gen to keep the order

rebuiling -> rebuilDing
And let's reword a little bit?
"wmb between global cache rebuilding and updating the dev_gen to keep the 
memory store order."

> rebuilding global cache and updating dev_gen.
> 
> 2. Store-Release of unlock can provide the implicit wmb at the level visible 
> by
can provide -> provides

> software. This makes 'rebuiling global cache' and 'updating dev_gen' be
Typo: rebuiling -> rebuilDing


> observed before local_cache starts to be updated by other agents. Thus,
> wmb after 'updating dev_gen' can be removed.
> 
> Suggested-by: Ruifeng Wang <ruifeng.w...@arm.com>
> Signed-off-by: Feifei Wang <feifei.wa...@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> ---
>  drivers/net/mlx5/mlx5_mr.c | 26 ++++++++++----------------
>  1 file changed, 10 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c index
> e791b6338d..85e5865050 100644
> --- a/drivers/net/mlx5/mlx5_mr.c
> +++ b/drivers/net/mlx5/mlx5_mr.c
> @@ -107,18 +107,15 @@ mlx5_mr_mem_event_free_cb(struct
> mlx5_dev_ctx_shared *sh,
>       if (rebuild) {
>               mlx5_mr_rebuild_cache(&sh->share_cache);
>               /*
> -              * Flush local caches by propagating invalidation across cores.
> -              * rte_smp_wmb() is enough to synchronize this event. If
> one of
> -              * freed memsegs is seen by other core, that means the
> memseg
> -              * has been allocated by allocator, which will come after this
> -              * free call. Therefore, this store instruction (incrementing
> -              * generation below) will be guaranteed to be seen by other
> core
> -              * before the core sees the newly allocated memory.
> +              * No wmb is needed after updating dev_gen due to store-
> release of
> +              * unlock can provide the implicit wmb at the level visible by
> +              * software. This makes rebuilt global cache and updated
> dev_gen
> +              * be observed when local_cache starts to be updating by
> other
> +              * agents.
>                */
Let's make comment a less wordy (and try to keep source code concise), what 
about this?
"No explicit wmb is needed after updating dev_gen due to store-release ordering
in unlock that provides the implicit barrier at the software visible level."

>               ++sh->share_cache.dev_gen;
>               DRV_LOG(DEBUG, "broadcasting local cache flush, gen=%d",
>                     sh->share_cache.dev_gen);
> -             rte_smp_wmb();
>       }
>       rte_rwlock_write_unlock(&sh->share_cache.rwlock);
>  }
> @@ -411,18 +408,15 @@ mlx5_dma_unmap(struct rte_pci_device *pdev,
> void *addr,
>             (void *)mr);
>       mlx5_mr_rebuild_cache(&sh->share_cache);
>       /*
> -      * Flush local caches by propagating invalidation across cores.
> -      * rte_smp_wmb() is enough to synchronize this event. If one of
> -      * freed memsegs is seen by other core, that means the memseg
> -      * has been allocated by allocator, which will come after this
> -      * free call. Therefore, this store instruction (incrementing
> -      * generation below) will be guaranteed to be seen by other core
> -      * before the core sees the newly allocated memory.
> +      * No wmb is needed after updating dev_gen due to store-release of
> +      * unlock can provide the implicit wmb at the level visible by
> +      * software. This makes rebuilt global cache and updated dev_gen
> +      * be observed when local_cache starts to be updating by other
> +      * agents.
The same as previous comment above.

Please, apply the same comments to the mlx4 patch:
http://patches.dpdk.org/project/dpdk/patch/20210517100002.19905-2-feifei.wa...@arm.com/

With best regards,
Slava

Reply via email to