Hi, Feifei Thanks you for the patch. Please, see my notes below about typos and minor commit message rewording.
> -----Original Message----- > From: Feifei Wang <feifei.wa...@arm.com> > Sent: Monday, May 17, 2021 13:00 > To: Matan Azrad <ma...@nvidia.com>; Shahaf Shuler > <shah...@nvidia.com>; Slava Ovsiienko <viachesl...@nvidia.com> > Cc: dev@dpdk.org; n...@arm.com; Feifei Wang <feifei.wa...@arm.com>; > Ruifeng Wang <ruifeng.w...@arm.com> > Subject: [PATCH v2 2/2] net/mlx5: remove unnecessary wmb for Memory > Region cache > > 'dev_gen' is a variable to inform other cores to flush their local cache when > global cache is rebuilt. It is unnecessary to add write memory barrier (wmb) > before or after its updating for synchronization. > Would it be better "to trigger all cores to flush their local caches once the global MR cache has been rebuilt" ? > This is due to MR cache's R/W lock can maintain synchronization between > threads: I would add empty line here. > 1. dev_gen and global cache update ordering inside the lock protected > section does not matter. Because other threads cannot take the lock until > global cache has been updated. Thus, in out of order platform, even if other > agents firstly observed updated dev_gen but global does not update, they > also needs to wait the lock. As a result, it is unnecessary to add a wmb Type: "need" (no S) -> "have to" would be better ? > between rebuiling global cache and updating dev_gen to keep the order rebuiling -> rebuilDing And let's reword a little bit? "wmb between global cache rebuilding and updating the dev_gen to keep the memory store order." > rebuilding global cache and updating dev_gen. > > 2. Store-Release of unlock can provide the implicit wmb at the level visible > by can provide -> provides > software. This makes 'rebuiling global cache' and 'updating dev_gen' be Typo: rebuiling -> rebuilDing > observed before local_cache starts to be updated by other agents. Thus, > wmb after 'updating dev_gen' can be removed. > > Suggested-by: Ruifeng Wang <ruifeng.w...@arm.com> > Signed-off-by: Feifei Wang <feifei.wa...@arm.com> > Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com> > --- > drivers/net/mlx5/mlx5_mr.c | 26 ++++++++++---------------- > 1 file changed, 10 insertions(+), 16 deletions(-) > > diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c index > e791b6338d..85e5865050 100644 > --- a/drivers/net/mlx5/mlx5_mr.c > +++ b/drivers/net/mlx5/mlx5_mr.c > @@ -107,18 +107,15 @@ mlx5_mr_mem_event_free_cb(struct > mlx5_dev_ctx_shared *sh, > if (rebuild) { > mlx5_mr_rebuild_cache(&sh->share_cache); > /* > - * Flush local caches by propagating invalidation across cores. > - * rte_smp_wmb() is enough to synchronize this event. If > one of > - * freed memsegs is seen by other core, that means the > memseg > - * has been allocated by allocator, which will come after this > - * free call. Therefore, this store instruction (incrementing > - * generation below) will be guaranteed to be seen by other > core > - * before the core sees the newly allocated memory. > + * No wmb is needed after updating dev_gen due to store- > release of > + * unlock can provide the implicit wmb at the level visible by > + * software. This makes rebuilt global cache and updated > dev_gen > + * be observed when local_cache starts to be updating by > other > + * agents. > */ Let's make comment a less wordy (and try to keep source code concise), what about this? "No explicit wmb is needed after updating dev_gen due to store-release ordering in unlock that provides the implicit barrier at the software visible level." > ++sh->share_cache.dev_gen; > DRV_LOG(DEBUG, "broadcasting local cache flush, gen=%d", > sh->share_cache.dev_gen); > - rte_smp_wmb(); > } > rte_rwlock_write_unlock(&sh->share_cache.rwlock); > } > @@ -411,18 +408,15 @@ mlx5_dma_unmap(struct rte_pci_device *pdev, > void *addr, > (void *)mr); > mlx5_mr_rebuild_cache(&sh->share_cache); > /* > - * Flush local caches by propagating invalidation across cores. > - * rte_smp_wmb() is enough to synchronize this event. If one of > - * freed memsegs is seen by other core, that means the memseg > - * has been allocated by allocator, which will come after this > - * free call. Therefore, this store instruction (incrementing > - * generation below) will be guaranteed to be seen by other core > - * before the core sees the newly allocated memory. > + * No wmb is needed after updating dev_gen due to store-release of > + * unlock can provide the implicit wmb at the level visible by > + * software. This makes rebuilt global cache and updated dev_gen > + * be observed when local_cache starts to be updating by other > + * agents. The same as previous comment above. Please, apply the same comments to the mlx4 patch: http://patches.dpdk.org/project/dpdk/patch/20210517100002.19905-2-feifei.wa...@arm.com/ With best regards, Slava