Hi, Slava > -----邮件原件----- > 发件人: Slava Ovsiienko <viachesl...@nvidia.com> > 发送时间: 2021年5月17日 22:15 > 收件人: Feifei Wang <feifei.wa...@arm.com>; Matan Azrad > <ma...@nvidia.com>; Shahaf Shuler <shah...@nvidia.com> > 抄送: dev@dpdk.org; nd <n...@arm.com>; Ruifeng Wang > <ruifeng.w...@arm.com> > 主题: RE: [PATCH v2 2/2] net/mlx5: remove unnecessary wmb for Memory > Region cache > > Hi, Feifei > > Thanks you for the patch. > Please, see my notes below about typos and minor commit message > rewording.
Thanks very much for your very careful reviewing. I will apply these in the next version. > > > -----Original Message----- > > From: Feifei Wang <feifei.wa...@arm.com> > > Sent: Monday, May 17, 2021 13:00 > > To: Matan Azrad <ma...@nvidia.com>; Shahaf Shuler > > <shah...@nvidia.com>; Slava Ovsiienko <viachesl...@nvidia.com> > > Cc: dev@dpdk.org; n...@arm.com; Feifei Wang <feifei.wa...@arm.com>; > > Ruifeng Wang <ruifeng.w...@arm.com> > > Subject: [PATCH v2 2/2] net/mlx5: remove unnecessary wmb for Memory > > Region cache > > > > 'dev_gen' is a variable to inform other cores to flush their local > > cache when global cache is rebuilt. It is unnecessary to add write > > memory barrier (wmb) before or after its updating for synchronization. > > > Would it be better "to trigger all cores to flush their local caches once the > global MR cache has been rebuilt" ? > 1.Yes, I think this can be more clear. > > This is due to MR cache's R/W lock can maintain synchronization > > between > > threads: > I would add empty line here. 2.Done. > > 1. dev_gen and global cache update ordering inside the lock protected > > section does not matter. Because other threads cannot take the lock > > until global cache has been updated. Thus, in out of order platform, > > even if other agents firstly observed updated dev_gen but global does > > not update, they also needs to wait the lock. As a result, it is > > unnecessary to add a wmb > Type: "need" (no S) -> "have to" would be better ? > 3.Done. > > between rebuiling global cache and updating dev_gen to keep the order > > rebuiling -> rebuilding 4.Done. > And let's reword a little bit? > "wmb between global cache rebuilding and updating the dev_gen to keep > the memory store order." > 5.Done. > > rebuilding global cache and updating dev_gen. > > > > 2. Store-Release of unlock can provide the implicit wmb at the level > > visible by > can provide -> provides > 6.Done. > > software. This makes 'rebuiling global cache' and 'updating dev_gen' > > be > Typo: rebuiling -> rebuilding 7.Done. > > > > observed before local_cache starts to be updated by other agents. > > Thus, wmb after 'updating dev_gen' can be removed. > > > > Suggested-by: Ruifeng Wang <ruifeng.w...@arm.com> > > Signed-off-by: Feifei Wang <feifei.wa...@arm.com> > > Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com> > > --- > > drivers/net/mlx5/mlx5_mr.c | 26 ++++++++++---------------- > > 1 file changed, 10 insertions(+), 16 deletions(-) > > > > diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c > > index > > e791b6338d..85e5865050 100644 > > --- a/drivers/net/mlx5/mlx5_mr.c > > +++ b/drivers/net/mlx5/mlx5_mr.c > > @@ -107,18 +107,15 @@ mlx5_mr_mem_event_free_cb(struct > > mlx5_dev_ctx_shared *sh, > > if (rebuild) { > > mlx5_mr_rebuild_cache(&sh->share_cache); > > /* > > - * Flush local caches by propagating invalidation across cores. > > - * rte_smp_wmb() is enough to synchronize this event. If > > one of > > - * freed memsegs is seen by other core, that means the > > memseg > > - * has been allocated by allocator, which will come after this > > - * free call. Therefore, this store instruction (incrementing > > - * generation below) will be guaranteed to be seen by other > > core > > - * before the core sees the newly allocated memory. > > + * No wmb is needed after updating dev_gen due to store- > > release of > > + * unlock can provide the implicit wmb at the level visible by > > + * software. This makes rebuilt global cache and updated > > dev_gen > > + * be observed when local_cache starts to be updating by > > other > > + * agents. > > */ > Let's make comment a less wordy (and try to keep source code concise), > what about this? > "No explicit wmb is needed after updating dev_gen due to store-release > ordering in unlock that provides the implicit barrier at the software visible > level." 8.That's better than before. A concise comment works better in the code. > > > ++sh->share_cache.dev_gen; > > DRV_LOG(DEBUG, "broadcasting local cache flush, gen=%d", > > sh->share_cache.dev_gen); > > - rte_smp_wmb(); > > } > > rte_rwlock_write_unlock(&sh->share_cache.rwlock); > > } > > @@ -411,18 +408,15 @@ mlx5_dma_unmap(struct rte_pci_device *pdev, > void > > *addr, > > (void *)mr); > > mlx5_mr_rebuild_cache(&sh->share_cache); > > /* > > - * Flush local caches by propagating invalidation across cores. > > - * rte_smp_wmb() is enough to synchronize this event. If one of > > - * freed memsegs is seen by other core, that means the memseg > > - * has been allocated by allocator, which will come after this > > - * free call. Therefore, this store instruction (incrementing > > - * generation below) will be guaranteed to be seen by other core > > - * before the core sees the newly allocated memory. > > + * No wmb is needed after updating dev_gen due to store-release of > > + * unlock can provide the implicit wmb at the level visible by > > + * software. This makes rebuilt global cache and updated dev_gen > > + * be observed when local_cache starts to be updating by other > > + * agents. > The same as previous comment above. 9.Done. > > Please, apply the same comments to the mlx4 patch: > http://patches.dpdk.org/project/dpdk/patch/20210517100002.19905-2- > feifei.wa...@arm.com/ > 10.Done. Best Regards Feifei > With best regards, > Slava