'dev_gen' is a variable to inform other cores to flush their local cache
when global cache is rebuilt. It is unnecessary to add write memory
barrier (wmb) before or after its updating for synchronization.

This is due to MR cache's R/W lock can maintain synchronization between
threads:
1. dev_gen and global cache update ordering inside the lock protected
section does not matter. Because other threads cannot take the lock
until global cache has been updated. Thus, in out of order platform,
even if other agents firstly observed updated dev_gen but global does
not update, they also needs to wait the lock. As a result, it is
unnecessary to add a wmb between rebuiling global cache and updating
dev_gen to keep the order of rebuilding global cache and updating
dev_gen.

2. Store-Release of unlock can provide the implicit wmb at the level
visible by software. This makes 'rebuiling global cache' and 'updating
dev_gen' be observed before local_cache starts to be updated by other
agents. Thus, wmb after 'updating dev_gen' can be removed.

Suggested-by: Ruifeng Wang <ruifeng.w...@arm.com>
Signed-off-by: Feifei Wang <feifei.wa...@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
---
 drivers/net/mlx5/mlx5_mr.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index e791b6338d..85e5865050 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -107,18 +107,15 @@ mlx5_mr_mem_event_free_cb(struct mlx5_dev_ctx_shared *sh,
        if (rebuild) {
                mlx5_mr_rebuild_cache(&sh->share_cache);
                /*
-                * Flush local caches by propagating invalidation across cores.
-                * rte_smp_wmb() is enough to synchronize this event. If one of
-                * freed memsegs is seen by other core, that means the memseg
-                * has been allocated by allocator, which will come after this
-                * free call. Therefore, this store instruction (incrementing
-                * generation below) will be guaranteed to be seen by other core
-                * before the core sees the newly allocated memory.
+                * No wmb is needed after updating dev_gen due to store-release 
of
+                * unlock can provide the implicit wmb at the level visible by
+                * software. This makes rebuilt global cache and updated dev_gen
+                * be observed when local_cache starts to be updating by other
+                * agents.
                 */
                ++sh->share_cache.dev_gen;
                DRV_LOG(DEBUG, "broadcasting local cache flush, gen=%d",
                      sh->share_cache.dev_gen);
-               rte_smp_wmb();
        }
        rte_rwlock_write_unlock(&sh->share_cache.rwlock);
 }
@@ -411,18 +408,15 @@ mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr,
              (void *)mr);
        mlx5_mr_rebuild_cache(&sh->share_cache);
        /*
-        * Flush local caches by propagating invalidation across cores.
-        * rte_smp_wmb() is enough to synchronize this event. If one of
-        * freed memsegs is seen by other core, that means the memseg
-        * has been allocated by allocator, which will come after this
-        * free call. Therefore, this store instruction (incrementing
-        * generation below) will be guaranteed to be seen by other core
-        * before the core sees the newly allocated memory.
+        * No wmb is needed after updating dev_gen due to store-release of
+        * unlock can provide the implicit wmb at the level visible by
+        * software. This makes rebuilt global cache and updated dev_gen
+        * be observed when local_cache starts to be updating by other
+        * agents.
         */
        ++sh->share_cache.dev_gen;
        DRV_LOG(DEBUG, "broadcasting local cache flush, gen=%d",
              sh->share_cache.dev_gen);
-       rte_smp_wmb();
        rte_rwlock_read_unlock(&sh->share_cache.rwlock);
        return 0;
 }
-- 
2.25.1

Reply via email to