The "tx_db_nc" devarg forces the doorbell register to be mapped to a
non-cached region, eliminating the extra write memory barrier. It was
used when creating the UAR for Tx and thus affected Tx performance.

Recently [1] its use was extended to UAR creation in all mlx5 drivers,
so the name is no longer accurate.

This patch changes its name to "sq_db_nc" to suit any send queue that
uses it. The old name will still work for backward compatibility.

[1] commit 5dfa003db53f ("common/mlx5: fix post doorbell barrier")

Signed-off-by: Michael Baum <michae...@nvidia.com>
Reviewed-by: Raslan Darawsheh <rasl...@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viachesl...@nvidia.com>
---
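
Note for reviewers: the alias handling described above can be pictured
with the minimal, standalone sketch below. It is not the driver code.
sketch_db_nc_parse() and the SQ_DB_* names are hypothetical stand-ins
for mlx5_common_args_check_handler() and the MLX5_SQ_DB_* defines, and
the return-value convention is an assumption made for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Values mirror the MLX5_SQ_DB_* defines introduced by this patch. */
#define SQ_DB_CACHED 0
#define SQ_DB_NCACHED 1
#define SQ_DB_HEURISTIC 2

/*
 * Hypothetical stand-in for the devarg handler. Accepts the new key and
 * the deprecated alias, validates the value and stores it in *dbnc.
 * Returns 0 on success, -EINVAL on a bad value, 1 if the key is unrelated.
 */
static int
sketch_db_nc_parse(const char *key, const char *val, int *dbnc)
{
	char *end = NULL;
	long tmp;

	if (strcmp(key, "sq_db_nc") != 0 && strcmp(key, "tx_db_nc") != 0)
		return 1;
	/* The old name is still honored, only a warning is printed. */
	if (strcmp(key, "tx_db_nc") == 0)
		fprintf(stderr, "%s: deprecated parameter, use sq_db_nc\n", key);
	errno = 0;
	tmp = strtol(val, &end, 0);
	if (errno != 0 || end == val || *end != '\0')
		return -EINVAL;
	if (tmp != SQ_DB_CACHED && tmp != SQ_DB_NCACHED &&
	    tmp != SQ_DB_HEURISTIC)
		return -EINVAL;
	*dbnc = (int)tmp;
	return 0;
}
```

Under these assumptions both "sq_db_nc=1" and "tx_db_nc=1" select the
non-cached mapping, and any value outside 0..2 is rejected.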
 doc/guides/nics/mlx5.rst                   | 34 ++-----------------
 doc/guides/platform/mlx5.rst               | 39 ++++++++++++++++++++++
 drivers/common/mlx5/linux/mlx5_common_os.c |  2 +-
 drivers/common/mlx5/mlx5_common.c          | 31 ++++++++++++-----
 drivers/common/mlx5/mlx5_common_defs.h     |  8 ++---
 drivers/net/mlx5/linux/mlx5_verbs.c        |  2 +-
 drivers/net/mlx5/mlx5_devx.c               |  2 +-
 7 files changed, 72 insertions(+), 46 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index f94ed90ef0..8956cd1dd8 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -814,37 +814,9 @@ for an additional list of options shared with other mlx5 drivers.
 
 - ``tx_db_nc`` parameter [int]
 
-  The rdma core library can map doorbell register in two ways, depending on the
-  environment variable "MLX5_SHUT_UP_BF":
-
-  - As regular cached memory (usually with write combining attribute), if the
-    variable is either missing or set to zero.
-  - As non-cached memory, if the variable is present and set to not "0" value.
-
-  The type of mapping may slightly affect the Tx performance, the optimal choice
-  is strongly relied on the host architecture and should be deduced practically.
-
-  If ``tx_db_nc`` is set to zero, the doorbell is forced to be mapped to regular
-  memory (with write combining), the PMD will perform the extra write memory barrier
-  after writing to doorbell, it might increase the needed CPU clocks per packet
-  to send, but latency might be improved.
-
-  If ``tx_db_nc`` is set to one, the doorbell is forced to be mapped to non
-  cached memory, the PMD will not perform the extra write memory barrier
-  after writing to doorbell, on some architectures it might improve the
-  performance.
-
-  If ``tx_db_nc`` is set to two, the doorbell is forced to be mapped to regular
-  memory, the PMD will use heuristics to decide whether write memory barrier
-  should be performed. For bursts with size multiple of recommended one (64 pkts)
-  it is supposed the next burst is coming and no need to issue the extra memory
-  barrier (it is supposed to be issued in the next coming burst, at least after
-  descriptor writing). It might increase latency (on some hosts till next
-  packets transmit) and should be used with care.
-
-  If ``tx_db_nc`` is omitted or set to zero, the preset (if any) environment
-  variable "MLX5_SHUT_UP_BF" value is used. If there is no "MLX5_SHUT_UP_BF",
-  the default ``tx_db_nc`` value is zero for ARM64 hosts and one for others.
+  This parameter name is deprecated but still accepted.
+  The new name for this parameter is ``sq_db_nc``.
+  See :ref:`common driver options <mlx5_common_driver_options>`.
 
 - ``tx_pp`` parameter [int]
 
diff --git a/doc/guides/platform/mlx5.rst b/doc/guides/platform/mlx5.rst
index 0fd5e6604d..d073c213ca 100644
--- a/doc/guides/platform/mlx5.rst
+++ b/doc/guides/platform/mlx5.rst
@@ -600,3 +600,42 @@ and below are the arguments supported by the common mlx5 layer.
   from system by default, without explicit rte memory flag.
 
   By default, the PMD will set this value to 0.
+
+- ``sq_db_nc`` parameter [int]
+
+  The rdma-core library can map the doorbell register in two ways,
+  depending on the environment variable "MLX5_SHUT_UP_BF":
+
+  - As regular cached memory (usually with write combining attribute),
+    if the variable is either missing or set to zero.
+  - As non-cached memory, if the variable is present and set to a non-zero value.
+
+  The same doorbell mapping approach is implemented directly by the PMD
+  in UAR generation for queues created with DevX.
+
+  The type of mapping may slightly affect the send queue performance;
+  the optimal choice depends strongly on the host architecture
+  and should be determined empirically.
+
+  If ``sq_db_nc`` is set to zero, the doorbell is forced to be mapped to
+  regular memory (with write combining) and the PMD performs the extra write
+  memory barrier after writing to the doorbell. This might increase the CPU
+  clocks needed per packet sent, but latency might be improved.
+
+  If ``sq_db_nc`` is set to one, the doorbell is forced to be mapped to
+  non-cached memory and the PMD skips the extra write memory barrier after
+  writing to the doorbell. This might help performance on some architectures.
+
+  If ``sq_db_nc`` is set to two, the doorbell is forced to be mapped to
+  regular memory and the PMD uses heuristics to decide whether a write memory
+  barrier should be performed. For bursts whose size is a multiple of the
+  recommended one (64 packets), the next burst is assumed to be coming, so the
+  extra memory barrier is skipped (it is expected in the next coming burst,
+  at least after descriptor writing). This might increase latency (on some hosts
+  until the next packets are transmitted) and should be used with care.
+  The PMD uses heuristics only for Tx queues. For other send queues the doorbell
+  is mapped to regular memory, the same as when ``sq_db_nc`` is set to 0.
+
+  If ``sq_db_nc`` is omitted, the preset (if any) environment variable
+  "MLX5_SHUT_UP_BF" value is used. If there is no "MLX5_SHUT_UP_BF", the
+  default ``sq_db_nc`` value is zero for ARM64 hosts and one for others.
\ No newline at end of file
diff --git a/drivers/common/mlx5/linux/mlx5_common_os.c b/drivers/common/mlx5/linux/mlx5_common_os.c
index 0d3e24e04e..a752d79e8e 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.c
+++ b/drivers/common/mlx5/linux/mlx5_common_os.c
@@ -630,7 +630,7 @@ mlx5_config_doorbell_mapping_env(int dbnc)
                setenv(MLX5_SHUT_UP_BF, MLX5_SHUT_UP_BF_DEFAULT, 1);
        else
                setenv(MLX5_SHUT_UP_BF,
-                      dbnc == MLX5_TXDB_NCACHED ? "1" : "0", 1);
+                      dbnc == MLX5_SQ_DB_NCACHED ? "1" : "0", 1);
        return value;
 }
 
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 96906d3f39..8cf391df13 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -35,10 +35,17 @@ uint8_t haswell_broadwell_cpu;
 
 /*
  * Device parameter to force doorbell register mapping
- * to non-cahed region eliminating the extra write memory barrier.
+ * to non-cached region eliminating the extra write memory barrier.
+ * Deprecated alias, still accepted (name changed to sq_db_nc).
  */
 #define MLX5_TX_DB_NC "tx_db_nc"
 
+/*
+ * Device parameter to force doorbell register mapping
+ * to non-cached region eliminating the extra write memory barrier.
+ */
+#define MLX5_SQ_DB_NC "sq_db_nc"
+
 /* In case this is an x86_64 intel processor to check if
  * we should use relaxed ordering.
  */
@@ -255,11 +262,17 @@ mlx5_common_args_check_handler(const char *key, const char *val, void *opaque)
                DRV_LOG(WARNING, "%s: \"%s\" is an invalid integer.", key, val);
                return -rte_errno;
        }
-       if (strcmp(key, MLX5_TX_DB_NC) == 0) {
-               if (tmp != MLX5_TXDB_CACHED &&
-                   tmp != MLX5_TXDB_NCACHED &&
-                   tmp != MLX5_TXDB_HEURISTIC) {
-                       DRV_LOG(ERR, "Invalid Tx doorbell mapping parameter.");
+       if (strcmp(key, MLX5_TX_DB_NC) == 0)
+               DRV_LOG(WARNING,
+                       "%s: deprecated parameter, converted to sq_db_nc",
+                       key);
+       if (strcmp(key, MLX5_SQ_DB_NC) == 0 ||
+           strcmp(key, MLX5_TX_DB_NC) == 0) {
+               if (tmp != MLX5_SQ_DB_CACHED &&
+                   tmp != MLX5_SQ_DB_NCACHED &&
+                   tmp != MLX5_SQ_DB_HEURISTIC) {
+                       DRV_LOG(ERR,
+                               "Invalid Send Queue doorbell mapping parameter.");
                        rte_errno = EINVAL;
                        return -rte_errno;
                }
@@ -293,6 +306,7 @@ mlx5_common_config_get(struct mlx5_kvargs_ctrl *mkvlist,
                RTE_DEVARGS_KEY_CLASS,
                MLX5_DRIVER_KEY,
                MLX5_TX_DB_NC,
+               MLX5_SQ_DB_NC,
                MLX5_MR_EXT_MEMSEG_EN,
                MLX5_SYS_MEM_EN,
                MLX5_MR_MEMPOOL_REG_EN,
@@ -317,7 +331,8 @@ mlx5_common_config_get(struct mlx5_kvargs_ctrl *mkvlist,
        DRV_LOG(DEBUG, "mr_ext_memseg_en is %u.", config->mr_ext_memseg_en);
        DRV_LOG(DEBUG, "mr_mempool_reg_en is %u.", config->mr_mempool_reg_en);
        DRV_LOG(DEBUG, "sys_mem_en is %u.", config->sys_mem_en);
-       DRV_LOG(DEBUG, "Tx doorbell mapping parameter is %d.", config->dbnc);
+       DRV_LOG(DEBUG, "Send Queue doorbell mapping parameter is %d.",
+               config->dbnc);
        return ret;
 }
 
@@ -1231,7 +1246,7 @@ mlx5_devx_alloc_uar(struct mlx5_common_device *cdev)
        for (retry = 0; retry < MLX5_ALLOC_UAR_RETRY; ++retry) {
 #ifdef MLX5DV_UAR_ALLOC_TYPE_NC
                /* Control the mapping type according to the settings. */
-               uar_mapping = (cdev->config.dbnc == MLX5_TXDB_NCACHED) ?
+               uar_mapping = (cdev->config.dbnc == MLX5_SQ_DB_NCACHED) ?
                            MLX5DV_UAR_ALLOC_TYPE_NC : MLX5DV_UAR_ALLOC_TYPE_BF;
 #else
                /*
diff --git a/drivers/common/mlx5/mlx5_common_defs.h b/drivers/common/mlx5/mlx5_common_defs.h
index ca80cd8d29..68b700dc0b 100644
--- a/drivers/common/mlx5/mlx5_common_defs.h
+++ b/drivers/common/mlx5/mlx5_common_defs.h
@@ -34,10 +34,10 @@
 /* Default PMD specific parameter value. */
 #define MLX5_ARG_UNSET (-1)
 
-/* MLX5_TX_DB_NC supported values. */
-#define MLX5_TXDB_CACHED 0
-#define MLX5_TXDB_NCACHED 1
-#define MLX5_TXDB_HEURISTIC 2
+/* MLX5_SQ_DB_NC supported values. */
+#define MLX5_SQ_DB_CACHED 0
+#define MLX5_SQ_DB_NCACHED 1
+#define MLX5_SQ_DB_HEURISTIC 2
 
 /* Fields of memory mapping type in offset parameter of mmap() */
 #define MLX5_UAR_MMAP_CMD_SHIFT 8
diff --git a/drivers/net/mlx5/linux/mlx5_verbs.c b/drivers/net/mlx5/linux/mlx5_verbs.c
index 331c61d3c5..b6ba21c216 100644
--- a/drivers/net/mlx5/linux/mlx5_verbs.c
+++ b/drivers/net/mlx5/linux/mlx5_verbs.c
@@ -926,7 +926,7 @@ mlx5_txq_ibv_uar_init(struct mlx5_txq_ctrl *txq_ctrl, void *bf_reg)
                DRV_LOG(ERR, "Failed to get mem page size");
                rte_errno = ENOMEM;
        }
-       txq->db_heu = priv->sh->cdev->config.dbnc == MLX5_TXDB_HEURISTIC;
+       txq->db_heu = priv->sh->cdev->config.dbnc == MLX5_SQ_DB_HEURISTIC;
        txq->db_nc = mlx5_db_map_type_get(uar_mmap_offset, page_size);
        ppriv->uar_table[txq->idx].db = bf_reg;
 #ifndef RTE_ARCH_64
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index e178b799fa..a9b8c2a1b7 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -1326,7 +1326,7 @@ mlx5_txq_devx_obj_new(struct rte_eth_dev *dev, uint16_t idx)
        txq_data->qp_db = &txq_obj->sq_obj.db_rec[MLX5_SND_DBR];
        *txq_data->qp_db = 0;
        txq_data->qp_num_8s = txq_obj->sq_obj.sq->id << 8;
-       txq_data->db_heu = sh->cdev->config.dbnc == MLX5_TXDB_HEURISTIC;
+       txq_data->db_heu = sh->cdev->config.dbnc == MLX5_SQ_DB_HEURISTIC;
        txq_data->db_nc = sh->tx_uar.dbnc;
        txq_data->wait_on_time = !!(!sh->config.tx_pp &&
                                    sh->cdev->config.hca_attr.wait_on_time);
-- 
2.25.1
