[dpdk-dev] [PATCH] ethdev: check whether queue is set up in queue-related APIs

2020-10-10 Thread Wei Hu (Xavier)
From: Chengchang Tang 

This patch adds a check to the queue-related API functions to verify
that the relevant Tx or Rx queue has been set up, avoiding illegal
address access. Validity checks of the queue_id are also added to the
rte_eth_dev_rx_intr_enable and rte_eth_dev_rx_intr_disable functions.

Signed-off-by: Chengchang Tang 
Signed-off-by: Wei Hu (Xavier) 
Signed-off-by: Chengwen Feng 
---
 lib/librte_ethdev/rte_ethdev.c | 56 ++
 lib/librte_ethdev/rte_ethdev.h |  3 ++-
 2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 892c246..31a8eb3 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -897,6 +897,13 @@ rte_eth_dev_rx_queue_start(uint16_t port_id, uint16_t rx_queue_id)
return -EINVAL;
}
 
+   if (dev->data->rx_queues[rx_queue_id] == NULL) {
+   RTE_ETHDEV_LOG(ERR, "Rx queue %"PRIu16" of device with port_id=%"
+   PRIu16" has not been set up\n",
+   rx_queue_id, port_id);
+   return -EINVAL;
+   }
+
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -ENOTSUP);
 
if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id)) {
@@ -931,6 +938,13 @@ rte_eth_dev_rx_queue_stop(uint16_t port_id, uint16_t rx_queue_id)
return -EINVAL;
}
 
+   if (dev->data->rx_queues[rx_queue_id] == NULL) {
+   RTE_ETHDEV_LOG(ERR, "Rx queue %"PRIu16" of device with port_id=%"
+   PRIu16" has not been set up\n",
+   rx_queue_id, port_id);
+   return -EINVAL;
+   }
+
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_stop, -ENOTSUP);
 
if (rte_eth_dev_is_rx_hairpin_queue(dev, rx_queue_id)) {
@@ -971,6 +985,13 @@ rte_eth_dev_tx_queue_start(uint16_t port_id, uint16_t tx_queue_id)
return -EINVAL;
}
 
+   if (dev->data->tx_queues[tx_queue_id] == NULL) {
+   RTE_ETHDEV_LOG(ERR, "Tx queue %"PRIu16" of device with port_id=%"
+   PRIu16" has not been set up\n",
+   tx_queue_id, port_id);
+   return -EINVAL;
+   }
+
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_start, -ENOTSUP);
 
if (rte_eth_dev_is_tx_hairpin_queue(dev, tx_queue_id)) {
@@ -1003,6 +1024,13 @@ rte_eth_dev_tx_queue_stop(uint16_t port_id, uint16_t tx_queue_id)
return -EINVAL;
}
 
+   if (dev->data->tx_queues[tx_queue_id] == NULL) {
+   RTE_ETHDEV_LOG(ERR, "Tx queue %"PRIu16" of device with port_id=%"
+   PRIu16" has not been set up\n",
+   tx_queue_id, port_id);
+   return -EINVAL;
+   }
+
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_stop, -ENOTSUP);
 
if (rte_eth_dev_is_tx_hairpin_queue(dev, tx_queue_id)) {
@@ -4463,6 +4491,20 @@ rte_eth_dev_rx_intr_enable(uint16_t port_id,
 
dev = &rte_eth_devices[port_id];
 
+   if (queue_id >= dev->data->nb_rx_queues) {
+   RTE_ETHDEV_LOG(ERR, "Invalid Rx queue_id=%"PRIu16"\n",
+  queue_id);
+   return -EINVAL;
+   }
+
+   if (dev->data->rx_queues[queue_id] == NULL) {
+   RTE_ETHDEV_LOG(ERR,
+  "Rx queue %"PRIu16" of device with port_id=%"
+  PRIu16" has not been set up\n",
+  queue_id, port_id);
+   return -EINVAL;
+   }
+
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_enable, -ENOTSUP);
return eth_err(port_id, (*dev->dev_ops->rx_queue_intr_enable)(dev,
queue_id));
@@ -4478,6 +4520,20 @@ rte_eth_dev_rx_intr_disable(uint16_t port_id,
 
dev = &rte_eth_devices[port_id];
 
+   if (queue_id >= dev->data->nb_rx_queues) {
+   RTE_ETHDEV_LOG(ERR, "Invalid Rx queue_id=%"PRIu16"\n",
+  queue_id);
+   return -EINVAL;
+   }
+
+   if (dev->data->rx_queues[queue_id] == NULL) {
+   RTE_ETHDEV_LOG(ERR,
+  "Rx queue %"PRIu16" of device with port_id=%"
+  PRIu16" has not been set up\n",
+  queue_id, port_id);
+   return -EINVAL;
+   }
+
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_disable, -ENOTSUP);
return eth_err(port_id, (*dev->dev_ops->rx_queue_intr_disable)(dev,
queue_id));
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 5bcfbb8..f4cc591 100644
--- a/lib/librte_ethdev/rte_ethde

Re: [dpdk-dev] [PATCH v2] kernel: remove igb_uio

2020-10-10 Thread 谢华伟(此时此刻)



On 2020/10/5 17:11, Maxime Coquelin wrote:


On 10/5/20 10:57 AM, Thomas Monjalon wrote:

24/09/2020 07:41, Stephen Hemminger:

On Fri, 11 Sep 2020 17:54:48 +0200
Thomas Monjalon  wrote:


As decided in the Technical Board in November 2019,
the kernel module igb_uio is moved to the dpdk-kmods repository
in the /linux/igb_uio/ directory.

Minutes of Technical Board meeting:
https://mails.dpdk.org/archives/dev/2019-November/151763.html

Signed-off-by: Thomas Monjalon 
---
v2: update few docs (including release notes)

Good so far:
Acked-by: Stephen Hemminger 

You may want to address all the references to igb_uio in guides/nics

ark.rst
axgbe.rst
bnx2x.rst
bnxt.rst
build_and_test.rst
ena.rst
enic.rst
features.rst
hns3.rst
i40e.rst
intel_vf.rst
ixgbe.rst
liquidio.rst
mlx4.rst
mlx5.rst
nfp.rst
qede.rst
virtio.rst

igb_uio is still available.
A next step in deprecation might be to remove igb_uio references.


What about drivers like ark which don't mention vfio?

They should be updated by their maintainer.


Does virtio still require igb_uio? or x86 I/O port for doorbell?
Or is this just stale language.

Maxime, any update on the use of igb_uio with virtio?

For sure Virtio don't require igb_uio, I always use vfio myself.
It seems the doc needs an update, I'll try to look at it later in this
release.

Regards,
Maxime


PIO/MMIO write(notify backend) needs to go through vfio ioctl call,

which impacts performance.  I fix this in another patch. PIO/MMIO

read/write will be executed directly in user space whatever driver is being

used.


/huawei



[dpdk-dev] [PATCH v2 01/12] net/bnxt: fix the corruption of the session details

2020-10-10 Thread Ajit Khaparde
From: Kishore Padmanabha 

The session details that are shared among multiple ports
need to be stored outside the bnxt structure.

Fixes: 70e64b27af5b ("net/bnxt: support ULP session manager cleanup")
Cc: sta...@dpdk.org

Signed-off-by: Kishore Padmanabha 
Reviewed-by: Mike Baucom 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_ulp/bnxt_ulp.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c 
b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
index 289619411..a4d48c71a 100644
--- a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
+++ b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
@@ -159,7 +159,9 @@ ulp_ctx_session_open(struct bnxt *bp,
}
if (!session->session_opened) {
session->session_opened = 1;
-   session->g_tfp = &bp->tfp;
+   session->g_tfp = rte_zmalloc("bnxt_ulp_session_tfp",
+sizeof(struct tf), 0);
+   session->g_tfp->session = bp->tfp.session;
}
return rc;
 }
@@ -176,6 +178,7 @@ ulp_ctx_session_close(struct bnxt *bp,
if (session->session_opened)
tf_close_session(&bp->tfp);
session->session_opened = 0;
+   rte_free(session->g_tfp);
session->g_tfp = NULL;
 }
 
-- 
2.21.1 (Apple Git-122.3)



[dpdk-dev] [PATCH v2 00/12] bnxt patches

2020-10-10 Thread Ajit Khaparde
Fixes and enhancements in the bnxt PMD, mostly in the
TRUFLOW layer, including templates to add support for the
Stingray device.

v2:
- squashed patch 13 into patch 7
- updated and fixed commit logs
- updated docs and release notes where necessary

Kishore Padmanabha (4):
  net/bnxt: fix the corruption of the session details
  net/bnxt: combine default and regular flows
  net/bnxt: add support for parent child flow database
  net/bnxt: add parent child flow create and free

Mike Baucom (6):
  net/bnxt: add multi-device infrastructure
  net/bnxt: add Stingray device support to ULP
  net/bnxt: consolidate template table processing
  net/bnxt: support runtime EM selection
  net/bnxt: consolidate template table processing
  net/bnxt: remove flow db table type from templates

Venkat Duvvuru (2):
  net/bnxt: fix PMD PF support in SR-IOV mode
  net/bnxt: handle default vnic change async event

 doc/guides/nics/bnxt.rst  |42 +
 doc/guides/rel_notes/release_20_11.rst| 1 +
 drivers/net/bnxt/bnxt.h   | 6 +-
 drivers/net/bnxt/bnxt_cpr.c   |13 +-
 drivers/net/bnxt/bnxt_ethdev.c|40 +-
 drivers/net/bnxt/bnxt_hwrm.c  |   463 +-
 drivers/net/bnxt/bnxt_hwrm.h  |12 +-
 drivers/net/bnxt/meson.build  | 4 +
 drivers/net/bnxt/tf_ulp/bnxt_ulp.c|   387 +-
 drivers/net/bnxt/tf_ulp/bnxt_ulp.h|11 +
 drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c   | 5 +-
 drivers/net/bnxt/tf_ulp/ulp_def_rules.c   | 5 +-
 drivers/net/bnxt/tf_ulp/ulp_fc_mgr.c  | 2 +-
 drivers/net/bnxt/tf_ulp/ulp_flow_db.c |   892 +-
 drivers/net/bnxt/tf_ulp/ulp_flow_db.h |   179 +-
 drivers/net/bnxt/tf_ulp/ulp_mapper.c  |   520 +-
 drivers/net/bnxt/tf_ulp/ulp_mapper.h  |22 +-
 drivers/net/bnxt/tf_ulp/ulp_template_db_act.c |  1810 --
 .../net/bnxt/tf_ulp/ulp_template_db_class.c   | 16271 -
 .../net/bnxt/tf_ulp/ulp_template_db_enum.h|18 +-
 .../tf_ulp/ulp_template_db_stingray_act.c |  3305 +++
 .../tf_ulp/ulp_template_db_stingray_class.c   | 19005 
 drivers/net/bnxt/tf_ulp/ulp_template_db_tbl.c |59 +-
 drivers/net/bnxt/tf_ulp/ulp_template_db_tbl.h |48 +
 .../bnxt/tf_ulp/ulp_template_db_wh_plus_act.c |  3304 +++
 .../tf_ulp/ulp_template_db_wh_plus_class.c| 19005 
 drivers/net/bnxt/tf_ulp/ulp_template_struct.h |64 +-
 drivers/net/bnxt/tf_ulp/ulp_utils.h   | 4 +
 28 files changed, 46530 insertions(+), 18967 deletions(-)
 create mode 100644 drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_act.c
 create mode 100644 drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_class.c
 create mode 100644 drivers/net/bnxt/tf_ulp/ulp_template_db_tbl.h
 create mode 100644 drivers/net/bnxt/tf_ulp/ulp_template_db_wh_plus_act.c
 create mode 100644 drivers/net/bnxt/tf_ulp/ulp_template_db_wh_plus_class.c

-- 
2.21.1 (Apple Git-122.3)



[dpdk-dev] [PATCH v2 04/12] net/bnxt: fix PMD PF support in SR-IOV mode

2020-10-10 Thread Ajit Khaparde
From: Venkat Duvvuru 

1. Implement HWRM_FUNC_VF_RESOURCE_CFG command and use it to
   reserve resources for VFs when NEW RM is enabled.
2. Invoke PF’s FUNC_CFG before configuring VFs resources.
3. Don’t consider max_rx_em_flows in max_l2_ctx calculation
   when VFs are configured.
4. Issue HWRM_FUNC_QCFG instead of HWRM_FUNC_QCAPS to find
   out the actual allocated resources for VF.
5. Don’t add random mac to the VF.
6. Handle completion type CMPL_BASE_TYPE_HWRM_FWD_REQ instead
   of CMPL_BASE_TYPE_HWRM_FWD_RESP.
7. Don't enable HWRM_FUNC_DRV_RGTR_INPUT_FLAGS_FWD_NONE_MODE
   when the list of HWRM commands that needs to be forwarded
   to the PF is specified in HWRM_FUNC_DRV_RGTR.
8. Update the HWRM commands list that can be forwarded to the
   PF.

Fixes: b7778e8a1c00 ("net/bnxt: refactor to properly allocate resources for PF/VF")
Cc: sta...@dpdk.org

Signed-off-by: Venkat Duvvuru 
Reviewed-by: Somnath Kotur 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt.h|   6 +-
 drivers/net/bnxt/bnxt_cpr.c|   6 +-
 drivers/net/bnxt/bnxt_ethdev.c |  40 +--
 drivers/net/bnxt/bnxt_hwrm.c   | 461 -
 drivers/net/bnxt/bnxt_hwrm.h   |  12 +-
 5 files changed, 309 insertions(+), 216 deletions(-)

diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h
index eca74486e..a951bca7a 100644
--- a/drivers/net/bnxt/bnxt.h
+++ b/drivers/net/bnxt/bnxt.h
@@ -167,6 +167,9 @@
 #defineBNXT_DEFAULT_VNIC_CHANGE_VF_ID_SFT  \
HWRM_ASYNC_EVENT_CMPL_DEFAULT_VNIC_CHANGE_EVENT_DATA1_VF_ID_SFT
 
+#define BNXT_HWRM_CMD_TO_FORWARD(cmd)  \
+   (bp->pf->vf_req_fwd[(cmd) / 32] |= (1 << ((cmd) % 32)))
+
 struct bnxt_led_info {
uint8_t  num_leds;
uint8_t  led_id;
@@ -664,9 +667,10 @@ struct bnxt {
 #define BNXT_FW_CAP_IF_CHANGE  BIT(1)
 #define BNXT_FW_CAP_ERROR_RECOVERY BIT(2)
 #define BNXT_FW_CAP_ERR_RECOVER_RELOAD BIT(3)
+#define BNXT_FW_CAP_HCOMM_FW_STATUSBIT(4)
 #define BNXT_FW_CAP_ADV_FLOW_MGMT  BIT(5)
 #define BNXT_FW_CAP_ADV_FLOW_COUNTERS  BIT(6)
-#define BNXT_FW_CAP_HCOMM_FW_STATUSBIT(7)
+#define BNXT_FW_CAP_LINK_ADMIN BIT(7)
 
pthread_mutex_t flow_lock;
 
diff --git a/drivers/net/bnxt/bnxt_cpr.c b/drivers/net/bnxt/bnxt_cpr.c
index a3a7e6ab7..54923948f 100644
--- a/drivers/net/bnxt/bnxt_cpr.c
+++ b/drivers/net/bnxt/bnxt_cpr.c
@@ -239,7 +239,7 @@ void bnxt_handle_fwd_req(struct bnxt *bp, struct cmpl_base 
*cmpl)
goto reject;
}
 
-   if (bnxt_rcv_msg_from_vf(bp, vf_id, fwd_cmd) == true) {
+   if (bnxt_rcv_msg_from_vf(bp, vf_id, fwd_cmd)) {
/*
 * In older firmware versions, the MAC had to be all zeros for
 * the VF to set its MAC via hwrm_func_vf_cfg. Set to all
@@ -254,6 +254,7 @@ void bnxt_handle_fwd_req(struct bnxt *bp, struct cmpl_base 
*cmpl)
(const uint8_t *)"\x00\x00\x00\x00\x00");
}
}
+
if (fwd_cmd->req_type == HWRM_CFA_L2_SET_RX_MASK) {
struct hwrm_cfa_l2_set_rx_mask_input *srm =
(void *)fwd_cmd;
@@ -265,6 +266,7 @@ void bnxt_handle_fwd_req(struct bnxt *bp, struct cmpl_base 
*cmpl)
HWRM_CFA_L2_SET_RX_MASK_INPUT_MASK_VLAN_NONVLAN |
HWRM_CFA_L2_SET_RX_MASK_INPUT_MASK_ANYVLAN_NONVLAN);
}
+
/* Forward */
rc = bnxt_hwrm_exec_fwd_resp(bp, fw_vf_id, fwd_cmd, req_len);
if (rc) {
@@ -306,7 +308,7 @@ int bnxt_event_hwrm_resp_handler(struct bnxt *bp, struct 
cmpl_base *cmp)
bnxt_handle_async_event(bp, cmp);
evt = 1;
break;
-   case CMPL_BASE_TYPE_HWRM_FWD_RESP:
+   case CMPL_BASE_TYPE_HWRM_FWD_REQ:
/* Handle HWRM forwarded responses */
bnxt_handle_fwd_req(bp, cmp);
evt = 1;
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index 8b63134c3..b4654ec6a 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -5208,37 +5208,14 @@ static void bnxt_config_vf_req_fwd(struct bnxt *bp)
if (!BNXT_PF(bp))
return;
 
-#define ALLOW_FUNC(x)  \
-   { \
-   uint32_t arg = (x); \
-   bp->pf->vf_req_fwd[((arg) >> 5)] &= \
-   ~rte_cpu_to_le_32(1 << ((arg) & 0x1f)); \
-   }
-
-   /* Forward all requests if firmware is new enough */
-   if (((bp->fw_ver >= ((20 << 24) | (6 << 16) | (100 << 8))) &&
-(bp->fw_ver < ((20 << 24) | (7 << 16 ||
-   ((bp->fw_ver >= ((20 << 24) | (8 << 16) {
-   memset(bp->pf->vf_req_fwd, 0xff, sizeof(bp->pf->vf_req_fwd));
-   } else {
-   PMD_DRV_LOG(WARNING,
-   "Firmware too old for VF mailbox functional

[dpdk-dev] [PATCH v2 06/12] net/bnxt: combine default and regular flows

2020-10-10 Thread Ajit Khaparde
From: Kishore Padmanabha 

The default and regular flows are stored in the same flow table
instead of separate flow tables. This helps code reuse and
reduces the number of allocations.
So combine default and regular flows in the flow database.

Signed-off-by: Kishore Padmanabha 
Reviewed-by: Mike Baucom 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_ulp/bnxt_ulp.c  |   2 +-
 drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c |   4 +-
 drivers/net/bnxt/tf_ulp/ulp_def_rules.c |   4 +-
 drivers/net/bnxt/tf_ulp/ulp_fc_mgr.c|   2 +-
 drivers/net/bnxt/tf_ulp/ulp_flow_db.c   | 423 +++-
 drivers/net/bnxt/tf_ulp/ulp_flow_db.h   |  75 ++---
 drivers/net/bnxt/tf_ulp/ulp_mapper.c|  33 +-
 drivers/net/bnxt/tf_ulp/ulp_mapper.h|  11 +-
 8 files changed, 259 insertions(+), 295 deletions(-)

diff --git a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c 
b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
index eeda2d033..9ed92a88d 100644
--- a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
+++ b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
@@ -853,7 +853,7 @@ bnxt_ulp_deinit(struct bnxt *bp,
bnxt_ulp_destroy_vfr_default_rules(bp, true);
 
/* clean up regular flows */
-   ulp_flow_db_flush_flows(bp->ulp_ctx, BNXT_ULP_REGULAR_FLOW_TABLE);
+   ulp_flow_db_flush_flows(bp->ulp_ctx, BNXT_ULP_FDB_TYPE_REGULAR);
 
/* cleanup the eem table scope */
ulp_eem_tbl_scope_deinit(bp, bp->ulp_ctx);
diff --git a/drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c 
b/drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c
index eea39f6b7..c7b29824e 100644
--- a/drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c
+++ b/drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c
@@ -281,8 +281,8 @@ bnxt_ulp_flow_destroy(struct rte_eth_dev *dev,
return -EINVAL;
}
 
-   ret = ulp_mapper_flow_destroy(ulp_ctx, flow_id,
- BNXT_ULP_REGULAR_FLOW_TABLE);
+   ret = ulp_mapper_flow_destroy(ulp_ctx, BNXT_ULP_FDB_TYPE_REGULAR,
+ flow_id);
if (ret) {
BNXT_TF_DBG(ERR, "Failed to destroy flow.\n");
if (error)
diff --git a/drivers/net/bnxt/tf_ulp/ulp_def_rules.c 
b/drivers/net/bnxt/tf_ulp/ulp_def_rules.c
index 01f4fd087..c36d4d4c4 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_def_rules.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_def_rules.c
@@ -391,8 +391,8 @@ ulp_default_flow_destroy(struct rte_eth_dev *eth_dev, 
uint32_t flow_id)
return rc;
}
 
-   rc = ulp_mapper_flow_destroy(ulp_ctx, flow_id,
-BNXT_ULP_DEFAULT_FLOW_TABLE);
+   rc = ulp_mapper_flow_destroy(ulp_ctx, BNXT_ULP_FDB_TYPE_DEFAULT,
+flow_id);
if (rc)
BNXT_TF_DBG(ERR, "Failed to destroy flow.\n");
 
diff --git a/drivers/net/bnxt/tf_ulp/ulp_fc_mgr.c 
b/drivers/net/bnxt/tf_ulp/ulp_fc_mgr.c
index 5a0bf602a..051ebac04 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_fc_mgr.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_fc_mgr.c
@@ -561,7 +561,7 @@ int ulp_fc_mgr_query_count_get(struct bnxt_ulp_context 
*ctxt,
 
do {
rc = ulp_flow_db_resource_get(ctxt,
- BNXT_ULP_REGULAR_FLOW_TABLE,
+ BNXT_ULP_FDB_TYPE_REGULAR,
  flow_id,
  &nxt_resource_index,
  &params);
diff --git a/drivers/net/bnxt/tf_ulp/ulp_flow_db.c 
b/drivers/net/bnxt/tf_ulp/ulp_flow_db.c
index 9a2d3758d..0a3fb015c 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_flow_db.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_flow_db.c
@@ -27,49 +27,66 @@
 #define ULP_FLOW_DB_RES_NXT_RESET(dst) ((dst) &= ~(ULP_FLOW_DB_RES_NXT_MASK))
 
 /*
- * Helper function to set the bit in the active flow table
+ * Helper function to set the bit in the active flows
  * No validation is done in this function.
  *
- * flow_tbl [in] Ptr to flow table
+ * flow_db [in] Ptr to flow database
+ * flow_type [in] - specify default or regular
  * idx [in] The index to bit to be set or reset.
  * flag [in] 1 to set and 0 to reset.
  *
  * returns none
  */
 static void
-ulp_flow_db_active_flow_set(struct bnxt_ulp_flow_tbl   *flow_tbl,
-   uint32_tidx,
-   uint32_tflag)
+ulp_flow_db_active_flows_bit_set(struct bnxt_ulp_flow_db *flow_db,
+enum bnxt_ulp_fdb_type flow_type,
+uint32_t idx,
+uint32_t flag)
 {
-   uint32_tactive_index;
-
-   active_index = idx / ULP_INDEX_BITMAP_SIZE;
-   if (flag)
-   ULP_INDEX_BITMAP_SET(flow_tbl->active_flow_tbl[active_index],
-idx);
-   else
-   ULP_INDEX_BITMAP_RESET(flow_tbl->active_flow_tbl[active_index],
-  idx);
+  

[dpdk-dev] [PATCH v2 05/12] net/bnxt: consolidate template table processing

2020-10-10 Thread Ajit Khaparde
From: Mike Baucom 

The table processing has been consolidated to be able to reuse the same
code for action and classification template processing.

Signed-off-by: Mike Baucom 
Reviewed-by: Kishore Padmanabha 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c   |   1 +
 drivers/net/bnxt/tf_ulp/ulp_def_rules.c   |   1 +
 drivers/net/bnxt/tf_ulp/ulp_mapper.c  | 298 +-
 drivers/net/bnxt/tf_ulp/ulp_mapper.h  |   4 +-
 .../net/bnxt/tf_ulp/ulp_template_db_enum.h|   6 +
 .../tf_ulp/ulp_template_db_stingray_class.c   |   2 +-
 drivers/net/bnxt/tf_ulp/ulp_template_db_tbl.c |  56 +++-
 drivers/net/bnxt/tf_ulp/ulp_template_db_tbl.h |   4 +-
 .../tf_ulp/ulp_template_db_wh_plus_class.c|   2 +-
 drivers/net/bnxt/tf_ulp/ulp_template_struct.h |  11 +-
 10 files changed, 144 insertions(+), 241 deletions(-)

diff --git a/drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c 
b/drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c
index 566e1254a..eea39f6b7 100644
--- a/drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c
+++ b/drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c
@@ -147,6 +147,7 @@ bnxt_ulp_flow_create(struct rte_eth_dev *dev,
mapper_cparms.act_prop = ¶ms.act_prop;
mapper_cparms.class_tid = class_id;
mapper_cparms.act_tid = act_tmpl;
+   mapper_cparms.flow_type = BNXT_ULP_FDB_TYPE_REGULAR;
 
/* Get the function id */
if (ulp_port_db_port_func_id_get(ulp_ctx,
diff --git a/drivers/net/bnxt/tf_ulp/ulp_def_rules.c 
b/drivers/net/bnxt/tf_ulp/ulp_def_rules.c
index 8dea235f0..01f4fd087 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_def_rules.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_def_rules.c
@@ -351,6 +351,7 @@ ulp_default_flow_create(struct rte_eth_dev *eth_dev,
}
 
mapper_params.class_tid = ulp_class_tid;
+   mapper_params.flow_type = BNXT_ULP_FDB_TYPE_DEFAULT;
 
rc = ulp_mapper_flow_create(ulp_ctx, &mapper_params, flow_id);
if (rc) {
diff --git a/drivers/net/bnxt/tf_ulp/ulp_mapper.c 
b/drivers/net/bnxt/tf_ulp/ulp_mapper.c
index 44a29629b..5ed481ab3 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_mapper.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_mapper.c
@@ -216,37 +216,6 @@ ulp_mapper_act_prop_size_get(uint32_t idx)
return ulp_act_prop_map_table[idx];
 }
 
-/*
- * Get the list of result fields that implement the flow action.
- * Gets a device dependent list of tables that implement the action template 
id.
- *
- * mparms [in] The mappers parms with data related to the flow.
- *
- * tid [in] The action template id that matches the flow
- *
- * num_tbls [out] The number of action tables in the returned array
- *
- * Returns An array of action tables to implement the flow, or NULL on error.
- */
-static struct bnxt_ulp_mapper_tbl_info *
-ulp_mapper_action_tbl_list_get(struct bnxt_ulp_mapper_parms *mparms,
-  uint32_t tid,
-  uint32_t *num_tbls)
-{
-   uint32_tidx;
-   const struct ulp_template_device_tbls *dev_tbls;
-
-   dev_tbls = mparms->device_params->dev_tbls;
-
-   /* NOTE: Need to have something from template compiler to help validate
-* range of dev_id and act_tid
-*/
-   idx = dev_tbls->act_tmpl_list[tid].start_tbl_idx;
-   *num_tbls = dev_tbls->act_tmpl_list[tid].num_tbls;
-
-   return &dev_tbls->act_tbl_list[idx];
-}
-
 /*
  * Get a list of classifier tables that implement the flow
  * Gets a device dependent list of tables that implement the class template id
@@ -257,30 +226,23 @@ ulp_mapper_action_tbl_list_get(struct 
bnxt_ulp_mapper_parms *mparms,
  *
  * num_tbls [out] The number of classifier tables in the returned array
  *
- * fdb_tbl_idx [out] The flow database index Regular or default
- *
  * returns An array of classifier tables to implement the flow, or NULL on
  * error
  */
 static struct bnxt_ulp_mapper_tbl_info *
-ulp_mapper_class_tbl_list_get(struct bnxt_ulp_mapper_parms *mparms,
- uint32_t tid,
- uint32_t *num_tbls,
- uint32_t *fdb_tbl_idx)
+ulp_mapper_tbl_list_get(struct bnxt_ulp_mapper_parms *mparms,
+   uint32_t tid,
+   uint32_t *num_tbls)
 {
uint32_t idx;
const struct ulp_template_device_tbls *dev_tbls;
 
-   dev_tbls = mparms->device_params->dev_tbls;
+   dev_tbls = &mparms->device_params->dev_tbls[mparms->tmpl_type];
 
-   /* NOTE: Need to have something from template compiler to help validate
-* range of dev_id and tid
-*/
-   idx = dev_tbls->class_tmpl_list[tid].start_tbl_idx;
-   *num_tbls = dev_tbls->class_tmpl_list[tid].num_tbls;
-   *fdb_tbl_idx = dev_tbls->class_tmpl_list[tid].flow_db_table_type;
+   idx = dev_tbls->tmpl_list[tid].start_tbl_idx;
+   *num_tbls = dev_tbls->tmpl_list[tid].num_tbls;
 
-   return &dev_tbls->class_tbl_list[idx];
+   return &dev_tbls->tbl_list[idx];
 }
 
 /*
@@ -302,1

Re: [dpdk-dev] [dpdk-techboard] [PATCH V5 1/2] dpdk: resolve compiling errors for per-queue stats

2020-10-10 Thread Thomas Monjalon
09/10/2020 22:32, Ferruh Yigit:
> On 10/6/2020 9:33 AM, Olivier Matz wrote:
> > On Mon, Oct 05, 2020 at 01:23:08PM +0100, Ferruh Yigit wrote:
> >> On 9/28/2020 4:43 PM, Stephen Hemminger wrote:
> >>> On Mon, 28 Sep 2020 17:24:26 +0200
> >>> Thomas Monjalon  wrote:
>  28/09/2020 15:53, Ferruh Yigit:
> > On 9/28/2020 10:16 AM, Thomas Monjalon wrote:
> >> 28/09/2020 10:59, Ferruh Yigit:
> >>> On 9/27/2020 4:16 AM, Min Hu (Connor) wrote:
>  From: Huisong Li 
> 
>  Currently, only statistics of rx/tx queues with queue_id less than
>  RTE_ETHDEV_QUEUE_STAT_CNTRS can be displayed. If there is a certain
>  application scenario that it needs to use 256 or more than 256 queues
>  and display all statistics of rx/tx queue. At this moment, we have to
>  change the macro to be equaled to the queue number.
> 
>  However, modifying the macro to be greater than 256 will trigger
>  many errors and warnings from test-pmd, PMD drivers and librte_ethdev
>  during compiling dpdk project. But it is possible and permitted that
>  rx/tx queue number is greater than 256 and all statistics of rx/tx
>  queue need to be displayed. In addition, the data type of rx/tx queue
>  number in rte_eth_dev_configure API is 'uint16_t'. So It is 
>  unreasonable
>  to use the 'uint8_t' type for variables that control which per-queue
>  statistics can be displayed.
> >>
> >> The explanation is too much complex and misleading.
> >> You mean you cannot increase RTE_ETHDEV_QUEUE_STAT_CNTRS
> >> above 256 because it is an 8-bit type?
> >>
> >> [...]
>  --- a/lib/librte_ethdev/rte_ethdev.h
>  +++ b/lib/librte_ethdev/rte_ethdev.h
>   int rte_eth_dev_set_tx_queue_stats_mapping(uint16_t port_id,
>  -uint16_t tx_queue_id, uint8_t stat_idx);
>  +uint16_t tx_queue_id, uint16_t stat_idx);
> >> [...]
>   int rte_eth_dev_set_rx_queue_stats_mapping(uint16_t port_id,
>  uint16_t rx_queue_id,
>  -   uint8_t stat_idx);
>  +   uint16_t stat_idx);
> >> [...]
> >>> cc'ed tech-board,
> >>>
> >>> The patch breaks the ethdev ABI without a deprecation notice from 
> >>> previous
> >>> release(s).
> >>>
> >>> It is mainly a fix to the port_id storage type, which we have updated 
> >>> from
> >>> uint8_t to uint16_t in past but some seems remained for
> >>> 'rte_eth_dev_set_tx_queue_stats_mapping()' &
> >>> 'rte_eth_dev_set_rx_queue_stats_mapping()' APIs.
> >>
> >> No, it is not related to the port id, but the number of limited stats.
> >
> > Right, it is not related to the port id, it is fixing the storage type 
> > for index
> > used to map the queue stats.
> >>> Since the ethdev library already heavily breaks the ABI this release, 
> >>> I am for
> >>> getting this fix, instead of waiting the fix for one more year.
> >>
> >> If stats can be managed for more than 256 queues, I think it means
> >> it is not limited. In this case, we probably don't need the API
> >> *_queue_stats_mapping which was invented for a limitation of ixgbe.
> >>
> >> The problem is probably somewhere else (in testpmd),
> >> that's why I am against this patch.
> >
> > This patch is not to fix queue stats mapping, I agree there are 
> > problems related
> > to it, already shared as comment to this set.
> >
> > But this patch is to fix the build errors when 
> > 'RTE_ETHDEV_QUEUE_STAT_CNTRS'
> > needs to set more than 255. Where the build errors seems around the
> > stats_mapping APIs.
> 
>  It is not said this API is supposed to manage more than 256 queues 
>  mapping.
>  In general we should not need this API.
>  I think it is solving the wrong problem.
> >>>
> >>>
> >>> The original API is a band aid for the limited number of statistics 
> >>> counters
> >>> in the Intel IXGBE hardware. It crept into to the DPDK as an API. I would 
> >>> rather
> >>> have per-queue statistics and make ixgbe say "not supported"
> >>>
> >>
> >> The current issue is not directly related to '*_queue_stats_mapping' APIs.
> >>
> >> Problem is not able to set 'RTE_ETHDEV_QUEUE_STAT_CNTRS' > 255.
> >> User may need to set the 'RTE_ETHDEV_QUEUE_STAT_CNTRS' > 255, since it is
> >> used to define size of the stats counter.
> >> "uint64_t q_ipackets[RTE_ETHDEV_QUEUE_STAT_CNTRS];"
> >>
> >> When 'RTE_ETHDEV_QUEUE_STAT_CNTRS' > 255, it gives multiple build errors,
> >> the one in the ethdev is like [1].
> >>
> >> This can be fixed two ways,
> >> a) increase the size of 'stat_idx' storage type to u16 in the
> >> '*_queue_stats_mapping' APIs, this is what this patch do

[dpdk-dev] [PATCH v2 07/12] net/bnxt: handle default vnic change async event

2020-10-10 Thread Ajit Khaparde
From: Venkat Duvvuru 

Currently, we are only registering for this event if the function
is a trusted VF. This patch extends it to PFs as well.

Fixes: 322bd6e70272 ("net/bnxt: add port representor infrastructure")
Cc: sta...@dpdk.org

Signed-off-by: Venkat Duvvuru 
Reviewed-by: Somnath Kotur 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_cpr.c  | 7 ++-
 drivers/net/bnxt/bnxt_hwrm.c | 2 +-
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_cpr.c b/drivers/net/bnxt/bnxt_cpr.c
index 54923948f..91d1ffe46 100644
--- a/drivers/net/bnxt/bnxt_cpr.c
+++ b/drivers/net/bnxt/bnxt_cpr.c
@@ -50,7 +50,7 @@ static void
 bnxt_process_default_vnic_change(struct bnxt *bp,
 struct hwrm_async_event_cmpl *async_cmp)
 {
-   uint16_t fid, vnic_state, parent_id, vf_fid, vf_id;
+   uint16_t vnic_state, vf_fid, vf_id;
struct bnxt_representor *vf_rep_bp;
struct rte_eth_dev *eth_dev;
bool vfr_found = false;
@@ -67,10 +67,7 @@ bnxt_process_default_vnic_change(struct bnxt *bp,
if (vnic_state != BNXT_DEFAULT_VNIC_ALLOC)
return;
 
-   parent_id = (event_data & BNXT_DEFAULT_VNIC_CHANGE_PF_ID_MASK) >>
-   BNXT_DEFAULT_VNIC_CHANGE_PF_ID_SFT;
-   fid = BNXT_PF(bp) ? bp->fw_fid : bp->parent->fid;
-   if (parent_id != fid || !bp->rep_info)
+   if (!bp->rep_info)
return;
 
vf_fid = (event_data & BNXT_DEFAULT_VNIC_CHANGE_VF_ID_MASK) >>
diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
index 8133afc74..eef282b69 100644
--- a/drivers/net/bnxt/bnxt_hwrm.c
+++ b/drivers/net/bnxt/bnxt_hwrm.c
@@ -938,7 +938,7 @@ int bnxt_hwrm_func_driver_register(struct bnxt *bp)
req.async_event_fwd[1] |=
rte_cpu_to_le_32(ASYNC_CMPL_EVENT_ID_DBG_NOTIFICATION);
 
-   if (BNXT_VF_IS_TRUSTED(bp))
+   if (BNXT_PF(bp) || BNXT_VF_IS_TRUSTED(bp))
req.async_event_fwd[1] |=
rte_cpu_to_le_32(ASYNC_CMPL_EVENT_ID_DEFAULT_VNIC_CHANGE);
 
-- 
2.21.1 (Apple Git-122.3)



[dpdk-dev] [PATCH v2 09/12] net/bnxt: add support for parent child flow database

2020-10-10 Thread Ajit Khaparde
From: Kishore Padmanabha 

Added support for the parent-child flow database APIs. This
feature is needed to enable VXLAN decap, where flows need to
maintain a parent-child relationship.

Signed-off-by: Kishore Padmanabha 
Reviewed-by: Mike Baucom 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_ulp/ulp_flow_db.c | 348 +-
 drivers/net/bnxt/tf_ulp/ulp_flow_db.h |  84 +
 drivers/net/bnxt/tf_ulp/ulp_template_db_tbl.c |   1 +
[dpdk-dev] [PATCH v2 11/12] net/bnxt: remove flow db table type from templates

2020-10-10 Thread Ajit Khaparde
From: Mike Baucom 

FDB type is now driven by the caller, not the template.
So remove it.

Signed-off-by: Mike Baucom 
Reviewed-by: Ajit Khaparde 
Reviewed-by: Kishore Padmanabha 
---
 .../tf_ulp/ulp_template_db_stingray_act.c | 18 ++---
 .../tf_ulp/ulp_template_db_stingray_class.c   | 69 +++
 .../bnxt/tf_ulp/ulp_template_db_wh_plus_act.c | 18 ++---
 .../tf_ulp/ulp_template_db_wh_plus_class.c| 69 +++
 drivers/net/bnxt/tf_ulp/ulp_template_struct.h |  1 -
 5 files changed, 58 insertions(+), 117 deletions(-)

diff --git a/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_act.c b/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_act.c
index 68e4d8e59..2237ffb94 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_act.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_act.c
@@ -12,38 +12,32 @@ struct bnxt_ulp_mapper_tbl_list_info ulp_stingray_act_tmpl_list[] = {
[1] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 6,
-   .start_tbl_idx = 0,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 0
},
[2] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 3,
-   .start_tbl_idx = 6,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 6
},
[3] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 3,
-   .start_tbl_idx = 9,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 9
},
[4] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 6,
-   .start_tbl_idx = 12,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 12
},
[5] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 6,
-   .start_tbl_idx = 18,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 18
},
[6] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 5,
-   .start_tbl_idx = 24,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 24
}
 };
 
diff --git a/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_class.c b/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_class.c
index 1fa364e29..62b940daa 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_class.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_class.c
@@ -12,140 +12,117 @@ struct bnxt_ulp_mapper_tbl_list_info ulp_stingray_class_tmpl_list[] = {
[1] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 6,
-   .start_tbl_idx = 0,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_DEFAULT
+   .start_tbl_idx = 0
},
[2] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 7,
-   .start_tbl_idx = 6,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_DEFAULT
+   .start_tbl_idx = 6
},
[3] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 7,
-   .start_tbl_idx = 13,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_DEFAULT
+   .start_tbl_idx = 13
},
[4] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 7,
-   .start_tbl_idx = 20,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_DEFAULT
+   .start_tbl_idx = 20
},
[5] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 1,
-   .start_tbl_idx = 27,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_DEFAULT
+   .start_tbl_idx = 27
},
[6] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 5,
-   .start_tbl_idx = 28,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 28
},
[7] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 5,
-   .start_tbl_idx = 33,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 33
},
[8] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 6,
-   .start_tbl_idx = 38,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 38
},
[9] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 6,
-   .start_tbl_idx = 44,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 44
},
[10] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 6,
-   .start_tbl_idx = 50,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 50
},
[11] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 6,
-   .start_tbl_idx = 56,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 56
},
[12] = {
.device_name

[dpdk-dev] [PATCH v2 10/12] net/bnxt: consolidate template table processing

2020-10-10 Thread Ajit Khaparde
From: Mike Baucom 

Rename a few identifiers as a consequence of consolidating the template
table processing; the old names are no longer appropriate.

- chip before type in name
- removal of class in key field info

Signed-off-by: Mike Baucom 
Reviewed-by: Ajit Khaparde 
Reviewed-by: Kishore Padmanabha 
---
 drivers/net/bnxt/tf_ulp/ulp_mapper.c  | 12 +++
 .../tf_ulp/ulp_template_db_stingray_act.c |  6 ++--
 .../tf_ulp/ulp_template_db_stingray_class.c   | 10 +++---
 drivers/net/bnxt/tf_ulp/ulp_template_db_tbl.c | 34 +--
 drivers/net/bnxt/tf_ulp/ulp_template_db_tbl.h | 32 -
 .../bnxt/tf_ulp/ulp_template_db_wh_plus_act.c |  6 ++--
 .../tf_ulp/ulp_template_db_wh_plus_class.c| 10 +++---
 drivers/net/bnxt/tf_ulp/ulp_template_struct.h |  4 +--
 8 files changed, 57 insertions(+), 57 deletions(-)

diff --git a/drivers/net/bnxt/tf_ulp/ulp_mapper.c b/drivers/net/bnxt/tf_ulp/ulp_mapper.c
index 812e35c27..cd289cc40 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_mapper.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_mapper.c
@@ -256,7 +256,7 @@ ulp_mapper_tbl_list_get(struct bnxt_ulp_mapper_parms *mparms,
  *
  * Returns array of Key fields, or NULL on error.
  */
-static struct bnxt_ulp_mapper_class_key_field_info *
+static struct bnxt_ulp_mapper_key_field_info *
 ulp_mapper_key_fields_get(struct bnxt_ulp_mapper_parms *mparms,
  struct bnxt_ulp_mapper_tbl_info *tbl,
  uint32_t *num_flds)
@@ -1009,7 +1009,7 @@ ulp_mapper_result_field_process(struct bnxt_ulp_mapper_parms *parms,
 static int32_t
 ulp_mapper_keymask_field_process(struct bnxt_ulp_mapper_parms *parms,
 enum tf_dir dir,
-struct bnxt_ulp_mapper_class_key_field_info *f,
+struct bnxt_ulp_mapper_key_field_info *f,
 struct ulp_blob *blob,
 uint8_t is_key,
 const char *name)
@@ -1020,7 +1020,7 @@ ulp_mapper_keymask_field_process(struct bnxt_ulp_mapper_parms *parms,
uint8_t *operand;
struct ulp_regfile *regfile = parms->regfile;
uint8_t *val = NULL;
-   struct bnxt_ulp_mapper_class_key_field_info *fld = f;
+   struct bnxt_ulp_mapper_key_field_info *fld = f;
uint32_t field_size;
 
if (is_key) {
@@ -1442,7 +1442,7 @@ static int32_t
 ulp_mapper_tcam_tbl_process(struct bnxt_ulp_mapper_parms *parms,
struct bnxt_ulp_mapper_tbl_info *tbl)
 {
-   struct bnxt_ulp_mapper_class_key_field_info *kflds;
+   struct bnxt_ulp_mapper_key_field_info   *kflds;
struct ulp_blob key, mask, data, update_data;
uint32_t i, num_kflds;
struct tf *tfp;
@@ -1670,7 +1670,7 @@ static int32_t
 ulp_mapper_em_tbl_process(struct bnxt_ulp_mapper_parms *parms,
  struct bnxt_ulp_mapper_tbl_info *tbl)
 {
-   struct bnxt_ulp_mapper_class_key_field_info *kflds;
+   struct bnxt_ulp_mapper_key_field_info   *kflds;
struct bnxt_ulp_mapper_result_field_info *dflds;
struct ulp_blob key, data;
uint32_t i, num_kflds, num_dflds;
@@ -2061,7 +2061,7 @@ static int32_t
 ulp_mapper_cache_tbl_process(struct bnxt_ulp_mapper_parms *parms,
 struct bnxt_ulp_mapper_tbl_info *tbl)
 {
-   struct bnxt_ulp_mapper_class_key_field_info *kflds;
+   struct bnxt_ulp_mapper_key_field_info *kflds;
struct bnxt_ulp_mapper_cache_entry *cache_entry;
struct bnxt_ulp_mapper_ident_info *idents;
uint32_t i, num_kflds = 0, num_idents = 0;
diff --git a/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_act.c b/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_act.c
index a5019d664..68e4d8e59 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_act.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_act.c
@@ -8,7 +8,7 @@
 #include "ulp_template_struct.h"
 #include "ulp_rte_parser.h"
 
-struct bnxt_ulp_mapper_tbl_list_info ulp_act_stingray_tmpl_list[] = {
+struct bnxt_ulp_mapper_tbl_list_info ulp_stingray_act_tmpl_list[] = {
[1] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 6,
@@ -47,7 +47,7 @@ struct bnxt_ulp_mapper_tbl_list_info ulp_act_stingray_tmpl_list[] = {
}
 };
 
-struct bnxt_ulp_mapper_tbl_info ulp_act_stingray_tbl_list[] = {
+struct bnxt_ulp_mapper_tbl_info ulp_stingray_act_tbl_list[] = {
{
.resource_func = BNXT_ULP_RESOURCE_FUNC_INDEX_TABLE,
.resource_type = TF_TBL_TYPE_ACT_STATS_64,
@@ -531,7 +531,7 @@ struct bnxt_ulp_mapper_tbl_info ulp_act_stingray_tbl_list[] = {
}
 };
 
-struct bnxt_ulp_mapper_result_field_info ulp_act_stingray_result_field_list[] = {
+struct bnxt_ulp_mapper_result_field_info ulp_stingray_act_result_field_list[] = {
{
.field_bit_size = 64,
.result_opcode = BNXT_ULP_MAPPER_OPC_SET_TO_ZERO
diff --git a/drivers/net/bnxt/tf_ulp/ulp_template_d

[dpdk-dev] [PATCH v2 12/12] net/bnxt: add parent child flow create and free

2020-10-10 Thread Ajit Khaparde
From: Kishore Padmanabha 

Added support in the ULP mapper to enable parent-child flow
creation and destruction. This feature enables support for the
VXLAN decap functionality.

Signed-off-by: Kishore Padmanabha 
Reviewed-by: Mike Baucom 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_ulp/ulp_flow_db.c | 177 +-
 drivers/net/bnxt/tf_ulp/ulp_flow_db.h |  36 
 drivers/net/bnxt/tf_ulp/ulp_mapper.c  |  87 -
 drivers/net/bnxt/tf_ulp/ulp_mapper.h  |   7 +
 .../net/bnxt/tf_ulp/ulp_template_db_enum.h|   5 +-
 5 files changed, 302 insertions(+), 10 deletions(-)

diff --git a/drivers/net/bnxt/tf_ulp/ulp_flow_db.c b/drivers/net/bnxt/tf_ulp/ulp_flow_db.c
index a1c39329f..3be748908 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_flow_db.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_flow_db.c
@@ -6,10 +6,10 @@
 #include 
 #include "bnxt.h"
 #include "bnxt_tf_common.h"
-#include "ulp_flow_db.h"
 #include "ulp_utils.h"
 #include "ulp_template_struct.h"
 #include "ulp_mapper.h"
+#include "ulp_flow_db.h"
 #include "ulp_fc_mgr.h"
 
 #define ULP_FLOW_DB_RES_DIR_BIT31
@@ -56,10 +56,10 @@ ulp_flow_db_active_flows_bit_set(struct bnxt_ulp_flow_db *flow_db,
} else {
if (flow_type == BNXT_ULP_FDB_TYPE_REGULAR)
ULP_INDEX_BITMAP_RESET(f_tbl->active_reg_flows[a_idx],
-idx);
+  idx);
else
ULP_INDEX_BITMAP_RESET(f_tbl->active_dflt_flows[a_idx],
-idx);
+  idx);
}
 }
 
@@ -89,6 +89,13 @@ ulp_flow_db_active_flows_bit_is_set(struct bnxt_ulp_flow_db *flow_db,
idx);
 }
 
+static inline enum tf_dir
+ulp_flow_db_resource_dir_get(struct ulp_fdb_resource_info *res_info)
+{
+   return ((res_info->nxt_resource_idx & ULP_FLOW_DB_RES_DIR_MASK) >>
+   ULP_FLOW_DB_RES_DIR_BIT);
+}
+
 static uint8_t
 ulp_flow_db_resource_func_get(struct ulp_fdb_resource_info *res_info)
 {
@@ -157,11 +164,9 @@ ulp_flow_db_res_info_to_params(struct 
ulp_fdb_resource_info *resource_info,
   struct ulp_flow_db_res_params *params)
 {
memset(params, 0, sizeof(struct ulp_flow_db_res_params));
-   params->direction = ((resource_info->nxt_resource_idx &
-ULP_FLOW_DB_RES_DIR_MASK) >>
-ULP_FLOW_DB_RES_DIR_BIT);
 
/* use the helper function to get the resource func */
+   params->direction = ulp_flow_db_resource_dir_get(resource_info);
params->resource_func = ulp_flow_db_resource_func_get(resource_info);
 
if (params->resource_func == BNXT_ULP_RESOURCE_FUNC_EXT_EM_TABLE ||
@@ -303,6 +308,9 @@ ulp_flow_db_parent_tbl_init(struct bnxt_ulp_flow_db *flow_db,
struct ulp_fdb_parent_child_db *p_db;
uint32_t size, idx;
 
+   if (!num_entries)
+   return 0;
+
/* update the sizes for the allocation */
p_db = &flow_db->parent_child_db;
p_db->child_bitset_size = (flow_db->flow_tbl.num_flows /
@@ -1171,6 +1179,12 @@ ulp_flow_db_parent_flow_alloc(struct bnxt_ulp_context *ulp_ctxt,
return -EINVAL;
}
 
+   /* No support for parent child db then just exit */
+   if (!flow_db->parent_child_db.entries_count) {
+   BNXT_TF_DBG(ERR, "parent child db not supported\n");
+   return -EINVAL;
+   }
+
p_pdb = &flow_db->parent_child_db;
for (idx = 0; idx <= p_pdb->entries_count; idx++) {
if (p_pdb->parent_flow_tbl[idx].parent_fid == fid) {
@@ -1220,6 +1234,12 @@ ulp_flow_db_parent_flow_free(struct bnxt_ulp_context *ulp_ctxt,
return -EINVAL;
}
 
+   /* No support for parent child db then just exit */
+   if (!flow_db->parent_child_db.entries_count) {
+   BNXT_TF_DBG(ERR, "parent child db not supported\n");
+   return -EINVAL;
+   }
+
p_pdb = &flow_db->parent_child_db;
for (idx = 0; idx <= p_pdb->entries_count; idx++) {
if (p_pdb->parent_flow_tbl[idx].parent_fid == fid) {
@@ -1273,6 +1293,12 @@ ulp_flow_db_parent_child_flow_set(struct bnxt_ulp_context *ulp_ctxt,
return -EINVAL;
}
 
+   /* No support for parent child db then just exit */
+   if (!flow_db->parent_child_db.entries_count) {
+   BNXT_TF_DBG(ERR, "parent child db not supported\n");
+   return -EINVAL;
+   }
+
p_pdb = &flow_db->parent_child_db;
a_idx = child_fid / ULP_INDEX_BITMAP_SIZE;
for (idx = 0; idx <= p_pdb->entries_count; idx++) {
@@ -1320,6 +1346,12 @@ ulp_flow_db_parent_flow_idx_get(struct bnxt_ulp_context *ulp_ctxt,
return -EINVAL;
}
 
+   /* No support for parent child d

Re: [dpdk-dev] [PATCH v5 00/15] fix distributor synchronization issues

2020-10-10 Thread David Marchand
Hello Lukasz,

On Sat, Oct 10, 2020 at 1:26 AM Lukasz Wojciechowski wrote:
> On 09.10.2020 at 23:41, Lukasz Wojciechowski wrote:
> More bad news - same issue just appeared on travis for v6.
> Good news we can reproduce it.
>
> Is there a way to delegate a job for travis other way than sending a new 
> patch version?

You just need to fork dpdk in github, then setup travis.
Travis will get triggered on push.
I can help offlist if needed.


-- 
David Marchand



[dpdk-dev] [PATCH v2 00/12] bnxt patches

2020-10-10 Thread Ajit Khaparde
Fixes and enhancements in the bnxt PMD, mostly in the
TRUFLOW layer, including templates to add support for the
Stingray device.

v2:
- squashed patch 13 into patch 7
- updated and fixed commit logs
- updated docs and release notes where necessary

Kishore Padmanabha (4):
  net/bnxt: fix the corruption of the session details
  net/bnxt: combine default and regular flows
  net/bnxt: add support for parent child flow database
  net/bnxt: add parent child flow create and free

Mike Baucom (6):
  net/bnxt: add multi-device infrastructure
  net/bnxt: add Stingray device support to ULP
  net/bnxt: consolidate template table processing
  net/bnxt: support runtime EM selection
  net/bnxt: consolidate template table processing
  net/bnxt: remove flow db table type from templates

Venkat Duvvuru (2):
  net/bnxt: fix PMD PF support in SR-IOV mode
  net/bnxt: handle default vnic change async event

 doc/guides/nics/bnxt.rst  |42 +
 doc/guides/rel_notes/release_20_11.rst| 1 +
 drivers/net/bnxt/bnxt.h   | 6 +-
 drivers/net/bnxt/bnxt_cpr.c   |13 +-
 drivers/net/bnxt/bnxt_ethdev.c|40 +-
 drivers/net/bnxt/bnxt_hwrm.c  |   463 +-
 drivers/net/bnxt/bnxt_hwrm.h  |12 +-
 drivers/net/bnxt/meson.build  | 4 +
 drivers/net/bnxt/tf_ulp/bnxt_ulp.c|   387 +-
 drivers/net/bnxt/tf_ulp/bnxt_ulp.h|11 +
 drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c   | 5 +-
 drivers/net/bnxt/tf_ulp/ulp_def_rules.c   | 5 +-
 drivers/net/bnxt/tf_ulp/ulp_fc_mgr.c  | 2 +-
 drivers/net/bnxt/tf_ulp/ulp_flow_db.c |   892 +-
 drivers/net/bnxt/tf_ulp/ulp_flow_db.h |   179 +-
 drivers/net/bnxt/tf_ulp/ulp_mapper.c  |   520 +-
 drivers/net/bnxt/tf_ulp/ulp_mapper.h  |22 +-
 drivers/net/bnxt/tf_ulp/ulp_template_db_act.c |  1810 --
 .../net/bnxt/tf_ulp/ulp_template_db_class.c   | 16271 -
 .../net/bnxt/tf_ulp/ulp_template_db_enum.h|18 +-
 .../tf_ulp/ulp_template_db_stingray_act.c |  3305 +++
 .../tf_ulp/ulp_template_db_stingray_class.c   | 19005 
 drivers/net/bnxt/tf_ulp/ulp_template_db_tbl.c |59 +-
 drivers/net/bnxt/tf_ulp/ulp_template_db_tbl.h |48 +
 .../bnxt/tf_ulp/ulp_template_db_wh_plus_act.c |  3304 +++
 .../tf_ulp/ulp_template_db_wh_plus_class.c| 19005 
 drivers/net/bnxt/tf_ulp/ulp_template_struct.h |64 +-
 drivers/net/bnxt/tf_ulp/ulp_utils.h   | 4 +
 28 files changed, 46530 insertions(+), 18967 deletions(-)
 create mode 100644 drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_act.c
 create mode 100644 drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_class.c
 create mode 100644 drivers/net/bnxt/tf_ulp/ulp_template_db_tbl.h
 create mode 100644 drivers/net/bnxt/tf_ulp/ulp_template_db_wh_plus_act.c
 create mode 100644 drivers/net/bnxt/tf_ulp/ulp_template_db_wh_plus_class.c

-- 
2.21.1 (Apple Git-122.3)



[dpdk-dev] [PATCH v2 04/12] net/bnxt: fix PMD PF support in SR-IOV mode

2020-10-10 Thread Ajit Khaparde
From: Venkat Duvvuru 

1. Implement HWRM_FUNC_VF_RESOURCE_CFG command and use it to
   reserve resources for VFs when NEW RM is enabled.
2. Invoke PF’s FUNC_CFG before configuring VFs resources.
3. Don’t consider max_rx_em_flows in max_l2_ctx calculation
   when VFs are configured.
4. Issue HWRM_FUNC_QCFG instead of HWRM_FUNC_QCAPS to find
   out the actual allocated resources for VF.
5. Don’t add random mac to the VF.
6. Handle completion type CMPL_BASE_TYPE_HWRM_FWD_REQ instead
   of CMPL_BASE_TYPE_HWRM_FWD_RESP.
7. Don't enable HWRM_FUNC_DRV_RGTR_INPUT_FLAGS_FWD_NONE_MODE
   when the list of HWRM commands that needs to be forwarded
   to the PF is specified in HWRM_FUNC_DRV_RGTR.
8. Update the list of HWRM commands that can be forwarded to the
   PF.

Fixes: b7778e8a1c00 ("net/bnxt: refactor to properly allocate resources for PF/VF")
Cc: sta...@dpdk.org

Signed-off-by: Venkat Duvvuru 
Reviewed-by: Somnath Kotur 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt.h|   6 +-
 drivers/net/bnxt/bnxt_cpr.c|   6 +-
 drivers/net/bnxt/bnxt_ethdev.c |  40 +--
 drivers/net/bnxt/bnxt_hwrm.c   | 461 -
 drivers/net/bnxt/bnxt_hwrm.h   |  12 +-
 5 files changed, 309 insertions(+), 216 deletions(-)

diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h
index eca74486e..a951bca7a 100644
--- a/drivers/net/bnxt/bnxt.h
+++ b/drivers/net/bnxt/bnxt.h
@@ -167,6 +167,9 @@
 #defineBNXT_DEFAULT_VNIC_CHANGE_VF_ID_SFT  \
HWRM_ASYNC_EVENT_CMPL_DEFAULT_VNIC_CHANGE_EVENT_DATA1_VF_ID_SFT
 
+#define BNXT_HWRM_CMD_TO_FORWARD(cmd)  \
+   (bp->pf->vf_req_fwd[(cmd) / 32] |= (1 << ((cmd) % 32)))
+
 struct bnxt_led_info {
uint8_t  num_leds;
uint8_t  led_id;
@@ -664,9 +667,10 @@ struct bnxt {
 #define BNXT_FW_CAP_IF_CHANGE  BIT(1)
 #define BNXT_FW_CAP_ERROR_RECOVERY BIT(2)
 #define BNXT_FW_CAP_ERR_RECOVER_RELOAD BIT(3)
+#define BNXT_FW_CAP_HCOMM_FW_STATUSBIT(4)
 #define BNXT_FW_CAP_ADV_FLOW_MGMT  BIT(5)
 #define BNXT_FW_CAP_ADV_FLOW_COUNTERS  BIT(6)
-#define BNXT_FW_CAP_HCOMM_FW_STATUSBIT(7)
+#define BNXT_FW_CAP_LINK_ADMIN BIT(7)
 
pthread_mutex_t flow_lock;
 
diff --git a/drivers/net/bnxt/bnxt_cpr.c b/drivers/net/bnxt/bnxt_cpr.c
index a3a7e6ab7..54923948f 100644
--- a/drivers/net/bnxt/bnxt_cpr.c
+++ b/drivers/net/bnxt/bnxt_cpr.c
@@ -239,7 +239,7 @@ void bnxt_handle_fwd_req(struct bnxt *bp, struct cmpl_base *cmpl)
goto reject;
}
 
-   if (bnxt_rcv_msg_from_vf(bp, vf_id, fwd_cmd) == true) {
+   if (bnxt_rcv_msg_from_vf(bp, vf_id, fwd_cmd)) {
/*
 * In older firmware versions, the MAC had to be all zeros for
 * the VF to set it's MAC via hwrm_func_vf_cfg. Set to all
@@ -254,6 +254,7 @@ void bnxt_handle_fwd_req(struct bnxt *bp, struct cmpl_base *cmpl)
(const uint8_t *)"\x00\x00\x00\x00\x00");
}
}
+
if (fwd_cmd->req_type == HWRM_CFA_L2_SET_RX_MASK) {
struct hwrm_cfa_l2_set_rx_mask_input *srm =
(void *)fwd_cmd;
@@ -265,6 +266,7 @@ void bnxt_handle_fwd_req(struct bnxt *bp, struct cmpl_base *cmpl)
HWRM_CFA_L2_SET_RX_MASK_INPUT_MASK_VLAN_NONVLAN |
HWRM_CFA_L2_SET_RX_MASK_INPUT_MASK_ANYVLAN_NONVLAN);
}
+
/* Forward */
rc = bnxt_hwrm_exec_fwd_resp(bp, fw_vf_id, fwd_cmd, req_len);
if (rc) {
@@ -306,7 +308,7 @@ int bnxt_event_hwrm_resp_handler(struct bnxt *bp, struct cmpl_base *cmp)
bnxt_handle_async_event(bp, cmp);
evt = 1;
break;
-   case CMPL_BASE_TYPE_HWRM_FWD_RESP:
+   case CMPL_BASE_TYPE_HWRM_FWD_REQ:
/* Handle HWRM forwarded responses */
bnxt_handle_fwd_req(bp, cmp);
evt = 1;
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index 8b63134c3..b4654ec6a 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -5208,37 +5208,14 @@ static void bnxt_config_vf_req_fwd(struct bnxt *bp)
if (!BNXT_PF(bp))
return;
 
-#define ALLOW_FUNC(x)  \
-   { \
-   uint32_t arg = (x); \
-   bp->pf->vf_req_fwd[((arg) >> 5)] &= \
-   ~rte_cpu_to_le_32(1 << ((arg) & 0x1f)); \
-   }
-
-   /* Forward all requests if firmware is new enough */
-   if (((bp->fw_ver >= ((20 << 24) | (6 << 16) | (100 << 8))) &&
-(bp->fw_ver < ((20 << 24) | (7 << 16 ||
-   ((bp->fw_ver >= ((20 << 24) | (8 << 16) {
-   memset(bp->pf->vf_req_fwd, 0xff, sizeof(bp->pf->vf_req_fwd));
-   } else {
-   PMD_DRV_LOG(WARNING,
-   "Firmware too old for VF mailbox functional

[dpdk-dev] [PATCH v2 01/12] net/bnxt: fix the corruption of the session details

2020-10-10 Thread Ajit Khaparde
From: Kishore Padmanabha 

The session details that are shared among multiple ports
need to be kept outside the bnxt structure.

Fixes: 70e64b27af5b ("net/bnxt: support ULP session manager cleanup")
Cc: sta...@dpdk.org

Signed-off-by: Kishore Padmanabha 
Reviewed-by: Mike Baucom 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_ulp/bnxt_ulp.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
index 289619411..a4d48c71a 100644
--- a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
+++ b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
@@ -159,7 +159,9 @@ ulp_ctx_session_open(struct bnxt *bp,
}
if (!session->session_opened) {
session->session_opened = 1;
-   session->g_tfp = &bp->tfp;
+   session->g_tfp = rte_zmalloc("bnxt_ulp_session_tfp",
+sizeof(struct tf), 0);
+   session->g_tfp->session = bp->tfp.session;
}
return rc;
 }
@@ -176,6 +178,7 @@ ulp_ctx_session_close(struct bnxt *bp,
if (session->session_opened)
tf_close_session(&bp->tfp);
session->session_opened = 0;
+   rte_free(session->g_tfp);
session->g_tfp = NULL;
 }
 
-- 
2.21.1 (Apple Git-122.3)



[dpdk-dev] [PATCH v2 05/12] net/bnxt: consolidate template table processing

2020-10-10 Thread Ajit Khaparde
From: Mike Baucom 

The table processing has been consolidated to be able to reuse the same
code for action and classification template processing.

Signed-off-by: Mike Baucom 
Reviewed-by: Kishore Padmanabha 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c   |   1 +
 drivers/net/bnxt/tf_ulp/ulp_def_rules.c   |   1 +
 drivers/net/bnxt/tf_ulp/ulp_mapper.c  | 298 +-
 drivers/net/bnxt/tf_ulp/ulp_mapper.h  |   4 +-
 .../net/bnxt/tf_ulp/ulp_template_db_enum.h|   6 +
 .../tf_ulp/ulp_template_db_stingray_class.c   |   2 +-
 drivers/net/bnxt/tf_ulp/ulp_template_db_tbl.c |  56 +++-
 drivers/net/bnxt/tf_ulp/ulp_template_db_tbl.h |   4 +-
 .../tf_ulp/ulp_template_db_wh_plus_class.c|   2 +-
 drivers/net/bnxt/tf_ulp/ulp_template_struct.h |  11 +-
 10 files changed, 144 insertions(+), 241 deletions(-)

diff --git a/drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c b/drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c
index 566e1254a..eea39f6b7 100644
--- a/drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c
+++ b/drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c
@@ -147,6 +147,7 @@ bnxt_ulp_flow_create(struct rte_eth_dev *dev,
mapper_cparms.act_prop = ¶ms.act_prop;
mapper_cparms.class_tid = class_id;
mapper_cparms.act_tid = act_tmpl;
+   mapper_cparms.flow_type = BNXT_ULP_FDB_TYPE_REGULAR;
 
/* Get the function id */
if (ulp_port_db_port_func_id_get(ulp_ctx,
diff --git a/drivers/net/bnxt/tf_ulp/ulp_def_rules.c b/drivers/net/bnxt/tf_ulp/ulp_def_rules.c
index 8dea235f0..01f4fd087 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_def_rules.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_def_rules.c
@@ -351,6 +351,7 @@ ulp_default_flow_create(struct rte_eth_dev *eth_dev,
}
 
mapper_params.class_tid = ulp_class_tid;
+   mapper_params.flow_type = BNXT_ULP_FDB_TYPE_DEFAULT;
 
rc = ulp_mapper_flow_create(ulp_ctx, &mapper_params, flow_id);
if (rc) {
diff --git a/drivers/net/bnxt/tf_ulp/ulp_mapper.c b/drivers/net/bnxt/tf_ulp/ulp_mapper.c
index 44a29629b..5ed481ab3 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_mapper.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_mapper.c
@@ -216,37 +216,6 @@ ulp_mapper_act_prop_size_get(uint32_t idx)
return ulp_act_prop_map_table[idx];
 }
 
-/*
- * Get the list of result fields that implement the flow action.
- * Gets a device dependent list of tables that implement the action template id.
- *
- * mparms [in] The mappers parms with data related to the flow.
- *
- * tid [in] The action template id that matches the flow
- *
- * num_tbls [out] The number of action tables in the returned array
- *
- * Returns An array of action tables to implement the flow, or NULL on error.
- */
-static struct bnxt_ulp_mapper_tbl_info *
-ulp_mapper_action_tbl_list_get(struct bnxt_ulp_mapper_parms *mparms,
-  uint32_t tid,
-  uint32_t *num_tbls)
-{
-   uint32_tidx;
-   const struct ulp_template_device_tbls *dev_tbls;
-
-   dev_tbls = mparms->device_params->dev_tbls;
-
-   /* NOTE: Need to have something from template compiler to help validate
-* range of dev_id and act_tid
-*/
-   idx = dev_tbls->act_tmpl_list[tid].start_tbl_idx;
-   *num_tbls = dev_tbls->act_tmpl_list[tid].num_tbls;
-
-   return &dev_tbls->act_tbl_list[idx];
-}
-
 /*
  * Get a list of classifier tables that implement the flow
  * Gets a device dependent list of tables that implement the class template id
@@ -257,30 +226,23 @@ ulp_mapper_action_tbl_list_get(struct bnxt_ulp_mapper_parms *mparms,
  *
  * num_tbls [out] The number of classifier tables in the returned array
  *
- * fdb_tbl_idx [out] The flow database index Regular or default
- *
  * returns An array of classifier tables to implement the flow, or NULL on
  * error
  */
 static struct bnxt_ulp_mapper_tbl_info *
-ulp_mapper_class_tbl_list_get(struct bnxt_ulp_mapper_parms *mparms,
- uint32_t tid,
- uint32_t *num_tbls,
- uint32_t *fdb_tbl_idx)
+ulp_mapper_tbl_list_get(struct bnxt_ulp_mapper_parms *mparms,
+   uint32_t tid,
+   uint32_t *num_tbls)
 {
uint32_t idx;
const struct ulp_template_device_tbls *dev_tbls;
 
-   dev_tbls = mparms->device_params->dev_tbls;
+   dev_tbls = &mparms->device_params->dev_tbls[mparms->tmpl_type];
 
-   /* NOTE: Need to have something from template compiler to help validate
-* range of dev_id and tid
-*/
-   idx = dev_tbls->class_tmpl_list[tid].start_tbl_idx;
-   *num_tbls = dev_tbls->class_tmpl_list[tid].num_tbls;
-   *fdb_tbl_idx = dev_tbls->class_tmpl_list[tid].flow_db_table_type;
+   idx = dev_tbls->tmpl_list[tid].start_tbl_idx;
+   *num_tbls = dev_tbls->tmpl_list[tid].num_tbls;
 
-   return &dev_tbls->class_tbl_list[idx];
+   return &dev_tbls->tbl_list[idx];
 }
 
 /*
@@ -302,1

[dpdk-dev] [PATCH v2 09/12] net/bnxt: add support for parent child flow database

2020-10-10 Thread Ajit Khaparde
From: Kishore Padmanabha 

Added support for parent-child flow database APIs. This
enables the VXLAN decap functionality, where flows need to
maintain a parent-child relationship.

Signed-off-by: Kishore Padmanabha 
Reviewed-by: Mike Baucom 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_ulp/ulp_flow_db.c | 348 +-
 drivers/net/bnxt/tf_ulp/ulp_flow_db.h |  84 +
 drivers/net/bnxt/tf_ulp/ulp_template_db_tbl.c |   1 +
 drivers/net/bnxt/tf_ulp/ulp_template_struct.h |   1 +
 drivers/net/bnxt/tf_ulp/ulp_utils.h   |   4 +
 5 files changed, 435 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bnxt/tf_ulp/ulp_flow_db.c b/drivers/net/bnxt/tf_ulp/ulp_flow_db.c
index da012451d..a1c39329f 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_flow_db.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_flow_db.c
@@ -207,13 +207,16 @@ ulp_flow_db_alloc_resource(struct bnxt_ulp_flow_db *flow_db)
return -ENOMEM;
}
size = (flow_tbl->num_flows / sizeof(uint64_t)) + 1;
-   flow_tbl->active_reg_flows = rte_zmalloc("active reg flows", size, 0);
+   size =  ULP_BYTE_ROUND_OFF_8(size);
+   flow_tbl->active_reg_flows = rte_zmalloc("active reg flows", size,
+ULP_BUFFER_ALIGN_64_BYTE);
if (!flow_tbl->active_reg_flows) {
BNXT_TF_DBG(ERR, "Failed to alloc memory active reg flows\n");
return -ENOMEM;
}
 
-   flow_tbl->active_dflt_flows = rte_zmalloc("active dflt flows", size, 0);
+   flow_tbl->active_dflt_flows = rte_zmalloc("active dflt flows", size,
+ ULP_BUFFER_ALIGN_64_BYTE);
if (!flow_tbl->active_dflt_flows) {
BNXT_TF_DBG(ERR, "Failed to alloc memory active dflt flows\n");
return -ENOMEM;
@@ -284,6 +287,86 @@ ulp_flow_db_func_id_set(struct bnxt_ulp_flow_db *flow_db,
BNXT_TF_DBG(ERR, "Invalid flow id, flowdb corrupt\n");
 }
 
+/*
+ * Initialize the parent-child database. Memory is allocated in this
+ * call and assigned to the database
+ *
+ * flow_db [in] Ptr to flow table
+ * num_entries[in] - number of entries to allocate
+ *
+ * Returns 0 on success or negative number on failure.
+ */
+static int32_t
+ulp_flow_db_parent_tbl_init(struct bnxt_ulp_flow_db *flow_db,
+   uint32_t num_entries)
+{
+   struct ulp_fdb_parent_child_db *p_db;
+   uint32_t size, idx;
+
+   /* update the sizes for the allocation */
+   p_db = &flow_db->parent_child_db;
+   p_db->child_bitset_size = (flow_db->flow_tbl.num_flows /
+  sizeof(uint64_t)) + 1; /* size in bytes */
+   p_db->child_bitset_size = ULP_BYTE_ROUND_OFF_8(p_db->child_bitset_size);
+   p_db->entries_count = num_entries;
+
+   /* allocate the memory */
+   p_db->parent_flow_tbl = rte_zmalloc("fdb parent flow tbl",
+   sizeof(struct ulp_fdb_parent_info) *
+   p_db->entries_count, 0);
+   if (!p_db->parent_flow_tbl) {
+   BNXT_TF_DBG(ERR,
+   "Failed to allocate memory fdb parent flow tbl\n");
+   return -ENOMEM;
+   }
+   size = p_db->child_bitset_size * p_db->entries_count;
+
+   /*
+* allocate the big chunk of memory to be statically carved into
+* child_fid_bitset pointer.
+*/
+   p_db->parent_flow_tbl_mem = rte_zmalloc("fdb parent flow tbl mem",
+   size,
+   ULP_BUFFER_ALIGN_64_BYTE);
+   if (!p_db->parent_flow_tbl_mem) {
+   BNXT_TF_DBG(ERR,
+   "Failed to allocate memory fdb parent flow mem\n");
+   return -ENOMEM;
+   }
+
+   /* set the pointers in parent table to their offsets */
+   for (idx = 0 ; idx < p_db->entries_count; idx++) {
+   p_db->parent_flow_tbl[idx].child_fid_bitset =
+   (uint64_t *)&p_db->parent_flow_tbl_mem[idx *
+   p_db->child_bitset_size];
+   }
+   /* success */
+   return 0;
+}
+
+/*
+ * Deinitialize the parent-child database. Memory is deallocated in
+ * this call and all flows should have been purged before this
+ * call.
+ *
+ * flow_db [in] Ptr to flow database
+ *
+ * Returns none
+ */
+static void
+ulp_flow_db_parent_tbl_deinit(struct bnxt_ulp_flow_db *flow_db)
+{
+   /* free the memory related to parent child database */
+   if (flow_db->parent_child_db.parent_flow_tbl_mem) {
+   rte_free(flow_db->parent_child_db.parent_flow_tbl_mem);
+   flow_db->parent_child_db.parent_flow_tbl_mem = NULL;
+   }
+   if (flow_db->parent_child_db.parent_flow_tbl) {
+   rte_free(flow_db->parent_child_db.parent_flow_tbl);
+   flow_db->parent_child_db.parent_flow_tbl = NULL;
+   }
+}

Re: [dpdk-dev] [PATCH v5 00/15] fix distributor synchronization issues

2020-10-10 Thread David Marchand
On Sat, Oct 10, 2020 at 10:12 AM David Marchand
 wrote:
>
> Hello Lukasz,
>
> On Sat, Oct 10, 2020 at 1:26 AM Lukasz Wojciechowski
>  wrote:
> > On 09.10.2020 at 23:41, Lukasz Wojciechowski wrote:
> > More bad news - same issue just appeared on travis for v6.
> > Good news we can reproduce it.
> >
> > Is there a way to delegate a job for travis other way than sending a new 
> > patch version?
>
> You just need to fork dpdk in github, then setup travis.

Forgot to paste it:
https://docs.travis-ci.com/user/tutorial/#to-get-started-with-travis-ci-using-github

> Travis will get triggered on push.
> I can help offlist if needed.


-- 
David Marchand



[dpdk-dev] [PATCH v2 07/12] net/bnxt: handle default vnic change async event

2020-10-10 Thread Ajit Khaparde
From: Venkat Duvvuru 

Currently, we register for this event only if the function
is a trusted VF. This patch extends the registration to PFs as well.

Fixes: 322bd6e70272 ("net/bnxt: add port representor infrastructure")
Cc: sta...@dpdk.org

Signed-off-by: Venkat Duvvuru 
Reviewed-by: Somnath Kotur 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_cpr.c  | 7 ++-
 drivers/net/bnxt/bnxt_hwrm.c | 2 +-
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_cpr.c b/drivers/net/bnxt/bnxt_cpr.c
index 54923948f..91d1ffe46 100644
--- a/drivers/net/bnxt/bnxt_cpr.c
+++ b/drivers/net/bnxt/bnxt_cpr.c
@@ -50,7 +50,7 @@ static void
 bnxt_process_default_vnic_change(struct bnxt *bp,
 struct hwrm_async_event_cmpl *async_cmp)
 {
-   uint16_t fid, vnic_state, parent_id, vf_fid, vf_id;
+   uint16_t vnic_state, vf_fid, vf_id;
struct bnxt_representor *vf_rep_bp;
struct rte_eth_dev *eth_dev;
bool vfr_found = false;
@@ -67,10 +67,7 @@ bnxt_process_default_vnic_change(struct bnxt *bp,
if (vnic_state != BNXT_DEFAULT_VNIC_ALLOC)
return;
 
-   parent_id = (event_data & BNXT_DEFAULT_VNIC_CHANGE_PF_ID_MASK) >>
-   BNXT_DEFAULT_VNIC_CHANGE_PF_ID_SFT;
-   fid = BNXT_PF(bp) ? bp->fw_fid : bp->parent->fid;
-   if (parent_id != fid || !bp->rep_info)
+   if (!bp->rep_info)
return;
 
vf_fid = (event_data & BNXT_DEFAULT_VNIC_CHANGE_VF_ID_MASK) >>
diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
index 8133afc74..eef282b69 100644
--- a/drivers/net/bnxt/bnxt_hwrm.c
+++ b/drivers/net/bnxt/bnxt_hwrm.c
@@ -938,7 +938,7 @@ int bnxt_hwrm_func_driver_register(struct bnxt *bp)
req.async_event_fwd[1] |=
rte_cpu_to_le_32(ASYNC_CMPL_EVENT_ID_DBG_NOTIFICATION);
 
-   if (BNXT_VF_IS_TRUSTED(bp))
+   if (BNXT_PF(bp) || BNXT_VF_IS_TRUSTED(bp))
req.async_event_fwd[1] |=
rte_cpu_to_le_32(ASYNC_CMPL_EVENT_ID_DEFAULT_VNIC_CHANGE);
 
-- 
2.21.1 (Apple Git-122.3)



[dpdk-dev] [PATCH v2 06/12] net/bnxt: combine default and regular flows

2020-10-10 Thread Ajit Khaparde
From: Kishore Padmanabha 

The default and regular flows are now stored in the same flow table
instead of separate flow tables. This helps code reuse and reduces
the number of allocations.
So combine default and regular flows in the flow database.

Signed-off-by: Kishore Padmanabha 
Reviewed-by: Mike Baucom 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_ulp/bnxt_ulp.c  |   2 +-
 drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c |   4 +-
 drivers/net/bnxt/tf_ulp/ulp_def_rules.c |   4 +-
 drivers/net/bnxt/tf_ulp/ulp_fc_mgr.c|   2 +-
 drivers/net/bnxt/tf_ulp/ulp_flow_db.c   | 423 +++-
 drivers/net/bnxt/tf_ulp/ulp_flow_db.h   |  75 ++---
 drivers/net/bnxt/tf_ulp/ulp_mapper.c|  33 +-
 drivers/net/bnxt/tf_ulp/ulp_mapper.h|  11 +-
 8 files changed, 259 insertions(+), 295 deletions(-)

diff --git a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c 
b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
index eeda2d033..9ed92a88d 100644
--- a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
+++ b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
@@ -853,7 +853,7 @@ bnxt_ulp_deinit(struct bnxt *bp,
bnxt_ulp_destroy_vfr_default_rules(bp, true);
 
/* clean up regular flows */
-   ulp_flow_db_flush_flows(bp->ulp_ctx, BNXT_ULP_REGULAR_FLOW_TABLE);
+   ulp_flow_db_flush_flows(bp->ulp_ctx, BNXT_ULP_FDB_TYPE_REGULAR);
 
/* cleanup the eem table scope */
ulp_eem_tbl_scope_deinit(bp, bp->ulp_ctx);
diff --git a/drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c 
b/drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c
index eea39f6b7..c7b29824e 100644
--- a/drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c
+++ b/drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c
@@ -281,8 +281,8 @@ bnxt_ulp_flow_destroy(struct rte_eth_dev *dev,
return -EINVAL;
}
 
-   ret = ulp_mapper_flow_destroy(ulp_ctx, flow_id,
- BNXT_ULP_REGULAR_FLOW_TABLE);
+   ret = ulp_mapper_flow_destroy(ulp_ctx, BNXT_ULP_FDB_TYPE_REGULAR,
+ flow_id);
if (ret) {
BNXT_TF_DBG(ERR, "Failed to destroy flow.\n");
if (error)
diff --git a/drivers/net/bnxt/tf_ulp/ulp_def_rules.c 
b/drivers/net/bnxt/tf_ulp/ulp_def_rules.c
index 01f4fd087..c36d4d4c4 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_def_rules.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_def_rules.c
@@ -391,8 +391,8 @@ ulp_default_flow_destroy(struct rte_eth_dev *eth_dev, 
uint32_t flow_id)
return rc;
}
 
-   rc = ulp_mapper_flow_destroy(ulp_ctx, flow_id,
-BNXT_ULP_DEFAULT_FLOW_TABLE);
+   rc = ulp_mapper_flow_destroy(ulp_ctx, BNXT_ULP_FDB_TYPE_DEFAULT,
+flow_id);
if (rc)
BNXT_TF_DBG(ERR, "Failed to destroy flow.\n");
 
diff --git a/drivers/net/bnxt/tf_ulp/ulp_fc_mgr.c 
b/drivers/net/bnxt/tf_ulp/ulp_fc_mgr.c
index 5a0bf602a..051ebac04 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_fc_mgr.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_fc_mgr.c
@@ -561,7 +561,7 @@ int ulp_fc_mgr_query_count_get(struct bnxt_ulp_context 
*ctxt,
 
do {
rc = ulp_flow_db_resource_get(ctxt,
- BNXT_ULP_REGULAR_FLOW_TABLE,
+ BNXT_ULP_FDB_TYPE_REGULAR,
  flow_id,
  &nxt_resource_index,
  ¶ms);
diff --git a/drivers/net/bnxt/tf_ulp/ulp_flow_db.c 
b/drivers/net/bnxt/tf_ulp/ulp_flow_db.c
index 9a2d3758d..0a3fb015c 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_flow_db.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_flow_db.c
@@ -27,49 +27,66 @@
 #define ULP_FLOW_DB_RES_NXT_RESET(dst) ((dst) &= ~(ULP_FLOW_DB_RES_NXT_MASK))
 
 /*
- * Helper function to set the bit in the active flow table
+ * Helper function to set the bit in the active flows
  * No validation is done in this function.
  *
- * flow_tbl [in] Ptr to flow table
+ * flow_db [in] Ptr to flow database
+ * flow_type [in] - specify default or regular
  * idx [in] The index to bit to be set or reset.
  * flag [in] 1 to set and 0 to reset.
  *
  * returns none
  */
 static void
-ulp_flow_db_active_flow_set(struct bnxt_ulp_flow_tbl   *flow_tbl,
-   uint32_tidx,
-   uint32_tflag)
+ulp_flow_db_active_flows_bit_set(struct bnxt_ulp_flow_db *flow_db,
+enum bnxt_ulp_fdb_type flow_type,
+uint32_t idx,
+uint32_t flag)
 {
-   uint32_tactive_index;
-
-   active_index = idx / ULP_INDEX_BITMAP_SIZE;
-   if (flag)
-   ULP_INDEX_BITMAP_SET(flow_tbl->active_flow_tbl[active_index],
-idx);
-   else
-   ULP_INDEX_BITMAP_RESET(flow_tbl->active_flow_tbl[active_index],
-  idx);
+  

[dpdk-dev] [PATCH v2 11/12] net/bnxt: remove flow db table type from templates

2020-10-10 Thread Ajit Khaparde
From: Mike Baucom 

FDB type is now driven by the caller, not the template.
So remove it.

Signed-off-by: Mike Baucom 
Reviewed-by: Ajit Khaparde 
Reviewed-by: Kishore Padmanabha 
---
 .../tf_ulp/ulp_template_db_stingray_act.c | 18 ++---
 .../tf_ulp/ulp_template_db_stingray_class.c   | 69 +++
 .../bnxt/tf_ulp/ulp_template_db_wh_plus_act.c | 18 ++---
 .../tf_ulp/ulp_template_db_wh_plus_class.c| 69 +++
 drivers/net/bnxt/tf_ulp/ulp_template_struct.h |  1 -
 5 files changed, 58 insertions(+), 117 deletions(-)

diff --git a/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_act.c 
b/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_act.c
index 68e4d8e59..2237ffb94 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_act.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_act.c
@@ -12,38 +12,32 @@ struct bnxt_ulp_mapper_tbl_list_info 
ulp_stingray_act_tmpl_list[] = {
[1] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 6,
-   .start_tbl_idx = 0,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 0
},
[2] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 3,
-   .start_tbl_idx = 6,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 6
},
[3] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 3,
-   .start_tbl_idx = 9,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 9
},
[4] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 6,
-   .start_tbl_idx = 12,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 12
},
[5] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 6,
-   .start_tbl_idx = 18,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 18
},
[6] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 5,
-   .start_tbl_idx = 24,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 24
}
 };
 
diff --git a/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_class.c 
b/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_class.c
index 1fa364e29..62b940daa 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_class.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_class.c
@@ -12,140 +12,117 @@ struct bnxt_ulp_mapper_tbl_list_info 
ulp_stingray_class_tmpl_list[] = {
[1] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 6,
-   .start_tbl_idx = 0,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_DEFAULT
+   .start_tbl_idx = 0
},
[2] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 7,
-   .start_tbl_idx = 6,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_DEFAULT
+   .start_tbl_idx = 6
},
[3] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 7,
-   .start_tbl_idx = 13,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_DEFAULT
+   .start_tbl_idx = 13
},
[4] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 7,
-   .start_tbl_idx = 20,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_DEFAULT
+   .start_tbl_idx = 20
},
[5] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 1,
-   .start_tbl_idx = 27,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_DEFAULT
+   .start_tbl_idx = 27
},
[6] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 5,
-   .start_tbl_idx = 28,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 28
},
[7] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 5,
-   .start_tbl_idx = 33,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 33
},
[8] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 6,
-   .start_tbl_idx = 38,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 38
},
[9] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 6,
-   .start_tbl_idx = 44,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 44
},
[10] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 6,
-   .start_tbl_idx = 50,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 50
},
[11] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 6,
-   .start_tbl_idx = 56,
-   .flow_db_table_type = BNXT_ULP_FDB_TYPE_REGULAR
+   .start_tbl_idx = 56
},
[12] = {
.device_name

[dpdk-dev] [PATCH v2 12/12] net/bnxt: add parent child flow create and free

2020-10-10 Thread Ajit Khaparde
From: Kishore Padmanabha 

Added support in the ULP mapper to enable parent child flow
creation and destroy. This feature enables support for the vxlan
decap functionality.

Signed-off-by: Kishore Padmanabha 
Reviewed-by: Mike Baucom 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_ulp/ulp_flow_db.c | 177 +-
 drivers/net/bnxt/tf_ulp/ulp_flow_db.h |  36 
 drivers/net/bnxt/tf_ulp/ulp_mapper.c  |  87 -
 drivers/net/bnxt/tf_ulp/ulp_mapper.h  |   7 +
 .../net/bnxt/tf_ulp/ulp_template_db_enum.h|   5 +-
 5 files changed, 302 insertions(+), 10 deletions(-)

diff --git a/drivers/net/bnxt/tf_ulp/ulp_flow_db.c 
b/drivers/net/bnxt/tf_ulp/ulp_flow_db.c
index a1c39329f..3be748908 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_flow_db.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_flow_db.c
@@ -6,10 +6,10 @@
 #include 
 #include "bnxt.h"
 #include "bnxt_tf_common.h"
-#include "ulp_flow_db.h"
 #include "ulp_utils.h"
 #include "ulp_template_struct.h"
 #include "ulp_mapper.h"
+#include "ulp_flow_db.h"
 #include "ulp_fc_mgr.h"
 
 #define ULP_FLOW_DB_RES_DIR_BIT31
@@ -56,10 +56,10 @@ ulp_flow_db_active_flows_bit_set(struct bnxt_ulp_flow_db 
*flow_db,
} else {
if (flow_type == BNXT_ULP_FDB_TYPE_REGULAR)
ULP_INDEX_BITMAP_RESET(f_tbl->active_reg_flows[a_idx],
-idx);
+  idx);
else
ULP_INDEX_BITMAP_RESET(f_tbl->active_dflt_flows[a_idx],
-idx);
+  idx);
}
 }
 
@@ -89,6 +89,13 @@ ulp_flow_db_active_flows_bit_is_set(struct bnxt_ulp_flow_db 
*flow_db,
idx);
 }
 
+static inline enum tf_dir
+ulp_flow_db_resource_dir_get(struct ulp_fdb_resource_info *res_info)
+{
+   return ((res_info->nxt_resource_idx & ULP_FLOW_DB_RES_DIR_MASK) >>
+   ULP_FLOW_DB_RES_DIR_BIT);
+}
+
 static uint8_t
 ulp_flow_db_resource_func_get(struct ulp_fdb_resource_info *res_info)
 {
@@ -157,11 +164,9 @@ ulp_flow_db_res_info_to_params(struct 
ulp_fdb_resource_info *resource_info,
   struct ulp_flow_db_res_params *params)
 {
memset(params, 0, sizeof(struct ulp_flow_db_res_params));
-   params->direction = ((resource_info->nxt_resource_idx &
-ULP_FLOW_DB_RES_DIR_MASK) >>
-ULP_FLOW_DB_RES_DIR_BIT);
 
/* use the helper function to get the resource func */
+   params->direction = ulp_flow_db_resource_dir_get(resource_info);
params->resource_func = ulp_flow_db_resource_func_get(resource_info);
 
if (params->resource_func == BNXT_ULP_RESOURCE_FUNC_EXT_EM_TABLE ||
@@ -303,6 +308,9 @@ ulp_flow_db_parent_tbl_init(struct bnxt_ulp_flow_db 
*flow_db,
struct ulp_fdb_parent_child_db *p_db;
uint32_t size, idx;
 
+   if (!num_entries)
+   return 0;
+
/* update the sizes for the allocation */
p_db = &flow_db->parent_child_db;
p_db->child_bitset_size = (flow_db->flow_tbl.num_flows /
@@ -1171,6 +1179,12 @@ ulp_flow_db_parent_flow_alloc(struct bnxt_ulp_context 
*ulp_ctxt,
return -EINVAL;
}
 
+   /* No support for parent child db then just exit */
+   if (!flow_db->parent_child_db.entries_count) {
+   BNXT_TF_DBG(ERR, "parent child db not supported\n");
+   return -EINVAL;
+   }
+
p_pdb = &flow_db->parent_child_db;
for (idx = 0; idx <= p_pdb->entries_count; idx++) {
if (p_pdb->parent_flow_tbl[idx].parent_fid == fid) {
@@ -1220,6 +1234,12 @@ ulp_flow_db_parent_flow_free(struct bnxt_ulp_context 
*ulp_ctxt,
return -EINVAL;
}
 
+   /* No support for parent child db then just exit */
+   if (!flow_db->parent_child_db.entries_count) {
+   BNXT_TF_DBG(ERR, "parent child db not supported\n");
+   return -EINVAL;
+   }
+
p_pdb = &flow_db->parent_child_db;
for (idx = 0; idx <= p_pdb->entries_count; idx++) {
if (p_pdb->parent_flow_tbl[idx].parent_fid == fid) {
@@ -1273,6 +1293,12 @@ ulp_flow_db_parent_child_flow_set(struct 
bnxt_ulp_context *ulp_ctxt,
return -EINVAL;
}
 
+   /* No support for parent child db then just exit */
+   if (!flow_db->parent_child_db.entries_count) {
+   BNXT_TF_DBG(ERR, "parent child db not supported\n");
+   return -EINVAL;
+   }
+
p_pdb = &flow_db->parent_child_db;
a_idx = child_fid / ULP_INDEX_BITMAP_SIZE;
for (idx = 0; idx <= p_pdb->entries_count; idx++) {
@@ -1320,6 +1346,12 @@ ulp_flow_db_parent_flow_idx_get(struct bnxt_ulp_context 
*ulp_ctxt,
return -EINVAL;
}
 
+   /* No support for parent child db then just exit */

[dpdk-dev] [PATCH v2 10/12] net/bnxt: consolidate template table processing

2020-10-10 Thread Ajit Khaparde
From: Mike Baucom 

These name changes follow from consolidating the template table
processing; the old names are no longer accurate:

- chip before type in name
- removal of class in key field info

Signed-off-by: Mike Baucom 
Reviewed-by: Ajit Khaparde 
Reviewed-by: Kishore Padmanabha 
---
 drivers/net/bnxt/tf_ulp/ulp_mapper.c  | 12 +++
 .../tf_ulp/ulp_template_db_stingray_act.c |  6 ++--
 .../tf_ulp/ulp_template_db_stingray_class.c   | 10 +++---
 drivers/net/bnxt/tf_ulp/ulp_template_db_tbl.c | 34 +--
 drivers/net/bnxt/tf_ulp/ulp_template_db_tbl.h | 32 -
 .../bnxt/tf_ulp/ulp_template_db_wh_plus_act.c |  6 ++--
 .../tf_ulp/ulp_template_db_wh_plus_class.c| 10 +++---
 drivers/net/bnxt/tf_ulp/ulp_template_struct.h |  4 +--
 8 files changed, 57 insertions(+), 57 deletions(-)

diff --git a/drivers/net/bnxt/tf_ulp/ulp_mapper.c 
b/drivers/net/bnxt/tf_ulp/ulp_mapper.c
index 812e35c27..cd289cc40 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_mapper.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_mapper.c
@@ -256,7 +256,7 @@ ulp_mapper_tbl_list_get(struct bnxt_ulp_mapper_parms 
*mparms,
  *
  * Returns array of Key fields, or NULL on error.
  */
-static struct bnxt_ulp_mapper_class_key_field_info *
+static struct bnxt_ulp_mapper_key_field_info *
 ulp_mapper_key_fields_get(struct bnxt_ulp_mapper_parms *mparms,
  struct bnxt_ulp_mapper_tbl_info *tbl,
  uint32_t *num_flds)
@@ -1009,7 +1009,7 @@ ulp_mapper_result_field_process(struct 
bnxt_ulp_mapper_parms *parms,
 static int32_t
 ulp_mapper_keymask_field_process(struct bnxt_ulp_mapper_parms *parms,
 enum tf_dir dir,
-struct bnxt_ulp_mapper_class_key_field_info *f,
+struct bnxt_ulp_mapper_key_field_info *f,
 struct ulp_blob *blob,
 uint8_t is_key,
 const char *name)
@@ -1020,7 +1020,7 @@ ulp_mapper_keymask_field_process(struct 
bnxt_ulp_mapper_parms *parms,
uint8_t *operand;
struct ulp_regfile *regfile = parms->regfile;
uint8_t *val = NULL;
-   struct bnxt_ulp_mapper_class_key_field_info *fld = f;
+   struct bnxt_ulp_mapper_key_field_info *fld = f;
uint32_t field_size;
 
if (is_key) {
@@ -1442,7 +1442,7 @@ static int32_t
 ulp_mapper_tcam_tbl_process(struct bnxt_ulp_mapper_parms *parms,
struct bnxt_ulp_mapper_tbl_info *tbl)
 {
-   struct bnxt_ulp_mapper_class_key_field_info *kflds;
+   struct bnxt_ulp_mapper_key_field_info   *kflds;
struct ulp_blob key, mask, data, update_data;
uint32_t i, num_kflds;
struct tf *tfp;
@@ -1670,7 +1670,7 @@ static int32_t
 ulp_mapper_em_tbl_process(struct bnxt_ulp_mapper_parms *parms,
  struct bnxt_ulp_mapper_tbl_info *tbl)
 {
-   struct bnxt_ulp_mapper_class_key_field_info *kflds;
+   struct bnxt_ulp_mapper_key_field_info   *kflds;
struct bnxt_ulp_mapper_result_field_info *dflds;
struct ulp_blob key, data;
uint32_t i, num_kflds, num_dflds;
@@ -2061,7 +2061,7 @@ static int32_t
 ulp_mapper_cache_tbl_process(struct bnxt_ulp_mapper_parms *parms,
 struct bnxt_ulp_mapper_tbl_info *tbl)
 {
-   struct bnxt_ulp_mapper_class_key_field_info *kflds;
+   struct bnxt_ulp_mapper_key_field_info *kflds;
struct bnxt_ulp_mapper_cache_entry *cache_entry;
struct bnxt_ulp_mapper_ident_info *idents;
uint32_t i, num_kflds = 0, num_idents = 0;
diff --git a/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_act.c 
b/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_act.c
index a5019d664..68e4d8e59 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_act.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_template_db_stingray_act.c
@@ -8,7 +8,7 @@
 #include "ulp_template_struct.h"
 #include "ulp_rte_parser.h"
 
-struct bnxt_ulp_mapper_tbl_list_info ulp_act_stingray_tmpl_list[] = {
+struct bnxt_ulp_mapper_tbl_list_info ulp_stingray_act_tmpl_list[] = {
[1] = {
.device_name = BNXT_ULP_DEVICE_ID_STINGRAY,
.num_tbls = 6,
@@ -47,7 +47,7 @@ struct bnxt_ulp_mapper_tbl_list_info 
ulp_act_stingray_tmpl_list[] = {
}
 };
 
-struct bnxt_ulp_mapper_tbl_info ulp_act_stingray_tbl_list[] = {
+struct bnxt_ulp_mapper_tbl_info ulp_stingray_act_tbl_list[] = {
{
.resource_func = BNXT_ULP_RESOURCE_FUNC_INDEX_TABLE,
.resource_type = TF_TBL_TYPE_ACT_STATS_64,
@@ -531,7 +531,7 @@ struct bnxt_ulp_mapper_tbl_info ulp_act_stingray_tbl_list[] 
= {
}
 };
 
-struct bnxt_ulp_mapper_result_field_info ulp_act_stingray_result_field_list[] 
= {
+struct bnxt_ulp_mapper_result_field_info ulp_stingray_act_result_field_list[] 
= {
{
.field_bit_size = 64,
.result_opcode = BNXT_ULP_MAPPER_OPC_SET_TO_ZERO
diff --git a/drivers/net/bnxt/tf_ulp/ulp_template_d

[dpdk-dev] [PATCH v3 2/2] net/mlx5: add non temporal store for WQE fields

2020-10-10 Thread Aman Kumar
Add a non-temporal store for a few WQE fields to optimize the
data path. Define RTE_LIBRTE_MLX5_NT_STORE in the build
configuration to enable this optimization.

Signed-off-by: Aman Kumar 
---
 drivers/net/mlx5/meson.build |   1 +
 drivers/net/mlx5/mlx5.c  |  17 ++
 drivers/net/mlx5/mlx5.h  |   4 +
 drivers/net/mlx5/mlx5_rxq.c  |   3 +
 drivers/net/mlx5/mlx5_rxtx.c | 322 ++-
 drivers/net/mlx5/mlx5_rxtx.h |   6 +
 drivers/net/mlx5/mlx5_rxtx_vec.h |  29 ++-
 drivers/net/mlx5/mlx5_txq.c  |   3 +
 meson_options.txt|   2 +
 9 files changed, 378 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index 38e93fdc1..347ca6527 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -48,6 +48,7 @@ foreach option:cflags_options
endif
 endforeach
 dpdk_conf.set('RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY', 
get_option('mlx5_ntload_tstore'))
+dpdk_conf.set('RTE_LIBRTE_MLX5_NT_STORE', get_option('mlx5_ntstore'))
 if get_option('buildtype').contains('debug')
cflags += [ '-pedantic', '-DPEDANTIC' ]
 else
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index a2796eaa5..01b25a109 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -164,6 +164,13 @@
 /* mprq_tstore_memcpy */
 #define MLX5_MPRQ_TSTORE_MEMCPY "mprq_tstore_memcpy"
 #endif
+#ifdef RTE_LIBRTE_MLX5_NT_STORE
+/* tx_wqe_field_ntstore */
+#define MLX5_TX_WQE_FIELD_NTSTORE "tx_wqe_field_ntstore"
+
+/* vec_rx_wqe_field_ntstore */
+#define MLX5_VEC_RX_WQE_FIELD_NTSTORE "vec_rx_wqe_field_ntstore"
+#endif
 
 /*
  * Device parameter to configure the total data buffer size for a single
@@ -1631,6 +1638,12 @@ mlx5_args_check(const char *key, const char *val, void 
*opaque)
 #ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY
} else if (strcmp(MLX5_MPRQ_TSTORE_MEMCPY, key) == 0) {
config->mprq_tstore_memcpy = tmp;
+#endif
+#ifdef RTE_LIBRTE_MLX5_NT_STORE
+   } else if (strcmp(MLX5_TX_WQE_FIELD_NTSTORE, key) == 0) {
+   config->tx_wqe_field_ntstore = tmp;
+   } else if (strcmp(MLX5_VEC_RX_WQE_FIELD_NTSTORE, key) == 0) {
+   config->vec_rx_wqe_field_ntstore = tmp;
 #endif
} else {
DRV_LOG(WARNING, "%s: unknown parameter", key);
@@ -1694,6 +1707,10 @@ mlx5_args(struct mlx5_dev_config *config, struct 
rte_devargs *devargs)
MLX5_DECAP_EN,
 #ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY
MLX5_MPRQ_TSTORE_MEMCPY,
+#endif
+#ifdef RTE_LIBRTE_MLX5_NT_STORE
+   MLX5_TX_WQE_FIELD_NTSTORE,
+   MLX5_VEC_RX_WQE_FIELD_NTSTORE,
 #endif
NULL,
};
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 1eb305650..9d192465f 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -237,6 +237,10 @@ struct mlx5_dev_config {
 #ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY
unsigned int mprq_tstore_memcpy:1;
 #endif
+#ifdef RTE_LIBRTE_MLX5_NT_STORE
+   unsigned int tx_wqe_field_ntstore:1;
+   unsigned int vec_rx_wqe_field_ntstore:1;
+#endif
 };
 
 
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index c8db59a12..69ad9ab8c 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1382,6 +1382,9 @@ mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx, 
uint16_t desc,
tmpl->irq = 1;
 #ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY
tmpl->rxq.mprq_tstore_memcpy = config->mprq_tstore_memcpy;
+#endif
+#ifdef RTE_LIBRTE_MLX5_NT_STORE
+   tmpl->rxq.vec_rx_wqe_field_ntstore = config->vec_rx_wqe_field_ntstore;
 #endif
mprq_stride_nums = config->mprq.stride_num_n ?
config->mprq.stride_num_n : MLX5_MPRQ_STRIDE_NUM_N;
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index f59e30d82..76bf20b6f 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -214,6 +214,301 @@ static void *memcpy_aligned_rx_tstore_16B(void *dst, void 
*src, int len)
 }
 #endif
 
+#ifdef RTE_LIBRTE_MLX5_NT_STORE
+static void *amd_memcpy(void *dest, const void *src, size_t size)
+{
+   asm goto (
+   "movq   %0, %%rsi\n\t"
+   "movq   %1, %%rdi\n\t"
+   "movq   %2, %%rdx\n\t"
+   "movq%%rdi, %%rax\n\t"
+   "cmp $32, %%rdx\n\t"
+   "jb  less_vec\n\t"
+   "cmp $(32 * 2), %%rdx\n\t"
+   "ja  more_2x_vec\n\t"
+   "vmovdqu   (%%rsi), %%ymm0\n\t"
+   "vmovdqu   -32(%%rsi,%%rdx), %%ymm1\n\t"
+   "vmovdqu   %%ymm0, (%%rdi)\n\t"
+   "vmovdqu   %%ymm1, -32(%%rdi,%%rdx)\n\t"
+   "vzeroupper\n\t"
+   "jmp %l[done]\n\t"
+   "less_vec:\n\t"
+   /* Less than 1 VEC.  */
+   "cmpb$32, %%dl\n\t"
+   "jae between_32_63\n\t"
+   "cmpb$16, %%dl\n\t"
+   "jae between_16_31\n\t"
+   "cmpb$8, %%dl\n\t"
+   "jae b

[dpdk-dev] [PATCH v3 1/2] net/mlx5: optimize mprq memcpy

2020-10-10 Thread Aman Kumar
Add a non-temporal load and a temporal store for the MPRQ memcpy.
Define RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY in the build
configuration to enable this optimization.

Signed-off-by: Aman Kumar 
---
 drivers/net/mlx5/meson.build |   1 +
 drivers/net/mlx5/mlx5.c  |  12 
 drivers/net/mlx5/mlx5.h  |   3 +
 drivers/net/mlx5/mlx5_rxq.c  |   3 +
 drivers/net/mlx5/mlx5_rxtx.c | 116 ++-
 drivers/net/mlx5/mlx5_rxtx.h |   3 +
 meson_options.txt|   2 +
 7 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index 9a97bb9c8..38e93fdc1 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -47,6 +47,7 @@ foreach option:cflags_options
cflags += option
endif
 endforeach
+dpdk_conf.set('RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY', 
get_option('mlx5_ntload_tstore'))
 if get_option('buildtype').contains('debug')
cflags += [ '-pedantic', '-DPEDANTIC' ]
 else
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 01ead6e6a..a2796eaa5 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -160,6 +160,11 @@
 /* Configure timeout of LRO session (in microseconds). */
 #define MLX5_LRO_TIMEOUT_USEC "lro_timeout_usec"
 
+#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY
+/* mprq_tstore_memcpy */
+#define MLX5_MPRQ_TSTORE_MEMCPY "mprq_tstore_memcpy"
+#endif
+
 /*
  * Device parameter to configure the total data buffer size for a single
  * hairpin queue (logarithm value).
@@ -1623,6 +1628,10 @@ mlx5_args_check(const char *key, const char *val, void 
*opaque)
config->sys_mem_en = !!tmp;
} else if (strcmp(MLX5_DECAP_EN, key) == 0) {
config->decap_en = !!tmp;
+#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY
+   } else if (strcmp(MLX5_MPRQ_TSTORE_MEMCPY, key) == 0) {
+   config->mprq_tstore_memcpy = tmp;
+#endif
} else {
DRV_LOG(WARNING, "%s: unknown parameter", key);
rte_errno = EINVAL;
@@ -1683,6 +1692,9 @@ mlx5_args(struct mlx5_dev_config *config, struct 
rte_devargs *devargs)
MLX5_RECLAIM_MEM,
MLX5_SYS_MEM_EN,
MLX5_DECAP_EN,
+#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY
+   MLX5_MPRQ_TSTORE_MEMCPY,
+#endif
NULL,
};
struct rte_kvargs *kvlist;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 43da9a1fb..1eb305650 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -234,6 +234,9 @@ struct mlx5_dev_config {
int tx_skew; /* Tx scheduling skew between WQE and data on wire. */
struct mlx5_hca_attr hca_attr; /* HCA attributes. */
struct mlx5_lro_config lro; /* LRO configuration. */
+#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY
+   unsigned int mprq_tstore_memcpy:1;
+#endif
 };
 
 
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index c059e216d..c8db59a12 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1380,6 +1380,9 @@ mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx, 
uint16_t desc,
tmpl->socket = socket;
if (dev->data->dev_conf.intr_conf.rxq)
tmpl->irq = 1;
+#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY
+   tmpl->rxq.mprq_tstore_memcpy = config->mprq_tstore_memcpy;
+#endif
mprq_stride_nums = config->mprq.stride_num_n ?
config->mprq.stride_num_n : MLX5_MPRQ_STRIDE_NUM_N;
mprq_stride_size = non_scatter_min_mbuf_size <=
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 0b87be15b..f59e30d82 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -123,6 +123,97 @@ uint8_t mlx5_swp_types_table[1 << 10] __rte_cache_aligned;
 uint64_t rte_net_mlx5_dynf_inline_mask;
 #define PKT_TX_DYNF_NOINLINE rte_net_mlx5_dynf_inline_mask
 
+#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY
+static void copy16B_ts(void *dst, void *src)
+{
+   __m128i var128;
+
+   var128 = _mm_stream_load_si128((__m128i *)src);
+   _mm_storeu_si128((__m128i *)dst, var128);
+}
+
+static void copy32B_ts(void *dst, void *src)
+{
+   __m256i ymm0;
+
+   ymm0 = _mm256_stream_load_si256((const __m256i *)src);
+   _mm256_storeu_si256((__m256i *)dst, ymm0);
+}
+
+static void copy64B_ts(void *dst, void *src)
+{
+   __m256i ymm0, ymm1;
+
+   ymm0 = _mm256_stream_load_si256((const __m256i *)src);
+   ymm1 = _mm256_stream_load_si256((const __m256i *)((uint8_t *)src + 32));
+   _mm256_storeu_si256((__m256i *)dst, ymm0);
+   _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 32), ymm1);
+}
+
+static void copy128B_ts(void *dst, void *src)
+{
+   __m256i ymm0, ymm1, ymm2, ymm3;
+
+   ymm0 = _mm256_stream_load_si256((const __m256i *)src);
+   ymm1 = _mm256_stream_load_si256((const __m256i *)((uint8_t *)src + 32));
+   

+   "cmp $(32 * 2), %%rdx\n\t"
+   "ja  more_2x_vec\n\t"
+   "vmovdqu   (%%rsi), %%ymm0\n\t"
+   "vmovdqu   -32(%%rsi,%%rdx), %%ymm1\n\t"
+   "vmovdqu   %%ymm0, (%%rdi)\n\t"
+   "vmovdqu   %%ymm1, -32(%%rdi,%%rdx)\n\t"
+   "vzeroupper\n\t"
+   "jmp %l[done]\n\t"
+   "less_vec:\n\t"
+   /* Less than 1 VEC.  */
+   "cmpb$32, %%dl\n\t"
+   "jae between_32_63\n\t"
+   "cmpb$16, %%dl\n\t"
+   "jae between_16_31\n\t"
+   "cmpb$8, %%dl\n\t"
+   "jae b

[dpdk-dev] [PATCH v3 1/2] net/mlx5: optimize mprq memcpy

2020-10-10 Thread Aman Kumar
Add non-temporal load and temporal store for MPRQ memcpy.
Define RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY in the build
configuration to enable this optimization.

Signed-off-by: Aman Kumar 
---
 drivers/net/mlx5/meson.build |   1 +
 drivers/net/mlx5/mlx5.c  |  12 
 drivers/net/mlx5/mlx5.h  |   3 +
 drivers/net/mlx5/mlx5_rxq.c  |   3 +
 drivers/net/mlx5/mlx5_rxtx.c | 116 ++-
 drivers/net/mlx5/mlx5_rxtx.h |   3 +
 meson_options.txt|   2 +
 7 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index 9a97bb9c8..38e93fdc1 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -47,6 +47,7 @@ foreach option:cflags_options
cflags += option
endif
 endforeach
+dpdk_conf.set('RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY', 
get_option('mlx5_ntload_tstore'))
 if get_option('buildtype').contains('debug')
cflags += [ '-pedantic', '-DPEDANTIC' ]
 else
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 01ead6e6a..a2796eaa5 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -160,6 +160,11 @@
 /* Configure timeout of LRO session (in microseconds). */
 #define MLX5_LRO_TIMEOUT_USEC "lro_timeout_usec"
 
+#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY
+/* mprq_tstore_memcpy */
+#define MLX5_MPRQ_TSTORE_MEMCPY "mprq_tstore_memcpy"
+#endif
+
 /*
  * Device parameter to configure the total data buffer size for a single
  * hairpin queue (logarithm value).
@@ -1623,6 +1628,10 @@ mlx5_args_check(const char *key, const char *val, void 
*opaque)
config->sys_mem_en = !!tmp;
} else if (strcmp(MLX5_DECAP_EN, key) == 0) {
config->decap_en = !!tmp;
+#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY
+   } else if (strcmp(MLX5_MPRQ_TSTORE_MEMCPY, key) == 0) {
+   config->mprq_tstore_memcpy = tmp;
+#endif
} else {
DRV_LOG(WARNING, "%s: unknown parameter", key);
rte_errno = EINVAL;
@@ -1683,6 +1692,9 @@ mlx5_args(struct mlx5_dev_config *config, struct 
rte_devargs *devargs)
MLX5_RECLAIM_MEM,
MLX5_SYS_MEM_EN,
MLX5_DECAP_EN,
+#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY
+   MLX5_MPRQ_TSTORE_MEMCPY,
+#endif
NULL,
};
struct rte_kvargs *kvlist;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 43da9a1fb..1eb305650 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -234,6 +234,9 @@ struct mlx5_dev_config {
int tx_skew; /* Tx scheduling skew between WQE and data on wire. */
struct mlx5_hca_attr hca_attr; /* HCA attributes. */
struct mlx5_lro_config lro; /* LRO configuration. */
+#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY
+   unsigned int mprq_tstore_memcpy:1;
+#endif
 };
 
 
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index c059e216d..c8db59a12 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1380,6 +1380,9 @@ mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx, 
uint16_t desc,
tmpl->socket = socket;
if (dev->data->dev_conf.intr_conf.rxq)
tmpl->irq = 1;
+#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY
+   tmpl->rxq.mprq_tstore_memcpy = config->mprq_tstore_memcpy;
+#endif
mprq_stride_nums = config->mprq.stride_num_n ?
config->mprq.stride_num_n : MLX5_MPRQ_STRIDE_NUM_N;
mprq_stride_size = non_scatter_min_mbuf_size <=
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 0b87be15b..f59e30d82 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -123,6 +123,97 @@ uint8_t mlx5_swp_types_table[1 << 10] __rte_cache_aligned;
 uint64_t rte_net_mlx5_dynf_inline_mask;
 #define PKT_TX_DYNF_NOINLINE rte_net_mlx5_dynf_inline_mask
 
+#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY
+static void copy16B_ts(void *dst, void *src)
+{
+   __m128i var128;
+
+   var128 = _mm_stream_load_si128((__m128i *)src);
+   _mm_storeu_si128((__m128i *)dst, var128);
+}
+
+static void copy32B_ts(void *dst, void *src)
+{
+   __m256i ymm0;
+
+   ymm0 = _mm256_stream_load_si256((const __m256i *)src);
+   _mm256_storeu_si256((__m256i *)dst, ymm0);
+}
+
+static void copy64B_ts(void *dst, void *src)
+{
+   __m256i ymm0, ymm1;
+
+   ymm0 = _mm256_stream_load_si256((const __m256i *)src);
+   ymm1 = _mm256_stream_load_si256((const __m256i *)((uint8_t *)src + 32));
+   _mm256_storeu_si256((__m256i *)dst, ymm0);
+   _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 32), ymm1);
+}
+
+static void copy128B_ts(void *dst, void *src)
+{
+   __m256i ymm0, ymm1, ymm2, ymm3;
+
+   ymm0 = _mm256_stream_load_si256((const __m256i *)src);
+   ymm1 = _mm256_stream_load_si256((const __m256i *)((uint8_t *)src + 32));
+   

Re: [dpdk-dev] [dpdk-dev v2 2/2] vhost/crypto: fix feature negotiation

2020-10-10 Thread Jiang, YuX
Tested-by: Jiang, YuX 

Best Regards
Jiang yu

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Maxime Coquelin
> Sent: Friday, October 9, 2020 3:36 PM
> To: Zhang, Roy Fan ; dev@dpdk.org
> Cc: Xia, Chenbo ; Liu, Changpeng
> ; Yigit, Ferruh ;
> sta...@dpdk.org
> Subject: Re: [dpdk-dev] [dpdk-dev v2 2/2] vhost/crypto: fix feature
> negotiation
> 
> 
> 
> On 10/6/20 10:37 AM, Zhang, Roy Fan wrote:
> > Hi Maxime,
> >
> > Thanks I will verify it after you applied the patch.
> 
> 
> Thanks,
> Your patch is now on dpdk-next-virtio/main.
> I would be glad if you could test it before Ferruh merges it.
> 
> Thanks,
> Maxime
> > Regards,
> > Fan
> >
> >> -Original Message-
> >> From: Maxime Coquelin 
> >> Sent: Tuesday, October 6, 2020 9:10 AM
> >> To: Zhang, Roy Fan ; dev@dpdk.org
> >> Cc: Xia, Chenbo ; Liu, Changpeng
> >> ; Yigit, Ferruh ;
> >> sta...@dpdk.org
> >> Subject: Re: [dpdk-dev v2 2/2] vhost/crypto: fix feature negotiation
> >>
> >>
> >>
> >> On 10/2/20 5:36 PM, Fan Zhang wrote:
> >>> This patch fixes the feature negotiation for vhost crypto during
> >>> initialization. The patch uses the newly created driver start
> >>> function to inform the driver type with the fixed vhost features.
> >>> In addition the patch provides a new API specifically used by the
> >>> application to start a vhost-crypto driver.
> >>>
> >>> Fixes: 939066d96563 ("vhost/crypto: add public function
> >>> implementation")
> >>> Cc: roy.fan.zh...@intel.com
> >>>
> >>> Signed-off-by: Fan Zhang 
> >>> ---
> >>>  examples/vhost_crypto/main.c   |  3 +-
> >>>  lib/librte_vhost/rte_vhost_crypto.h| 12 
> >>>  lib/librte_vhost/rte_vhost_version.map |  1 +
> >>>  lib/librte_vhost/vhost_crypto.c| 41 +-
> >>>  4 files changed, 42 insertions(+), 15 deletions(-)
> >>>
> >>> diff --git a/examples/vhost_crypto/main.c
> >> b/examples/vhost_crypto/main.c
> >>> index d78fd9b81..11ad49159 100644
> >>> --- a/examples/vhost_crypto/main.c
> >>> +++ b/examples/vhost_crypto/main.c
> >>> @@ -598,7 +598,8 @@ main(int argc, char *argv[])
> >>>   rte_vhost_driver_callback_register(lo-
> >>> socket_files[j],
> >>>   &virtio_crypto_device_ops);
> >>>
> >>> - ret = rte_vhost_driver_start(lo->socket_files[j]);
> >>> + ret = rte_vhost_crypto_driver_start(
> >>> + lo->socket_files[j]);
> >>>   if (ret < 0)  {
> >>>   RTE_LOG(ERR, USER1, "failed to start
> >> vhost.\n");
> >>>   goto error_exit;
> >>> diff --git a/lib/librte_vhost/rte_vhost_crypto.h
> >> b/lib/librte_vhost/rte_vhost_crypto.h
> >>> index b54d61db6..c809c46a2 100644
> >>> --- a/lib/librte_vhost/rte_vhost_crypto.h
> >>> +++ b/lib/librte_vhost/rte_vhost_crypto.h
> >>> @@ -20,6 +20,18 @@ enum rte_vhost_crypto_zero_copy {
> >>>   RTE_VHOST_CRYPTO_MAX_ZERO_COPY_OPTIONS
> >>>  };
> >>>
> >>> +/**
> >>> + * Start vhost crypto driver
> >>> + *
> >>> + * @param path
> >>> + *  The vhost-user socket file path
> >>> + * @return
> >>> + *  0 on success, -1 on failure
> >>> + */
> >>> +__rte_experimental
> >>> +int
> >>> +rte_vhost_crypto_driver_start(const char *path);
> >>> +
> >>>  /**
> >>>   *  Create Vhost-crypto instance
> >>>   *
> >>> diff --git a/lib/librte_vhost/rte_vhost_version.map
> >> b/lib/librte_vhost/rte_vhost_version.map
> >>> index 55e98e557..9183d6f2f 100644
> >>> --- a/lib/librte_vhost/rte_vhost_version.map
> >>> +++ b/lib/librte_vhost/rte_vhost_version.map
> >>> @@ -55,6 +55,7 @@ EXPERIMENTAL {
> >>>   rte_vhost_driver_get_protocol_features;
> >>>   rte_vhost_driver_get_queue_num;
> >>>   rte_vhost_crypto_create;
> >>> + rte_vhost_crypto_driver_start;
> >>>   rte_vhost_crypto_free;
> >>>   rte_vhost_crypto_fetch_requests;
> >>>   rte_vhost_crypto_finalize_requests;
> >>> diff --git a/lib/librte_vhost/vhost_crypto.c
> >> b/lib/librte_vhost/vhost_crypto.c
> >>> index e08f9c6d7..6195958d2 100644
> >>> --- a/lib/librte_vhost/vhost_crypto.c
> >>> +++ b/lib/librte_vhost/vhost_crypto.c
> >>> @@ -35,13 +35,12 @@
> >>>  #define VC_LOG_DBG(fmt, args...)
> >>>  #endif
> >>>
> >>> -#define VIRTIO_CRYPTO_FEATURES ((1 <<
> VIRTIO_F_NOTIFY_ON_EMPTY)
> >> |  \
> >>> - (1 << VIRTIO_RING_F_INDIRECT_DESC) |
> >>\
> >>> - (1 << VIRTIO_RING_F_EVENT_IDX) |\
> >>> - (1 << VIRTIO_CRYPTO_SERVICE_CIPHER) |
> >>\
> >>> - (1 << VIRTIO_CRYPTO_SERVICE_MAC) |
> >>\
> >>> - (1 << VIRTIO_NET_F_CTRL_VQ) |
> >>\
> >>> - (1 << VHOST_USER_PROTOCOL_F_CONFIG))
> >>> +#define VIRTIO_CRYPTO_FEATURES ((1ULL <<
> >> VIRTIO_F_NOTIFY_ON_EMPTY) |\
> >>> + (1ULL << VIRTIO_RING_F_INDIRECT_DESC) |
> >>\
> >>> + (1ULL << VIRTIO_RING_F_EVENT_IDX) |
> >>\
> >>> + (1ULL << VIRTIO_NET_F_CTRL_VQ) |\
> >>> + (1ULL << VIRTIO_F

Re: [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection

2020-10-10 Thread Ruifeng Wang


> -Original Message-
> From: dev  On Behalf Of Mairtin o Loingsigh
> Sent: Friday, October 9, 2020 9:51 PM
> To: jasvinder.si...@intel.com; bruce.richard...@intel.com;
> pablo.de.lara.gua...@intel.com; konstantin.anan...@intel.com
> Cc: dev@dpdk.org; brendan.r...@intel.com; mairtin.oloings...@intel.com;
> david.co...@intel.com
> Subject: [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific
> CRC selection
> 
> This patch adds support for run-time selection of the optimal architecture-
> specific CRC path, based on the supported instruction set(s) of the CPU.
> 
> The compiler option checks have been moved from the C files to the meson
> script. The rte_cpu_get_flag_enabled function is called automatically by the
> library at process initialization time to determine which instructions the CPU
> supports, with the most optimal supported CRC path ultimately selected.
> 
> Signed-off-by: Mairtin o Loingsigh 
> Signed-off-by: David Coyle 
> Acked-by: Konstantin Ananyev 
> ---
>  doc/guides/rel_notes/release_20_11.rst|   4 +
>  lib/librte_net/meson.build|  34 ++-
>  lib/librte_net/net_crc.h  |  34 +++
>  lib/librte_net/{net_crc_neon.h => net_crc_neon.c} |  26 ++---
>  lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   |  34 ++-
>  lib/librte_net/rte_net_crc.c  | 116 
> +++---
>  6 files changed, 168 insertions(+), 80 deletions(-)  create mode 100644
> lib/librte_net/net_crc.h  rename lib/librte_net/{net_crc_neon.h =>
> net_crc_neon.c} (95%)  rename lib/librte_net/{net_crc_sse.h =>
> net_crc_sse.c} (94%)
> 
> diff --git a/doc/guides/rel_notes/release_20_11.rst
> b/doc/guides/rel_notes/release_20_11.rst
> index 808bdc4e5..b77297f7e 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -55,6 +55,10 @@ New Features
>   Also, make sure to start the actual text at the margin.
>   ===
> 
> +* **Updated CRC modules of rte_net library.**
> +
> +  * Added run-time selection of the optimal architecture-specific CRC path.
> +
>  * **Updated Broadcom bnxt driver.**
> 
>Updated the Broadcom bnxt driver with new features and improvements,
> including:
> diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build index
> 24ed8253b..fa439b9e5 100644
> --- a/lib/librte_net/meson.build
> +++ b/lib/librte_net/meson.build
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: BSD-3-Clause -# Copyright(c) 2017 Intel
> Corporation
> +# Copyright(c) 2017-2020 Intel Corporation
> 
>  headers = files('rte_ip.h',
>   'rte_tcp.h',
> @@ -20,3 +20,35 @@ headers = files('rte_ip.h',
> 
>  sources = files('rte_arp.c', 'rte_ether.c', 'rte_net.c', 'rte_net_crc.c')  
> deps +=
> ['mbuf']
> +
> +if dpdk_conf.has('RTE_ARCH_X86_64')
> + net_crc_sse42_cpu_support = (
> + cc.get_define('__PCLMUL__', args: machine_args) != '')
> + net_crc_sse42_cc_support = (
> + cc.has_argument('-mpclmul') and cc.has_argument('-maes'))
> +
> + build_static_net_crc_sse42_lib = 0
> +
> + if net_crc_sse42_cpu_support == true
> + sources += files('net_crc_sse.c')
> + cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
> + elif net_crc_sse42_cc_support == true
> + build_static_net_crc_sse42_lib = 1
> + net_crc_sse42_lib_cflags = ['-mpclmul', '-maes']
> + cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
> + endif
> +
> + if build_static_net_crc_sse42_lib == 1
> + net_crc_sse42_lib = static_library(
> + 'net_crc_sse42_lib',
> + 'net_crc_sse.c',
> + dependencies: static_rte_eal,
> + c_args: [cflags,
> + net_crc_sse42_lib_cflags])
> + objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c')
> + endif
> +elif (dpdk_conf.has('RTE_ARCH_ARM64') and
> + cc.get_define('__ARM_FEATURE_CRYPTO', args:
> machine_args) != '')
> + sources += files('net_crc_neon.c')
> + cflags += ['-DCC_ARM64_NEON_PMULL_SUPPORT'] endif
> diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h new file mode
> 100644 index 0..a1578a56c
> --- /dev/null
> +++ b/lib/librte_net/net_crc.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#ifndef _NET_CRC_H_
> +#define _NET_CRC_H_
> +
> +/*
> + * Different implementations of CRC
> + */
> +
> +/* SSE4.2 */
> +
> +void
> +rte_net_crc_sse42_init(void);
> +
> +uint32_t
> +rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len);
> +
> +uint32_t
> +rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len);
> +
> +/* NEON */
> +
> +void
> +rte_net_crc_neon_init(void);
> +
> +

Re: [dpdk-dev] [PATCH v2 00/56] net: txgbe PMD

2020-10-10 Thread Jiawen Wu
On 10/9/2020 5:47 PM, Ferruh Yigit wrote:
> On 10/9/2020 4:03 AM, jiawe...@trustnetic.com wrote:
> > Hi Ferruh,
> >
> > For the syntax/style check issue, should I fix all the errors and warnings 
> > or
> just fix the errors?
> > It seems to be a lot of warnings.
> >
> 
> [Please don't top post, it makes archives un-readable]
> 
> Please fix all, but beware that there may be false positive in the checkpatch
> warnings, so you need to process the output first.
> This is a new PMD, if the syntax is not put correct at first place, very 
> unlikely
> that it will be fixed later, so lets try to fix them as much as possible.
> 
> For some drivers, the base code is shared in multiple platforms, like Linux,
> FreeBSD, Windows etc..., for them we are more flexible and we allow to keep
> the original syntax of that shared code, *as long as it is consistent within 
> itself*.
> Do you have similar case in the base folder files?
> 
> The code for the DPDK should follow the DPDK coding convention [1] and should
> have as less checkpatch warnings/errors as possible.
> 
> [1] https://doc.dpdk.org/guides/contributing/coding_style.html
> 
> Thanks,
> ferruh
> 
> 

There are some 'checks' reporting that reusing a macro argument may cause
side effects.
Like this:

CHECK:MACRO_ARG_REUSE: Macro argument reuse 'y' - possible side-effects?
#56: FILE: drivers/net/txgbe/base/txgbe_regs.h:35:
+#define ROUND_UP(x, y)  (((x) + (y) - 1) / (y) * (y))

But the example given in the DPDK coding convention is:

#define MACRO(x, y) do {  \
variable = (x) + (y);   \
(y) += 2;\
} while(0)

It seems to reuse arguments, too.
Should I fix this 'check', or treat it as a false positive?


> > -Original Message-
> > From: Ferruh Yigit 
> > Sent: Tuesday, October 6, 2020 7:03 PM
> > To: Jiawen Wu ; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2 00/56] net: txgbe PMD
> >
> > On 10/5/2020 1:08 PM, Jiawen Wu wrote:
> >> v2: re-order patches and fix some known problems
> >> v1: introduce txgbe PMD
> >>
> >> jiawenwu (56):
> >> net/txgbe: add build and doc infrastructure
> >> net/txgbe: add ethdev probe and remove
> >> net/txgbe: add device init and uninit
> >> net/txgbe: add error types and registers
> >> net/txgbe: add mac type and bus lan id
> >> net/txgbe: add HW infrastructure and dummy function
> >> net/txgbe: add EEPROM functions
> >> net/txgbe: add HW init and reset operation
> >> net/txgbe: add PHY init
> >> net/txgbe: add module identify
> >> net/txgbe: add PHY reset
> >> net/txgbe: add info get operation
> >> net/txgbe: add interrupt operation
> >> net/txgbe: add device configure operation
> >> net/txgbe: add link status change
> >> net/txgbe: add multi-speed link setup
> >> net/txgbe: add autoc read and write
> >> net/txgbe: add MAC address operations
> >> net/txgbe: add unicast hash bitmap
> >> net/txgbe: add RX and TX init
> >> net/txgbe: add RX and TX queues setup and release
> >> net/txgbe: add RX and TX start and stop
> >> net/txgbe: add packet type
> >> net/txgbe: fill simple transmit function
> >> net/txgbe: fill transmit function with hardware offload
> >> net/txgbe: fill TX prepare function
> >> net/txgbe: fill receive functions
> >> net/txgbe: add device start operation
> >> net/txgbe: add RX and TX data path start and stop
> >> net/txgbe: add device stop and close operations
> >> net/txgbe: support RX interrupt
> >> net/txgbe: add RX and TX queue info get
> >> net/txgbe: add device stats get
> >> net/txgbe: add device xstats get
> >> net/txgbe: add queue stats mapping
> >> net/txgbe: add VLAN handle support
> >> net/txgbe: add SWFW semaphore and lock
> >> net/txgbe: add PF module init and uninit for SRIOV
> >> net/txgbe: add process mailbox operation
> >> net/txgbe: add PF module configure for SRIOV
> >> net/txgbe: add VMDq configure
> >> net/txgbe: add RSS support
> >> net/txgbe: add DCB support
> >> net/txgbe: add flow control support
> >> net/txgbe: add FC auto negotiation support
> >> net/txgbe: add priority flow control support
> >> net/txgbe: add device promiscuous and allmulticast mode
> >> net/txgbe: add MTU set operation
> >> net/txgbe: add FW version get operation
> >> net/txgbe: add EEPROM info get operation
> >> net/txgbe: add register dump support
> >> net/txgbe: support device LED on and off
> >> net/txgbe: add mirror rule operations
> >> net/txgbe: add PTP support
> >> net/txgbe: add DCB info get operation
> >> net/txgbe: add Rx and Tx descriptor status
> >>
> >
> > Hi Jiawen,
> >
> > Before going into more detailed reviews, the patchset conflicts with some
> recent changes in the main repo [1], can you please rebase on top of the 
> latest
> head of the repo?
> >
> > Also DPDK syntax/style check scripts are giving errors, can you

[dpdk-dev] [Bug 553] iavf_fdir/negative_case: run command with invalid port 'flow flush 2', it can't print error info or warning info.

2020-10-10 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=553

Bug ID: 553
   Summary: iavf_fdir/negative_case: run command with invalid port
'flow flush 2', it can't print error info or warning
info.
   Product: DPDK
   Version: 20.11
  Hardware: x86
OS: Linux
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: testpmd
  Assignee: dev@dpdk.org
  Reporter: weix@intel.com
  Target Milestone: ---

Environment:
•DPDK version: 20.11.0-rc0  commit 0e995cbcfc81e1d86d92dfb871ebe14ee5b8d9e4
•OS: Ubuntu 20.04.1 LTS / 5.4.0-45-generic
•Compiler: gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0
•Hardware platform: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
•NIC hardware: Intel Corporation Ethernet Controller E810-C for QSFP
[8086:1592]
•NIC firmware: 2.10 0x8000433e 1.2789.0
•ICE driver: 1.1.4
•PKG: ice_comms-1.3.20.0.pkg


Steps to reproduce:
1. Generate 2 VFs from PF.
# echo 2 > /sys/bus/pci/devices/\:81\:00.0/sriov_numvfs
2. Bind 2 VFs to vfio-pci.
# dpdk-devbind.py -b vfio-pci :81:01.0 :81:01.1
3.Launch testpmd.
# x86_64-native-linuxapp-gcc/app/dpdk-testpmd  -l 32,33,34,35 -n 4 -w
:81:01.0 -w :81:01.1 -- -i --rxq=16 --txq=16
testpmd> start
testpmd> flow destroy 2 rule 0
testpmd> flow flush 2


Actual Result:
testpmd> flow destroy 2 rule 0
Invalid port 2
testpmd> flow flush 2
testpmd>
Print info is empty.


Expected Result:
testpmd> flow destroy 2 rule 0
Invalid port 2
testpmd> flow flush 2
port_flow_complain(): Caught PMD error type 1 (cause unspecified): No such
device: No such device


Regression:
Is this issue a regression: Y
First bad commit:
commit 2a449871a12dacef6a644254e42175e09a316617 (HEAD)
Author: Thomas Monjalon 
Date: Tue Sep 29 01:14:34 2020 +0200

app/testpmd: align behaviour of multi-port detach

A port can be closed in multiple situations:
 - close command calling close_port() -> rte_eth_dev_close()
 - exit calling close_port() -> rte_eth_dev_close()
 - hotplug calling close_port() -> rte_eth_dev_close()
 - hotplug calling detach_device() -> rte_dev_remove()
 - port detach command, detach_device() -> rte_dev_remove()
 - device detach command, detach_devargs() -> rte_eal_hotplug_remove()

The flow rules are flushed before each close.
 It was already done in close_port(), detach_devargs() and
 detach_port_device() which calls detach_device(),
 but not in detach_device(). As a consequence, it was missing for siblings
 of port detach command and unplugged device.
 The check before calling port_flow_flush() is moved inside the function.

The state of the port to close is checked to be stopped.
 As above, this check was missing in detach_device(),
 impacting the cases of a multi-port device unplugged or detached
 with the port detach command.

Signed-off-by: Thomas Monjalon 
 Reviewed-by: Ferruh Yigit 
 Acked-by: Stephen Hemminger 

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [dpdk-dev] [PATCH v2] eal: add new prefetch write variants

2020-10-10 Thread Ruifeng Wang


> -Original Message-
> From: dev  On Behalf Of Harry van Haaren
> Sent: Monday, September 14, 2020 11:10 PM
> To: dev@dpdk.org
> Cc: pbhagavat...@marvell.com; Harry van Haaren
> 
> Subject: [dpdk-dev] [PATCH v2] eal: add new prefetch write variants
> 
> This commit adds a new rte_prefetch0_write() variants, suggesting to the
> compiler to use a prefetch instruction with intention to write. As a compiler
> builtin, the compiler can choose based on compilation target what the best
> implementation for this instruction is.
> 
> Signed-off-by: Harry van Haaren 
> 
> ---
> 
> v2:
> - Add L1, L2, and L3 variants as ARM64 uarch supports them (Pavan)
> 
> The integer constants passed to the builtin are not available as a #define
> value, and doing #defines just for this write variant does not seems a nice
> solution to me... particularly for those using IDEs where any #define value is
> auto-hinted for code-completion.
> ---
>  lib/librte_eal/include/generic/rte_prefetch.h | 49 +++
>  1 file changed, 49 insertions(+)
> 
> diff --git a/lib/librte_eal/include/generic/rte_prefetch.h
> b/lib/librte_eal/include/generic/rte_prefetch.h
> index 6e47bdfbad..3dfca77a74 100644
> --- a/lib/librte_eal/include/generic/rte_prefetch.h
> +++ b/lib/librte_eal/include/generic/rte_prefetch.h
> @@ -51,4 +51,53 @@ static inline void rte_prefetch2(const volatile void *p);
>   */
>  static inline void rte_prefetch_non_temporal(const volatile void *p);
> 
> +/**
> + * Prefetch a cache line into all cache levels, with intention to
> +write. This
> + * prefetch variant hints to the CPU that the program is expecting to
> +write to
> + * the cache line being prefetched.
> + *
> + * @param p Address to prefetch
> + */
> +static inline void rte_prefetch0_write(const void *p) {
> + /* 1 indicates intention to write, 3 sets target cache level to L1. See
> +  * GCC docs where these integer constants are described in more
> detail:
> +  *  https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
> +  */
> + __builtin_prefetch(p, 1, 3);
> +}
> +
> +/**
> + * Prefetch a cache line into all cache levels, except the 0th, with
> +intention
> + * to write. This prefetch variant hints to the CPU that the program is
> + * expecting to write to the cache line being prefetched.
> + *
> + * @param p Address to prefetch
> + */
> +static inline void rte_prefetch1_write(const void *p) {
> + /* 1 indicates intention to write, 2 sets target cache level to L2. See
> +  * GCC docs where these integer constants are described in more
> detail:
> +  *  https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
> +  */
> + __builtin_prefetch(p, 1, 2);
> +}
> +
> +/**
> + * Prefetch a cache line into all cache levels, except the 0th and 1st,
> +with
> + * intention to write. This prefetch variant hints to the CPU that the
> +program
> + * is expecting to write to the cache line being prefetched.
> + *
> + * @param p Address to prefetch
> + */
> +static inline void rte_prefetch2_write(const void *p) {
> + /* 1 indicates intention to write, 1 sets target cache level to L3. See
> +  * GCC docs where these integer constants are described in more
> detail:
> +  *  https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
> +  */
> + __builtin_prefetch(p, 1, 1);
> +}
> +
> +
>  #endif /* _RTE_PREFETCH_H_ */
> --
> 2.17.1

Reviewed-by: Ruifeng Wang 


Re: [dpdk-dev] [PATCH v4 1/2] net: add run-time architecture specific CRC selection

2020-10-10 Thread Ananyev, Konstantin



Hi David,

> > > This patch adds support for run-time selection of the optimal
> > > architecture-specific CRC path, based on the supported instruction
> > > set(s) of the CPU.
> > >
> > > The compiler option checks have been moved from the C files to the
> > > meson script. The rte_cpu_get_flag_enabled function is called
> > > automatically by the library at process initialization time to
> > > determine which instructions the CPU supports, with the most optimal
> > > supported CRC path ultimately selected.
> > >
> > > Signed-off-by: Mairtin o Loingsigh 
> > > Signed-off-by: David Coyle 
> >
> > LGTM, just one nit see below.
> > With that:
> > Series acked-by: Konstantin Ananyev 
> >
> > > ---
> > >  doc/guides/rel_notes/release_20_11.rst|  4 ++
> > >  lib/librte_net/meson.build| 34 +++-
> > >  lib/librte_net/net_crc.h  | 34 
> > >  lib/librte_net/{net_crc_neon.h => net_crc_neon.c} | 26 +++--
> > >  lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   | 34 
> > >  lib/librte_net/rte_net_crc.c  | 67 
> > > ++-
> > >  6 files changed, 131 insertions(+), 68 deletions(-)  create mode
> > > 100644 lib/librte_net/net_crc.h  rename lib/librte_net/{net_crc_neon.h
> > > => net_crc_neon.c} (95%)  rename lib/librte_net/{net_crc_sse.h =>
> > > net_crc_sse.c} (94%)
> > >
> > >
> 
> 
> 
> > > +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT static uint8_t
> > > +sse42_pclmulqdq_cpu_supported(void)
> > > +{
> > > + return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ);
> > > +}
> >
> > As a nit, I think it would be better to hide #fidef inside the function, and
> > return an 0 when define is not set.
> > Something like:
> >
> > static int
> > sse42_pclmulqdq_cpu_supported(void)
> > {
> > #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> > return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ);
> > #else
> > return 0;
> > }
> >
> > Same for other cpu_supported functions.
> > And then you can remove these ifdefs in set_alg and other palces, i.e.:
> >
> > void
> > rte_net_crc_set_alg(enum rte_net_crc_alg alg) {
> > switch (alg) {
> > #ifdef RTE_ARCH_X86_64
> > case RTE_NET_CRC_AVX512:
> > if (avx512_vpclmulqdq_cpu_supported()) {
> > handlers = handlers_avx512;
> > break;
> > }
> > /* fall-through */
> > case RTE_NET_CRC_SSE42:
> > if (sse42_pclmulqdq_cpu_supported()) {
> > handlers = handlers_sse42;
> > break;
> > }
> > #endif
> > ...
> >
> > Same for rte_net_crc_init()
> 
> [DC] I have reworked the ifdefs in this file based on your comments here and 
> off-list discussions.
> These are available now in the v5.
> 
> All ifdef's have been removed out the API function definitions and moved down 
> into 'helper' type
> functions - looks much cleaner now.
>
> Your Ack has been carried through too to v5 as you mentioned

LGTM, thanks.
Konstantin

 


Re: [dpdk-dev] [PATCH v3 11/18] net/ixgbe: add checks for max SIMD bitwidth

2020-10-10 Thread Wang, Haiyue
Hi Ciara,

> -Original Message-
> From: Power, Ciara 
> Sent: Wednesday, September 30, 2020 21:04
> To: dev@dpdk.org
> Cc: Power, Ciara ; Zhao1, Wei ; 
> Guo, Jia
> ; Wang, Haiyue 
> Subject: [PATCH v3 11/18] net/ixgbe: add checks for max SIMD bitwidth
> 
> When choosing a vector path to take, an extra condition must be
> satisfied to ensure the max SIMD bitwidth allows for the CPU enabled
> path.
> 
> Cc: Wei Zhao 
> Cc: Jeff Guo 
> 
> Signed-off-by: Ciara Power 
> ---
>  drivers/net/ixgbe/ixgbe_rxtx.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> index 977ecf5137..eadc7183f2 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -2503,7 +2503,9 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct 
> ixgbe_tx_queue *txq)
>   dev->tx_pkt_prepare = NULL;
>   if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&
>   (rte_eal_process_type() != RTE_PROC_PRIMARY ||
> - ixgbe_txq_vec_setup(txq) == 0)) {
> + ixgbe_txq_vec_setup(txq) == 0) &&
> + rte_get_max_simd_bitwidth()

As Konstantin mentioned: "I think it is a bit safer to do all checks first
before doing txq_vec_setup()."

For x86 & arm platforms, the setup always returns 0, since 'sw_ring_v' is a
union with 'sw_ring', which is initialized in 'ixgbe_dev_tx_queue_setup'.

	union {
		struct ixgbe_tx_entry *sw_ring; /**< address of SW ring for scalar PMD. */
		struct ixgbe_tx_entry_v *sw_ring_v; /**< address of SW ring for vector PMD */
	};

static inline int
ixgbe_txq_vec_setup_default(struct ixgbe_tx_queue *txq,
const struct ixgbe_txq_ops *txq_ops)
{
if (txq->sw_ring_v == NULL)
return -1;

/* leave the first one for overflow */
txq->sw_ring_v = txq->sw_ring_v + 1;
txq->ops = txq_ops;

return 0;
}

So we need to check the SIMD bitwidth first, to avoid changing the sw_ring*
pointer address.
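The ordering concern can be illustrated with a self-contained sketch. Since `&&` short-circuits left to right, putting the cheap bitwidth test before the setup call guarantees the side-effectful setup never runs on a path that cannot be taken. The names below are illustrative stand-ins for the ixgbe symbols, not the driver code itself:

```c
#include <stdbool.h>

static int setup_calls; /* counts invocations of the side-effectful setup */

/* stand-in for ixgbe_txq_vec_setup(): in the real driver this
 * advances the sw_ring_v pointer, so it must not run unless the
 * vector path will actually be taken */
static bool txq_vec_setup_stub(void)
{
	setup_calls++;
	return true;
}

/* stand-in for rte_get_max_simd_bitwidth() >= RTE_MAX_128_SIMD */
static bool simd_bitwidth_ok(bool allowed)
{
	return allowed;
}

static bool choose_vector_tx(bool simd_allowed)
{
	/* bitwidth check first: when it fails, && short-circuits and
	 * the setup (and its pointer change) is skipped entirely */
	return simd_bitwidth_ok(simd_allowed) && txq_vec_setup_stub();
}
```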


Also, it looks like we need to add the same check in:

int
ixgbe_dev_tx_done_cleanup(void *tx_queue, uint32_t free_cnt)
{
struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)tx_queue;
if (txq->offloads == 0 &&
#ifdef RTE_LIBRTE_SECURITY
!(txq->using_ipsec) &&
#endif
txq->tx_rs_thresh >= RTE_PMD_IXGBE_TX_MAX_BURST) {
if (txq->tx_rs_thresh <= RTE_IXGBE_TX_MAX_FREE_BUF_SZ &&   <--- Add the same check
(rte_eal_process_type() != RTE_PROC_PRIMARY ||
txq->sw_ring_v != NULL)) {
return ixgbe_tx_done_cleanup_vec(txq, free_cnt);
} else {
return ixgbe_tx_done_cleanup_simple(txq, free_cnt);
}
}

> + >= RTE_MAX_128_SIMD) {
>   PMD_INIT_LOG(DEBUG, "Vector tx enabled.");
>   dev->tx_pkt_burst = ixgbe_xmit_pkts_vec;
>   } else
> @@ -4743,7 +4745,8 @@ ixgbe_set_rx_function(struct rte_eth_dev *dev)
>* conditions to be met and Rx Bulk Allocation should be allowed.
>*/
>   if (ixgbe_rx_vec_dev_conf_condition_check(dev) ||
> - !adapter->rx_bulk_alloc_allowed) {
> + !adapter->rx_bulk_alloc_allowed ||
> + rte_get_max_simd_bitwidth() < RTE_MAX_128_SIMD) {
>   PMD_INIT_LOG(DEBUG, "Port[%d] doesn't meet Vector Rx "
>   "preconditions",
>dev->data->port_id);
> --
> 2.17.1



Re: [dpdk-dev] [PATCH v4 02/10] eal: add power management intrinsics

2020-10-10 Thread Ananyev, Konstantin


> >> Add two new power management intrinsics, and provide an implementation
> >> in eal/x86 based on UMONITOR/UMWAIT instructions. The instructions
> >> are implemented as raw byte opcodes because there is not yet widespread
> >> compiler support for these instructions.
> >>
> >> The power management instructions provide an architecture-specific
> >> function to either wait until a specified TSC timestamp is reached, or
> >> optionally wait until either a TSC timestamp is reached or a memory
> >> location is written to. The monitor function also provides an optional
> >> comparison, to avoid sleeping when the expected write has already
> >> happened, and no more writes are expected.
> >
> > I think what this API is missing - a function to wakeup sleeping core.
> > If user can/should use some system call to achieve that, then at least
> > it has to be clearly documented, even better some wrapper provided.
> 
>  I don't think it's possible to do that without severely overcomplicating
>  the intrinsic and its usage, because AFAIK the only way to wake up a
>  sleeping core would be to send some kind of interrupt to the core, or
>  trigger a write to the cache-line in question.
> 
> >>>
> >>> Yes, I think we either need a syscall that would do an IPI for us
> >>> (on top of my head - membarrier() does that, might be there are some 
> >>> other syscalls too),
> >>> or something hand-made. For hand-made, I wonder would something like that
> >>> be safe and sufficient:
> >>> uint64_t val = atomic_load(addr);
> >>> CAS(addr, val, &val);
> >>> ?
> >>> Anyway, one way or another - I think ability to wakeup core we put to 
> >>> sleep
> >>> have to be an essential part of this feature.
> >>> As I understand linux kernel will limit max amount of sleep time for 
> >>> these instructions:
> >>> https://lwn.net/Articles/790920/
> >>> But relying just on that, seems too vague for me:
> >>> - user can adjust that value
> >>> - wouldn't apply to older kernels and non-linux cases
> >>> Konstantin
> >>>
> >>
> >> This implies knowing the value the core is sleeping on.
> >
> > You don't need the value to wait for, you just need an address.
> > And you can make wakeup function to accept address as a parameter,
> > same as monitor() does.
> 
> Sorry, i meant the address. We don't know the address we're sleeping on.
> 
> >
> >> That's not
> >> always the case - with this particular PMD power management scheme, we
> >> get the address from the PMD and it stays inside the callback.
> >
> > That's fine - you can store address inside you callback metadata
> > and do wakeup as part of _disable_ function.
> >
> 
> The address may be different, and by the time we access the address it
> may become stale, so i don't see how that would help unless you're
> suggesting to have some kind of synchronization mechanism there.

Yes, we'll need something to sync here for sure.
Sorry, I should have said it straight away, to avoid further misunderstanding.
Let's say, associate a spin_lock with monitor(), by analogy with
pthread_cond_wait().
Konstantin
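The hand-made wakeup discussed in this thread can be sketched with C11 atomics. This is an illustrative sketch, under the assumption that a write transaction to the monitored cache line wakes a core parked in UMWAIT on that line; the function name is hypothetical, not a DPDK API:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Sketch of the hand-made wakeup: store the monitored word back
 * unchanged via a same-value CAS. The data does not change, but the
 * write transaction touches the cache line a sleeping core is
 * monitoring, which is what wakes it from UMWAIT. */
static inline void
wakeup_sleeping_core(_Atomic uint64_t *monitor_addr)
{
	uint64_t val = atomic_load(monitor_addr);
	/* same-value CAS: a write with no visible data change; if the
	 * value moved in between, another writer already woke the core */
	atomic_compare_exchange_strong(monitor_addr, &val, val);
}
```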


[dpdk-dev] [PATCH] net/bonding: LACP Packet statistics support

2020-10-10 Thread Kiran KN
net/bonding: LACP Packet statistics support

Store the LACP packets sent and received for each slave.
This can be used for debug purposes from any DPDK application.

Signed-off-by: Kiran K N 

Change-Id: Iae82bd7d0879a4c4333a292c96d431798c56e301
---
 drivers/net/bonding/eth_bond_8023ad_private.h |  2 ++
 drivers/net/bonding/rte_eth_bond_8023ad.c | 39 +++
 drivers/net/bonding/rte_eth_bond_8023ad.h | 20 ++
 3 files changed, 61 insertions(+)

diff --git a/drivers/net/bonding/eth_bond_8023ad_private.h 
b/drivers/net/bonding/eth_bond_8023ad_private.h
index ef0b56850..500640b28 100644
--- a/drivers/net/bonding/eth_bond_8023ad_private.h
+++ b/drivers/net/bonding/eth_bond_8023ad_private.h
@@ -19,6 +19,8 @@
 #define BOND_MODE_8023AX_SLAVE_RX_PKTS 3
 /** Maximum number of LACP packets from one slave queued in TX ring. */
 #define BOND_MODE_8023AX_SLAVE_TX_PKTS 1
+/** maximum number of slaves for each port */
+#define BOND_MODE_8023AD_MAX_SLAVES   6
 /**
  * Timeouts deffinitions (5.4.4 in 802.1AX documentation).
  */
diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c 
b/drivers/net/bonding/rte_eth_bond_8023ad.c
index ea79a1344..37eb29847 100644
--- a/drivers/net/bonding/rte_eth_bond_8023ad.c
+++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
@@ -132,6 +132,9 @@ static const struct rte_ether_addr lacp_mac_addr = {

 struct port bond_mode_8023ad_ports[RTE_MAX_ETHPORTS];

+static uint64_t lacpdu_tx_count[BOND_MODE_8023AD_MAX_SLAVES];
+static uint64_t lacpdu_rx_count[BOND_MODE_8023AD_MAX_SLAVES];
+
 static void
 timer_cancel(uint64_t *timer)
 {
@@ -629,6 +632,7 @@ tx_machine(struct bond_dev_private *internals, uint16_t 
slave_id)
 set_warning_flags(port, WRN_TX_QUEUE_FULL);
 return;
 }
+lacpdu_tx_count[slave_id]++;
 } else {
 uint16_t pkts_sent = rte_eth_tx_burst(slave_id,
 internals->mode4.dedicated_queues.tx_qid,
@@ -638,6 +642,7 @@ tx_machine(struct bond_dev_private *internals, uint16_t 
slave_id)
 set_warning_flags(port, WRN_TX_QUEUE_FULL);
 return;
 }
+lacpdu_tx_count[slave_id] += pkts_sent;
 }


@@ -896,6 +901,10 @@ bond_mode_8023ad_periodic_cb(void *arg)
 lacp_pkt = NULL;

 rx_machine_update(internals, slave_id, lacp_pkt);
+
+if (retval == 0) {
+lacpdu_rx_count[slave_id]++;
+}
 } else {
 uint16_t rx_count = rte_eth_rx_burst(slave_id,
 internals->mode4.dedicated_queues.rx_qid,
@@ -906,6 +915,8 @@ bond_mode_8023ad_periodic_cb(void *arg)
 slave_id, lacp_pkt);
 else
 rx_machine_update(internals, slave_id, NULL);
+
+lacpdu_rx_count[slave_id] += rx_count;
 }

 periodic_machine(internals, slave_id);
@@ -1715,3 +1726,31 @@ rte_eth_bond_8023ad_dedicated_queues_disable(uint16_t 
port)

 return retval;
 }
+
+uint64_t
+rte_eth_bond_8023ad_lacp_tx_count(uint16_t port_id, uint8_t clear)
+{
+if(port_id > BOND_MODE_8023AD_MAX_SLAVES)
+return -1;
+
+if(clear) {
+lacpdu_tx_count[port_id] = 0;
+return 0;
+}
+
+ return lacpdu_tx_count[port_id];
+}
+
+uint64_t
+rte_eth_bond_8023ad_lacp_rx_count(uint16_t port_id, uint8_t clear)
+{
+if(port_id > BOND_MODE_8023AD_MAX_SLAVES)
+return -1;
+
+if(clear) {
+lacpdu_rx_count[port_id] = 0;
+return 0;
+}
+
+return lacpdu_rx_count[port_id];
+}
diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.h 
b/drivers/net/bonding/rte_eth_bond_8023ad.h
index 5623e1424..7163de381 100644
--- a/drivers/net/bonding/rte_eth_bond_8023ad.h
+++ b/drivers/net/bonding/rte_eth_bond_8023ad.h
@@ -340,4 +340,24 @@ rte_eth_bond_8023ad_agg_selection_set(uint16_t port_id,
  */
 int
 rte_eth_bond_8023ad_ext_set_fast(uint16_t port_id, uint16_t slave_id);
+
+/**
+ *  Get Lacp statistics counter for slaves
+ *  @param port_id Bonding slave device id
+ *  @param clear, reset statistics
+ *  @return
+ *0 on success, negative value otherwise
+ */
+uint64_t
+rte_eth_bond_8023ad_lacp_tx_count(uint16_t port_id, uint8_t clear);
+
+/**
+ *  Get Lacp statistics counter for slaves
+ *  @param port_id Bonding slave device id
+ *  @param clear, reset statistics
+ *  @return
+ *0 on success, negative value otherwise
+ */
+uint64_t
+rte_eth_bond_8023ad_lacp_rx_count(uint16_t port_id, uint8_t clear);
 #endif /* RTE_ETH_BOND_8023AD_H_ */
--
2.16.6





Re: [dpdk-dev] [PATCH] ethdev: check if queue setupped in queue-related APIs

2020-10-10 Thread Stephen Hemminger
On Sat, 10 Oct 2020 15:12:12 +0800
"Wei Hu (Xavier)"  wrote:

> + if (dev->data->rx_queues[rx_queue_id] == NULL) {
> + RTE_ETHDEV_LOG(ERR, "Rx queue %"PRIu16" of device with 
> port_id=%"
> + PRIu16" has not been setupped\n",
> + rx_queue_id, port_id);
> + return -EINVAL;
> +

Please use correct spelling.

Your change follows the existing style in rte_eth_dev_rx_queue_start() but
my preference is that message strings are not split across
lines. That makes it easier to use tools like grep to find messages in the 
source.

Use of PRIu16 is not required. And recent compiler standards would require space
around its use.

Suggest:
RTE_ETHDEV_LOG(ERR,
	"Queue %u of device with port_id=%u has not been setup\n",
	rx_queue_id, port_id);


[dpdk-dev] [PATCH v7 01/16] distributor: fix missing handshake synchronization

2020-10-10 Thread Lukasz Wojciechowski
The rte_distributor_return_pkt function, which is run on worker cores,
must wait for the distributor core to clear the handshake on retptr64
before using those buffers. While the handshake is set, the distributor
core controls the buffers, and any operation on the worker side might
overwrite buffers which have not been read yet.
The same situation appears in the legacy single distributor. The
rte_distributor_return_pkt_single function shouldn't modify bufptr64 until
the handshake on it is cleared by the distributor lcore.

Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
Cc: david.h...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Lukasz Wojciechowski 
Acked-by: David Hunt 
---
 lib/librte_distributor/rte_distributor.c| 14 ++
 lib/librte_distributor/rte_distributor_single.c |  4 
 2 files changed, 18 insertions(+)

diff --git a/lib/librte_distributor/rte_distributor.c 
b/lib/librte_distributor/rte_distributor.c
index 1c047f065..89493c331 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -160,6 +160,7 @@ rte_distributor_return_pkt(struct rte_distributor *d,
 {
struct rte_distributor_buffer *buf = &d->bufs[worker_id];
unsigned int i;
+   volatile int64_t *retptr64;
 
if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
if (num == 1)
@@ -169,6 +170,19 @@ rte_distributor_return_pkt(struct rte_distributor *d,
return -EINVAL;
}
 
+   retptr64 = &(buf->retptr64[0]);
+   /* Spin while handshake bits are set (scheduler clears it).
+* Sync with worker on GET_BUF flag.
+*/
+   while (unlikely(__atomic_load_n(retptr64, __ATOMIC_ACQUIRE)
+   & RTE_DISTRIB_GET_BUF)) {
+   rte_pause();
+   uint64_t t = rte_rdtsc()+100;
+
+   while (rte_rdtsc() < t)
+   rte_pause();
+   }
+
/* Sync with distributor to acquire retptrs */
__atomic_thread_fence(__ATOMIC_ACQUIRE);
for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
diff --git a/lib/librte_distributor/rte_distributor_single.c 
b/lib/librte_distributor/rte_distributor_single.c
index abaf7730c..f4725b1d0 100644
--- a/lib/librte_distributor/rte_distributor_single.c
+++ b/lib/librte_distributor/rte_distributor_single.c
@@ -74,6 +74,10 @@ rte_distributor_return_pkt_single(struct 
rte_distributor_single *d,
union rte_distributor_buffer_single *buf = &d->bufs[worker_id];
uint64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
| RTE_DISTRIB_RETURN_BUF;
+   while (unlikely(__atomic_load_n(&buf->bufptr64, __ATOMIC_RELAXED)
+   & RTE_DISTRIB_FLAGS_MASK))
+   rte_pause();
+
/* Sync with distributor on RETURN_BUF flag. */
__atomic_store_n(&(buf->bufptr64), req, __ATOMIC_RELEASE);
return 0;
-- 
2.17.1
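The fix above boils down to the usual acquire-spin handshake: the worker may only write the shared cache line after observing, with acquire ordering, that the distributor has cleared the flag. Below is a stripped-down, single-threaded model of that protocol with illustrative names, not the distributor code itself:

```c
#include <stdatomic.h>
#include <stdint.h>

#define GET_BUF 1ULL

/* Shared handshake word: the low bit is the GET_BUF flag, the upper
 * bits carry the payload, loosely mirroring the retptr64 layout. */
static _Atomic uint64_t retptr;

/* Worker side: before writing new return data, wait until the
 * distributor has cleared GET_BUF, i.e. consumed the previous
 * contents. The acquire load pairs with the distributor's release. */
static void worker_store_return(uint64_t data)
{
	while (atomic_load_explicit(&retptr, memory_order_acquire) & GET_BUF)
		;  /* distributor still owns the line; real code calls rte_pause() */
	atomic_store_explicit(&retptr, (data << 1) | GET_BUF,
			memory_order_release);
}

/* Distributor side: consume the data and clear the flag, handing
 * ownership of the line back to the worker. */
static uint64_t distributor_consume(void)
{
	uint64_t v = atomic_load_explicit(&retptr, memory_order_acquire);
	atomic_store_explicit(&retptr, 0, memory_order_release);
	return v >> 1;
}
```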



[dpdk-dev] [PATCH v7 00/16] fix distributor synchronization issues

2020-10-10 Thread Lukasz Wojciechowski
During review and verification of the patch created by Sarosh Arif,
"test_distributor: prevent memory leakages from the pool", I found out
that running the distributor unit tests multiple times in a row causes
failures. So I investigated all the issues I found.

There are a few synchronization issues that might cause deadlocks
or corrupted data. They are fixed by this set of patches in both the tests
and the librte_distributor library.

---
v7:
* add patch 16 ensuring that tests will try sending packets until workers
are started and requested for packets

v6:
* fix comments indentation
* fix stats atomic operations memory mode from ACQUIRE/RELEASE
to RELAXED

v5:
* implement missing functionality in burst mode - worker shutdown
* fix shutdown test to always shutdown busy worker
* use atomic stores instead of barrier in tests clear_packet_count()
* reorder patches
* new patch 7: fix call to return_pkt in single mode
* new patch 11: replacing delays with spinlock on atomics in tests
* new patch 12: fix scalar matching algorithm
* new patch 13: new test with marking and checking every packet
* new patch 14: flush also in flight packets
* new patch 15: fix clearing returns buffer
* minor fixes in other patches

v4:
* adjust commit name prefixes app/test -> test/distributor:
* reorder patches
* use NULL oldpkt in rte_distributor_get_pkt() calls in tests

v3:
* add missing acked and tested by statements from v1

v2:
* assign NULL to freed mbufs in distributor test
* fix handshake check on legacy single distributor
 rte_distributor_return_pkt_single()
* add patch 7 passing NULL to legacy API calls if no bufs are returned
* add patch 8 fixing API documentation


Lukasz Wojciechowski (16):
  distributor: fix missing handshake synchronization
  distributor: fix handshake deadlock
  distributor: do not use oldpkt when not needed
  distributor: handle worker shutdown in burst mode
  test/distributor: fix shutdown of busy worker
  test/distributor: synchronize lcores statistics
  distributor: fix return pkt calls in single mode
  test/distributor: fix freeing mbufs
  test/distributor: collect return mbufs
  distributor: align API documentation with code
  test/distributor: replace delays with spin locks
  distributor: fix scalar matching
  test/distributor: add test with packets marking
  distributor: fix flushing in flight packets
  distributor: fix clearing returns buffer
  test/distributor: ensure all packets are delivered

 app/test/test_distributor.c   | 347 ++
 lib/librte_distributor/distributor_private.h  |   3 +
 lib/librte_distributor/rte_distributor.c  | 219 ---
 lib/librte_distributor/rte_distributor.h  |  23 +-
 .../rte_distributor_single.c  |   4 +
 5 files changed, 471 insertions(+), 125 deletions(-)

-- 
2.17.1



[dpdk-dev] [PATCH v7 02/16] distributor: fix handshake deadlock

2020-10-10 Thread Lukasz Wojciechowski
Synchronization of data exchange between distributor and worker cores
is based on 2 handshakes: retptr64 for returning mbufs from workers
to distributor and bufptr64 for passing mbufs to workers.

Without the proper order of verifying those 2 handshakes a deadlock may
occur. This can happen when a worker core wants to return mbufs
and waits for the retptr handshake to be cleared while the distributor core
waits for bufptr to send mbufs to the worker.

This can happen because the worker core first returns mbufs to the distributor
and only later gets new mbufs, while the distributor first releases mbufs
to the worker and only later handles returning packets.

This patch removes the possibility of the deadlock by always handling
returning packets first on the distributor side, including while waiting
to release new ones.

Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
Cc: david.h...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Lukasz Wojciechowski 
Acked-by: David Hunt 
---
 lib/librte_distributor/rte_distributor.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/lib/librte_distributor/rte_distributor.c 
b/lib/librte_distributor/rte_distributor.c
index 89493c331..12b3db33c 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -321,12 +321,14 @@ release(struct rte_distributor *d, unsigned int wkr)
struct rte_distributor_buffer *buf = &(d->bufs[wkr]);
unsigned int i;
 
+   handle_returns(d, wkr);
+
/* Sync with worker on GET_BUF flag */
while (!(__atomic_load_n(&(d->bufs[wkr].bufptr64[0]), __ATOMIC_ACQUIRE)
-   & RTE_DISTRIB_GET_BUF))
+   & RTE_DISTRIB_GET_BUF)) {
+   handle_returns(d, wkr);
rte_pause();
-
-   handle_returns(d, wkr);
+   }
 
buf->count = 0;
 
@@ -376,6 +378,7 @@ rte_distributor_process(struct rte_distributor *d,
/* Flush out all non-full cache-lines to workers. */
for (wid = 0 ; wid < d->num_workers; wid++) {
/* Sync with worker on GET_BUF flag. */
+   handle_returns(d, wid);
if (__atomic_load_n(&(d->bufs[wid].bufptr64[0]),
__ATOMIC_ACQUIRE) & RTE_DISTRIB_GET_BUF) {
release(d, wid);
-- 
2.17.1
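The general pattern behind this fix is that a waiter on one channel must keep servicing the opposite channel inside its wait loop, so the peer can make the progress the waiter depends on. A tiny single-threaded model (with hypothetical names, not the distributor code) shows how draining returns before and during the wait lets the loop terminate:

```c
#include <stdbool.h>

/* 'returns_pending' models worker returns waiting in retptr64;
 * 'buf_free' models the bufptr64 flag the distributor is waiting
 * for. In the broken ordering, the distributor spins on buf_free
 * without draining returns, while the worker cannot release the
 * buffer until its returns are taken: a deadlock. */
static int returns_pending;
static bool buf_free;
static int returns_handled;

static void handle_returns(void)
{
	if (returns_pending > 0) {
		returns_handled += returns_pending;
		returns_pending = 0;
		/* once its returns are accepted, the worker can
		 * release the buffer it was holding */
		buf_free = true;
	}
}

/* distributor side of release(): drain returns before and while
 * waiting for the buffer, mirroring the patched loop */
static void release_to_worker(void)
{
	handle_returns();
	while (!buf_free)
		handle_returns();
}
```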



[dpdk-dev] [PATCH v7 03/16] distributor: do not use oldpkt when not needed

2020-10-10 Thread Lukasz Wojciechowski
rte_distributor_request_pkt and rte_distributor_get_pkt dereferenced
the oldpkt parameter in RTE_DIST_ALG_SINGLE mode even if the number
of buffers returned from worker to distributor was 0.

This patch passes NULL to the legacy API when the number of returned
buffers is 0. This allows passing NULL as the oldpkt parameter.

The distributor tests are also updated to pass NULL as oldpkt and
0 as the number of returned packets where packets are not returned.

Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
Cc: david.h...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Lukasz Wojciechowski 
Acked-by: David Hunt 
---
 app/test/test_distributor.c  | 28 +---
 lib/librte_distributor/rte_distributor.c |  4 ++--
 2 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index ba1f81cf8..52230d250 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -62,13 +62,10 @@ handle_work(void *arg)
struct rte_mbuf *buf[8] __rte_cache_aligned;
struct worker_params *wp = arg;
struct rte_distributor *db = wp->dist;
-   unsigned int count = 0, num = 0;
+   unsigned int count = 0, num;
unsigned int id = __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);
-   int i;
 
-   for (i = 0; i < 8; i++)
-   buf[i] = NULL;
-   num = rte_distributor_get_pkt(db, id, buf, buf, num);
+   num = rte_distributor_get_pkt(db, id, buf, NULL, 0);
while (!quit) {
__atomic_fetch_add(&worker_stats[id].handled_packets, num,
__ATOMIC_RELAXED);
@@ -272,19 +269,16 @@ handle_work_with_free_mbufs(void *arg)
struct rte_distributor *d = wp->dist;
unsigned int count = 0;
unsigned int i;
-   unsigned int num = 0;
+   unsigned int num;
unsigned int id = __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);
 
-   for (i = 0; i < 8; i++)
-   buf[i] = NULL;
-   num = rte_distributor_get_pkt(d, id, buf, buf, num);
+   num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
while (!quit) {
worker_stats[id].handled_packets += num;
count += num;
for (i = 0; i < num; i++)
rte_pktmbuf_free(buf[i]);
-   num = rte_distributor_get_pkt(d,
-   id, buf, buf, num);
+   num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
}
worker_stats[id].handled_packets += num;
count += num;
@@ -342,14 +336,14 @@ handle_work_for_shutdown_test(void *arg)
struct worker_params *wp = arg;
struct rte_distributor *d = wp->dist;
unsigned int count = 0;
-   unsigned int num = 0;
+   unsigned int num;
unsigned int total = 0;
unsigned int i;
unsigned int returned = 0;
const unsigned int id = __atomic_fetch_add(&worker_idx, 1,
__ATOMIC_RELAXED);
 
-   num = rte_distributor_get_pkt(d, id, buf, buf, num);
+   num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
 
/* wait for quit single globally, or for worker zero, wait
 * for zero_quit */
@@ -358,8 +352,7 @@ handle_work_for_shutdown_test(void *arg)
count += num;
for (i = 0; i < num; i++)
rte_pktmbuf_free(buf[i]);
-   num = rte_distributor_get_pkt(d,
-   id, buf, buf, num);
+   num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
total += num;
}
worker_stats[id].handled_packets += num;
@@ -373,14 +366,13 @@ handle_work_for_shutdown_test(void *arg)
while (zero_quit)
usleep(100);
 
-   num = rte_distributor_get_pkt(d,
-   id, buf, buf, num);
+   num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
 
while (!quit) {
worker_stats[id].handled_packets += num;
count += num;
rte_pktmbuf_free(pkt);
-   num = rte_distributor_get_pkt(d, id, buf, buf, num);
+   num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
}
returned = rte_distributor_return_pkt(d,
id, buf, num);
diff --git a/lib/librte_distributor/rte_distributor.c 
b/lib/librte_distributor/rte_distributor.c
index 12b3db33c..b720abe03 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -42,7 +42,7 @@ rte_distributor_request_pkt(struct rte_distributor *d,
 
if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
rte_distributor_request_pkt_single(d->d_single,
-   worker_id, oldpkt[0]);
+   worker_id, count ? oldpkt[0] : NULL);
  

[dpdk-dev] [PATCH v7 05/16] test/distributor: fix shutdown of busy worker

2020-10-10 Thread Lukasz Wojciechowski
The sanity test with worker shutdown delegates all bufs
to be processed by a single lcore worker, then it freezes
one of the lcore workers and continues to send more bufs.
The frozen core shuts down first by calling
rte_distributor_return_pkt().

The test intention is to verify whether packets assigned to
the shut down lcore will be reassigned to another worker.

However the shutdown core was not always the one that was
processing packets. The lcore processing mbufs might be different
every time the test is launched. This is caused by keeping the value
of the wkr static variable in the rte_distributor_process() function
between test-case runs.

The test always froze the lcore with id 0. The patch stores the id
of the worker that is processing the data in the zero_idx global atomic
variable. This way the frozen lcore is always the proper one.

Fixes: c3eabff124e6 ("distributor: add unit tests")
Cc: bruce.richard...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Lukasz Wojciechowski 
Tested-by: David Hunt 
---
 app/test/test_distributor.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index 52230d250..6cd7a2edd 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -28,6 +28,7 @@ struct worker_params worker_params;
 static volatile int quit;  /**< general quit variable for all threads */
 static volatile int zero_quit; /**< var for when we just want thr0 to quit*/
 static volatile unsigned worker_idx;
+static volatile unsigned zero_idx;
 
 struct worker_stats {
volatile unsigned handled_packets;
@@ -340,26 +341,43 @@ handle_work_for_shutdown_test(void *arg)
unsigned int total = 0;
unsigned int i;
unsigned int returned = 0;
+   unsigned int zero_id = 0;
+   unsigned int zero_unset;
const unsigned int id = __atomic_fetch_add(&worker_idx, 1,
__ATOMIC_RELAXED);
 
num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
 
+   if (num > 0) {
+   zero_unset = RTE_MAX_LCORE;
+   __atomic_compare_exchange_n(&zero_idx, &zero_unset, id,
+   false, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
+   }
+   zero_id = __atomic_load_n(&zero_idx, __ATOMIC_ACQUIRE);
+
/* wait for quit single globally, or for worker zero, wait
 * for zero_quit */
-   while (!quit && !(id == 0 && zero_quit)) {
+   while (!quit && !(id == zero_id && zero_quit)) {
worker_stats[id].handled_packets += num;
count += num;
for (i = 0; i < num; i++)
rte_pktmbuf_free(buf[i]);
num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
+
+   if (num > 0) {
+   zero_unset = RTE_MAX_LCORE;
+   __atomic_compare_exchange_n(&zero_idx, &zero_unset, id,
+   false, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
+   }
+   zero_id = __atomic_load_n(&zero_idx, __ATOMIC_ACQUIRE);
+
total += num;
}
worker_stats[id].handled_packets += num;
count += num;
returned = rte_distributor_return_pkt(d, id, buf, num);
 
-   if (id == 0) {
+   if (id == zero_id) {
/* for worker zero, allow it to restart to pick up last packet
 * when all workers are shutting down.
 */
@@ -578,6 +596,7 @@ quit_workers(struct worker_params *wp, struct rte_mempool 
*p)
rte_eal_mp_wait_lcore();
quit = 0;
worker_idx = 0;
+   zero_idx = RTE_MAX_LCORE;
 }
 
 static int
-- 
2.17.1
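The zero_idx mechanism above is a first-writer-wins registration: each worker that receives packets tries to compare-and-swap the sentinel value with its own id, and only the first attempt succeeds. A minimal C11 sketch of the same idea (the GCC `__atomic` builtins used in the patch map directly onto these; the names here are illustrative):

```c
#include <stdatomic.h>

#define UNSET_ID 1024u  /* stands in for RTE_MAX_LCORE as the sentinel */

static _Atomic unsigned int zero_idx = UNSET_ID;

/* Each worker that got packets calls this. Only the first caller
 * installs its id; later callers fail the CAS and observe the
 * winner's id. Returns the id that owns the slot after the call. */
static unsigned int register_busy_worker(unsigned int id)
{
	unsigned int expected = UNSET_ID;

	atomic_compare_exchange_strong_explicit(&zero_idx, &expected, id,
			memory_order_acq_rel, memory_order_acquire);
	return atomic_load_explicit(&zero_idx, memory_order_acquire);
}
```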



[dpdk-dev] [PATCH v7 04/16] distributor: handle worker shutdown in burst mode

2020-10-10 Thread Lukasz Wojciechowski
The burst version of the distributor implementation was missing proper
handling of worker shutdown. A worker processing packets received
from the distributor can call the rte_distributor_return_pkt() function
informing the distributor that it wants no more packets. Subsequent calls
to rte_distributor_request_pkt() or rte_distributor_get_pkt() however
should inform the distributor that new packets are requested again.

Lack of a proper implementation meant that even after a worker
announced it was returning its last packets, new packets were still sent
from the distributor, causing deadlocks as no one could get them on the
worker side.

This patch adds handling of worker shutdown in the following way:
1) It fixes usage of the RTE_DISTRIB_VALID_BUF handshake flag. This flag
was formerly unused in the burst implementation and is now used
for marking valid packets in retptr64, replacing the invalid use
of the RTE_DISTRIB_RETURN_BUF flag.
2) It uses RTE_DISTRIB_RETURN_BUF as a worker-to-distributor handshake
in retptr64 to indicate that the worker has shut down.
3) A worker that shuts down also blocks bufptr for itself with the
RTE_DISTRIB_RETURN_BUF flag, allowing the distributor to retrieve any
in-flight packets.
4) When the distributor receives information about the shutdown of a
worker, it: marks the worker as not active; retrieves any in-flight and
backlog packets and redistributes them to different workers; unlocks
bufptr64 by clearing the RTE_DISTRIB_RETURN_BUF flag, allowing its use
in the future if the worker requests new packets.
5) It does not send or add to the backlog any packets for inactive
workers. Such workers are also ignored if matched.
6) It adjusts calls to handle_returns() and the tag matching procedure
to react to possible activation or deactivation of workers.

Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
Cc: david.h...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Lukasz Wojciechowski 
Acked-by: David Hunt 
---
 lib/librte_distributor/distributor_private.h |   3 +
 lib/librte_distributor/rte_distributor.c | 175 +++
 2 files changed, 146 insertions(+), 32 deletions(-)

diff --git a/lib/librte_distributor/distributor_private.h 
b/lib/librte_distributor/distributor_private.h
index 489aef2ac..689fe3e18 100644
--- a/lib/librte_distributor/distributor_private.h
+++ b/lib/librte_distributor/distributor_private.h
@@ -155,6 +155,9 @@ struct rte_distributor {
enum rte_distributor_match_function dist_match_fn;
 
struct rte_distributor_single *d_single;
+
+   uint8_t active[RTE_DISTRIB_MAX_WORKERS];
+   uint8_t activesum;
 };
 
 void
diff --git a/lib/librte_distributor/rte_distributor.c 
b/lib/librte_distributor/rte_distributor.c
index b720abe03..115443fc0 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -51,7 +51,7 @@ rte_distributor_request_pkt(struct rte_distributor *d,
 * Sync with worker on GET_BUF flag.
 */
while (unlikely(__atomic_load_n(retptr64, __ATOMIC_ACQUIRE)
-   & RTE_DISTRIB_GET_BUF)) {
+   & (RTE_DISTRIB_GET_BUF | RTE_DISTRIB_RETURN_BUF))) {
rte_pause();
uint64_t t = rte_rdtsc()+100;
 
@@ -67,11 +67,11 @@ rte_distributor_request_pkt(struct rte_distributor *d,
for (i = count; i < RTE_DIST_BURST_SIZE; i++)
buf->retptr64[i] = 0;
 
-   /* Set Return bit for each packet returned */
+   /* Set VALID_BUF bit for each packet returned */
for (i = count; i-- > 0; )
buf->retptr64[i] =
(((int64_t)(uintptr_t)(oldpkt[i])) <<
-   RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+   RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_VALID_BUF;
 
/*
 * Finally, set the GET_BUF  to signal to distributor that cache
@@ -97,11 +97,13 @@ rte_distributor_poll_pkt(struct rte_distributor *d,
return (pkts[0]) ? 1 : 0;
}
 
-   /* If bit is set, return
+   /* If any of below bits is set, return.
+* GET_BUF is set when distributor hasn't sent any packets yet
+* RETURN_BUF is set when distributor must retrieve in-flight packets
 * Sync with distributor to acquire bufptrs
 */
if (__atomic_load_n(&(buf->bufptr64[0]), __ATOMIC_ACQUIRE)
-   & RTE_DISTRIB_GET_BUF)
+   & (RTE_DISTRIB_GET_BUF | RTE_DISTRIB_RETURN_BUF))
return -1;
 
/* since bufptr64 is signed, this should be an arithmetic shift */
@@ -113,7 +115,7 @@ rte_distributor_poll_pkt(struct rte_distributor *d,
}
 
/*
-* so now we've got the contents of the cacheline into an  array of
+* so now we've got the contents of the cacheline into an array of
 * mbuf pointers, so toggle the bit so scheduler can start working
 * on the next cacheline while we're working.
 * Sync with distributor on GET_BUF flag. Release bufptrs.
@@ -175,7 +177,7 @@ rte_distrib

[dpdk-dev] [PATCH v7 07/16] distributor: fix return pkt calls in single mode

2020-10-10 Thread Lukasz Wojciechowski
In the legacy single version of the distributor, synchronization
requires a continuous exchange of buffers between the distributor
and workers. Empty buffers are sent if only handshake
synchronization is required.
However, calls to rte_distributor_return_pkt()
with 0 buffers in single mode were ignored and not passed to the
legacy algorithm implementation, causing a lack of synchronization.

This patch fixes this issue by passing NULL as the buffer, which is
a valid way of sending just synchronization handshakes
in single mode.

Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
Cc: david.h...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Lukasz Wojciechowski 
Acked-by: David Hunt 
---
 lib/librte_distributor/rte_distributor.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/librte_distributor/rte_distributor.c 
b/lib/librte_distributor/rte_distributor.c
index 115443fc0..9fd7dcab7 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -168,6 +168,9 @@ rte_distributor_return_pkt(struct rte_distributor *d,
if (num == 1)
return rte_distributor_return_pkt_single(d->d_single,
worker_id, oldpkt[0]);
+   else if (num == 0)
+   return rte_distributor_return_pkt_single(d->d_single,
+   worker_id, NULL);
else
return -EINVAL;
}
-- 
2.17.1



[dpdk-dev] [PATCH v7 06/16] test/distributor: synchronize lcores statistics

2020-10-10 Thread Lukasz Wojciechowski
Statistics of handled packets are cleared and read on the main lcore,
while they are increased in worker handlers on different lcores.

Without synchronization, they occasionally showed invalid values.
This patch uses atomic acquire/release mechanisms to synchronize.

Fixes: c3eabff124e6 ("distributor: add unit tests")
Cc: bruce.richard...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Lukasz Wojciechowski 
Acked-by: David Hunt 
---
 app/test/test_distributor.c | 43 +
 1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index 6cd7a2edd..838459392 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -43,7 +43,8 @@ total_packet_count(void)
 {
unsigned i, count = 0;
for (i = 0; i < worker_idx; i++)
-   count += worker_stats[i].handled_packets;
+   count += __atomic_load_n(&worker_stats[i].handled_packets,
+   __ATOMIC_ACQUIRE);
return count;
 }
 
@@ -51,7 +52,10 @@ total_packet_count(void)
 static inline void
 clear_packet_count(void)
 {
-   memset(&worker_stats, 0, sizeof(worker_stats));
+   unsigned int i;
+   for (i = 0; i < RTE_MAX_LCORE; i++)
+   __atomic_store_n(&worker_stats[i].handled_packets, 0,
+   __ATOMIC_RELEASE);
 }
 
 /* this is the basic worker function for sanity test
@@ -69,13 +73,13 @@ handle_work(void *arg)
num = rte_distributor_get_pkt(db, id, buf, NULL, 0);
while (!quit) {
__atomic_fetch_add(&worker_stats[id].handled_packets, num,
-   __ATOMIC_RELAXED);
+   __ATOMIC_ACQ_REL);
count += num;
num = rte_distributor_get_pkt(db, id,
buf, buf, num);
}
__atomic_fetch_add(&worker_stats[id].handled_packets, num,
-   __ATOMIC_RELAXED);
+   __ATOMIC_ACQ_REL);
count += num;
rte_distributor_return_pkt(db, id, buf, num);
return 0;
@@ -131,7 +135,8 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 
for (i = 0; i < rte_lcore_count() - 1; i++)
printf("Worker %u handled %u packets\n", i,
-   worker_stats[i].handled_packets);
+   __atomic_load_n(&worker_stats[i].handled_packets,
+   __ATOMIC_ACQUIRE));
printf("Sanity test with all zero hashes done.\n");
 
/* pick two flows and check they go correctly */
@@ -156,7 +161,9 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 
for (i = 0; i < rte_lcore_count() - 1; i++)
printf("Worker %u handled %u packets\n", i,
-   worker_stats[i].handled_packets);
+   __atomic_load_n(
+   &worker_stats[i].handled_packets,
+   __ATOMIC_ACQUIRE));
printf("Sanity test with two hash values done\n");
}
 
@@ -182,7 +189,8 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 
for (i = 0; i < rte_lcore_count() - 1; i++)
printf("Worker %u handled %u packets\n", i,
-   worker_stats[i].handled_packets);
+   __atomic_load_n(&worker_stats[i].handled_packets,
+   __ATOMIC_ACQUIRE));
printf("Sanity test with non-zero hashes done\n");
 
rte_mempool_put_bulk(p, (void *)bufs, BURST);
@@ -275,14 +283,16 @@ handle_work_with_free_mbufs(void *arg)
 
num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
while (!quit) {
-   worker_stats[id].handled_packets += num;
count += num;
+   __atomic_fetch_add(&worker_stats[id].handled_packets, num,
+   __ATOMIC_ACQ_REL);
for (i = 0; i < num; i++)
rte_pktmbuf_free(buf[i]);
num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
}
-   worker_stats[id].handled_packets += num;
count += num;
+   __atomic_fetch_add(&worker_stats[id].handled_packets, num,
+   __ATOMIC_ACQ_REL);
rte_distributor_return_pkt(d, id, buf, num);
return 0;
 }
@@ -358,8 +368,9 @@ handle_work_for_shutdown_test(void *arg)
/* wait for quit single globally, or for worker zero, wait
 * for zero_quit */
while (!quit && !(id == zero_id && zero_quit)) {
-   worker_stats[id].handled_packets += num;
count += num;
+   __atomic_fetch_add(&worker_stats[id].handled_packets, num,
+   __ATOMIC_ACQ_REL);
for (i = 0; i < num; i++)
rte_pktmbuf_fre

[dpdk-dev] [PATCH v7 08/16] test/distributor: fix freeing mbufs

2020-10-10 Thread Lukasz Wojciechowski
Sanity tests with mbuf alloc and shutdown tests assume that
mbufs passed to worker cores are freed in the handlers.
Such packets should not be returned to the distributor's main
core. The only packets that should be returned are the packets
sent after completion of the tests in the quit_workers function.

This patch stops returning mbufs to the distributor's core.
In the case of the shutdown tests it is impossible to determine
how the worker and distributor threads will synchronize.
Packets used by the tests should be freed, while packets used during
quit_workers() should not. That is why returning mbufs to the mempool
is moved from the worker threads to the test procedure running on the
distributor thread.

Additionally, this patch cleans up unused variables.

Fixes: c0de0eb82e40 ("distributor: switch over to new API")
Cc: david.h...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Lukasz Wojciechowski 
Acked-by: David Hunt 
---
 app/test/test_distributor.c | 96 ++---
 1 file changed, 47 insertions(+), 49 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index 838459392..06e01ff9d 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -44,7 +44,7 @@ total_packet_count(void)
unsigned i, count = 0;
for (i = 0; i < worker_idx; i++)
count += __atomic_load_n(&worker_stats[i].handled_packets,
-   __ATOMIC_ACQUIRE);
+   __ATOMIC_RELAXED);
return count;
 }
 
@@ -55,7 +55,7 @@ clear_packet_count(void)
unsigned int i;
for (i = 0; i < RTE_MAX_LCORE; i++)
__atomic_store_n(&worker_stats[i].handled_packets, 0,
-   __ATOMIC_RELEASE);
+   __ATOMIC_RELAXED);
 }
 
 /* this is the basic worker function for sanity test
@@ -67,20 +67,18 @@ handle_work(void *arg)
struct rte_mbuf *buf[8] __rte_cache_aligned;
struct worker_params *wp = arg;
struct rte_distributor *db = wp->dist;
-   unsigned int count = 0, num;
+   unsigned int num;
unsigned int id = __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);
 
num = rte_distributor_get_pkt(db, id, buf, NULL, 0);
while (!quit) {
__atomic_fetch_add(&worker_stats[id].handled_packets, num,
-   __ATOMIC_ACQ_REL);
-   count += num;
+   __ATOMIC_RELAXED);
num = rte_distributor_get_pkt(db, id,
buf, buf, num);
}
__atomic_fetch_add(&worker_stats[id].handled_packets, num,
-   __ATOMIC_ACQ_REL);
-   count += num;
+   __ATOMIC_RELAXED);
rte_distributor_return_pkt(db, id, buf, num);
return 0;
 }
@@ -136,7 +134,7 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
for (i = 0; i < rte_lcore_count() - 1; i++)
printf("Worker %u handled %u packets\n", i,
__atomic_load_n(&worker_stats[i].handled_packets,
-   __ATOMIC_ACQUIRE));
+   __ATOMIC_RELAXED));
printf("Sanity test with all zero hashes done.\n");
 
/* pick two flows and check they go correctly */
@@ -163,7 +161,7 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
printf("Worker %u handled %u packets\n", i,
__atomic_load_n(
&worker_stats[i].handled_packets,
-   __ATOMIC_ACQUIRE));
+   __ATOMIC_RELAXED));
printf("Sanity test with two hash values done\n");
}
 
@@ -190,7 +188,7 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
for (i = 0; i < rte_lcore_count() - 1; i++)
printf("Worker %u handled %u packets\n", i,
__atomic_load_n(&worker_stats[i].handled_packets,
-   __ATOMIC_ACQUIRE));
+   __ATOMIC_RELAXED));
printf("Sanity test with non-zero hashes done\n");
 
rte_mempool_put_bulk(p, (void *)bufs, BURST);
@@ -276,23 +274,20 @@ handle_work_with_free_mbufs(void *arg)
struct rte_mbuf *buf[8] __rte_cache_aligned;
struct worker_params *wp = arg;
struct rte_distributor *d = wp->dist;
-   unsigned int count = 0;
unsigned int i;
unsigned int num;
unsigned int id = __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);
 
num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
while (!quit) {
-   count += num;
__atomic_fetch_add(&worker_stats[id].handled_packets, num,
-   __ATOMIC_ACQ_REL);
+   __ATOMIC_RELAXED);
for (i = 0; i < num; i++)
   

[dpdk-dev] [PATCH v7 09/16] test/distributor: collect return mbufs

2020-10-10 Thread Lukasz Wojciechowski
During the quit_workers function the distributor's main core processes
some packets to wake up pending worker cores so they can quit.
As quit_workers also acts as a cleanup procedure for the next test
case, it should also collect the packets returned by the workers'
handlers, so that the cyclic buffer of returned packets
in the distributor remains empty.

Fixes: c3eabff124e6 ("distributor: add unit tests")
Cc: bruce.richard...@intel.com
Fixes: c0de0eb82e40 ("distributor: switch over to new API")
Cc: david.h...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Lukasz Wojciechowski 
Acked-by: David Hunt 
---
 app/test/test_distributor.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index 06e01ff9d..ed03040d1 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -590,6 +590,7 @@ quit_workers(struct worker_params *wp, struct rte_mempool 
*p)
const unsigned num_workers = rte_lcore_count() - 1;
unsigned i;
struct rte_mbuf *bufs[RTE_MAX_LCORE];
+   struct rte_mbuf *returns[RTE_MAX_LCORE];
if (rte_mempool_get_bulk(p, (void *)bufs, num_workers) != 0) {
printf("line %d: Error getting mbufs from pool\n", __LINE__);
return;
@@ -605,6 +606,10 @@ quit_workers(struct worker_params *wp, struct rte_mempool 
*p)
rte_distributor_flush(d);
rte_eal_mp_wait_lcore();
 
+   while (rte_distributor_returned_pkts(d, returns, RTE_MAX_LCORE))
+   ;
+
+   rte_distributor_clear_returns(d);
rte_mempool_put_bulk(p, (void *)bufs, num_workers);
 
quit = 0;
-- 
2.17.1



[dpdk-dev] [PATCH v7 10/16] distributor: align API documentation with code

2020-10-10 Thread Lukasz Wojciechowski
After introducing the burst API, some artefacts from the legacy
single-packet API remained in the API documentation.
Also, the return values of the rte_distributor_poll_pkt() function
did not match the implementation.

Fixes: c0de0eb82e40 ("distributor: switch over to new API")
Cc: david.h...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Lukasz Wojciechowski 
Acked-by: David Hunt 
---
 lib/librte_distributor/rte_distributor.h | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/lib/librte_distributor/rte_distributor.h 
b/lib/librte_distributor/rte_distributor.h
index 327c0c4ab..a073e6461 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -155,7 +155,7 @@ rte_distributor_clear_returns(struct rte_distributor *d);
  * @param pkts
  *   The mbufs pointer array to be filled in (up to 8 packets)
  * @param oldpkt
- *   The previous packet, if any, being processed by the worker
+ *   The previous packets, if any, being processed by the worker
  * @param retcount
  *   The number of packets being returned
  *
@@ -187,15 +187,15 @@ rte_distributor_return_pkt(struct rte_distributor *d,
 
 /**
  * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
+ * Any previous packets given to the worker are assumed to have completed
  * processing, and may be optionally returned to the distributor via
  * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt_burst(), this function does not wait for a
- * new packet to be provided by the distributor.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for
+ * new packets to be provided by the distributor.
  *
- * NOTE: after calling this function, rte_distributor_poll_pkt_burst() should
- * be used to poll for the packet requested. The 
rte_distributor_get_pkt_burst()
- * API should *not* be used to try and retrieve the new packet.
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packets requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packets.
  *
  * @param d
  *   The distributor instance to be used
@@ -213,9 +213,9 @@ rte_distributor_request_pkt(struct rte_distributor *d,
unsigned int count);
 
 /**
- * API called by a worker to check for a new packet that was previously
+ * API called by a worker to check for new packets that were previously
  * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
+ * for the new packets to be available, but returns if the request has
  * not yet been fulfilled by the distributor.
  *
  * @param d
@@ -227,8 +227,9 @@ rte_distributor_request_pkt(struct rte_distributor *d,
  *   The array of mbufs being given to the worker
  *
  * @return
- *   The number of packets being given to the worker thread, zero if no
- *   packet is yet available.
+ *   The number of packets being given to the worker thread,
+ *   -1 if no packets are yet available (burst API - RTE_DIST_ALG_BURST)
+ *   0 if no packets are yet available (legacy single API - 
RTE_DIST_ALG_SINGLE)
  */
 int
 rte_distributor_poll_pkt(struct rte_distributor *d,
-- 
2.17.1



[dpdk-dev] [PATCH v7 13/16] test/distributor: add test with packets marking

2020-10-10 Thread Lukasz Wojciechowski
All of the former tests analyzed only statistics
of packets processed by all workers.
The new test also verifies whether packets are processed
by the workers as expected.
Every packet processed by a worker is marked
and analyzed after it is returned to the distributor.

This test allows finding issues in matching algorithms.

Signed-off-by: Lukasz Wojciechowski 
Acked-by: David Hunt 
---
 app/test/test_distributor.c | 141 
 1 file changed, 141 insertions(+)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index e8dd75078..4fc10b3cc 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -542,6 +542,141 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
return 0;
 }
 
+static int
+handle_and_mark_work(void *arg)
+{
+   struct rte_mbuf *buf[8] __rte_cache_aligned;
+   struct worker_params *wp = arg;
+   struct rte_distributor *db = wp->dist;
+   unsigned int num, i;
+   unsigned int id = __atomic_fetch_add(&worker_idx, 1, __ATOMIC_RELAXED);
+   num = rte_distributor_get_pkt(db, id, buf, NULL, 0);
+   while (!quit) {
+   __atomic_fetch_add(&worker_stats[id].handled_packets, num,
+   __ATOMIC_RELAXED);
+   for (i = 0; i < num; i++)
+   buf[i]->udata64 += id + 1;
+   num = rte_distributor_get_pkt(db, id,
+   buf, buf, num);
+   }
+   __atomic_fetch_add(&worker_stats[id].handled_packets, num,
+   __ATOMIC_RELAXED);
+   rte_distributor_return_pkt(db, id, buf, num);
+   return 0;
+}
+
+/* sanity_mark_test sends packets to workers which mark them.
+ * Every packet has also encoded sequence number.
+ * The returned packets are sorted and verified if they were handled
+ * by proper workers.
+ */
+static int
+sanity_mark_test(struct worker_params *wp, struct rte_mempool *p)
+{
+   const unsigned int buf_count = 24;
+   const unsigned int burst = 8;
+   const unsigned int shift = 12;
+   const unsigned int seq_shift = 10;
+
+   struct rte_distributor *db = wp->dist;
+   struct rte_mbuf *bufs[buf_count];
+   struct rte_mbuf *returns[buf_count];
+   unsigned int i, count, id;
+   unsigned int sorted[buf_count], seq;
+   unsigned int failed = 0;
+
+   printf("=== Marked packets test ===\n");
+   clear_packet_count();
+   if (rte_mempool_get_bulk(p, (void *)bufs, buf_count) != 0) {
+   printf("line %d: Error getting mbufs from pool\n", __LINE__);
+   return -1;
+   }
+
+   /* bufs' hashes will be like these below, but shifted left.
+* The shifting is for avoiding collisions with backlogs
+* and in-flight tags left by previous tests.
+* [1, 1, 1, 1, 1, 1, 1, 1
+*  1, 1, 1, 1, 2, 2, 2, 2
+*  2, 2, 2, 2, 1, 1, 1, 1]
+*/
+   for (i = 0; i < burst; i++) {
+   bufs[0 * burst + i]->hash.usr = 1 << shift;
+   bufs[1 * burst + i]->hash.usr = ((i < burst / 2) ? 1 : 2)
+   << shift;
+   bufs[2 * burst + i]->hash.usr = ((i < burst / 2) ? 2 : 1)
+   << shift;
+   }
+   /* Assign a sequence number to each packet. The sequence is shifted,
+* so that lower bits of the udate64 will hold mark from worker.
+*/
+   for (i = 0; i < buf_count; i++)
+   bufs[i]->udata64 = i << seq_shift;
+
+   count = 0;
+   for (i = 0; i < buf_count/burst; i++) {
+   rte_distributor_process(db, &bufs[i * burst], burst);
+   count += rte_distributor_returned_pkts(db, &returns[count],
+   buf_count - count);
+   }
+
+   do {
+   rte_distributor_flush(db);
+   count += rte_distributor_returned_pkts(db, &returns[count],
+   buf_count - count);
+   } while (count < buf_count);
+
+   for (i = 0; i < rte_lcore_count() - 1; i++)
+   printf("Worker %u handled %u packets\n", i,
+   __atomic_load_n(&worker_stats[i].handled_packets,
+   __ATOMIC_RELAXED));
+
+   /* Sort returned packets by sent order (sequence numbers). */
+   for (i = 0; i < buf_count; i++) {
+   seq = returns[i]->udata64 >> seq_shift;
+   id = returns[i]->udata64 - (seq << seq_shift);
+   sorted[seq] = id;
+   }
+
+   /* Verify that packets [0-11] and [20-23] were processed
+* by the same worker
+*/
+   for (i = 1; i < 12; i++) {
+   if (sorted[i] != sorted[0]) {
+   printf("Packet number %u processed by worker %u,"
+   " but should be processes by worker %u\n",
+   i, sorted[i], sorted[0]);
+   failed = 1;
+   }
+   }
+   f

[dpdk-dev] [PATCH v7 12/16] distributor: fix scalar matching

2020-10-10 Thread Lukasz Wojciechowski
Fix improper indexes used while comparing tags.
In the find_match_scalar() function:
* j iterates over the flow tags of incoming packets;
* w iterates over backlog or in-flight tag positions.

Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
Cc: david.h...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Lukasz Wojciechowski 
Acked-by: David Hunt 
---
 lib/librte_distributor/rte_distributor.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_distributor/rte_distributor.c 
b/lib/librte_distributor/rte_distributor.c
index 9fd7dcab7..4bd23a990 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -261,13 +261,13 @@ find_match_scalar(struct rte_distributor *d,
 
for (j = 0; j < RTE_DIST_BURST_SIZE ; j++)
for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
-   if (d->in_flight_tags[i][j] == data_ptr[w]) {
+   if (d->in_flight_tags[i][w] == data_ptr[j]) {
output_ptr[j] = i+1;
break;
}
for (j = 0; j < RTE_DIST_BURST_SIZE; j++)
for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
-   if (bl->tags[j] == data_ptr[w]) {
+   if (bl->tags[w] == data_ptr[j]) {
output_ptr[j] = i+1;
break;
}
-- 
2.17.1



[dpdk-dev] [PATCH v7 11/16] test/distributor: replace delays with spin locks

2020-10-10 Thread Lukasz Wojciechowski
Instead of making delays in the test code and hoping
that workers reach the proper states,
synchronize the worker shutdown test cases with a spin lock
on an atomic variable.

Fixes: c0de0eb82e40 ("distributor: switch over to new API")
Cc: david.h...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Lukasz Wojciechowski 
Acked-by: David Hunt 
---
 app/test/test_distributor.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index ed03040d1..e8dd75078 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -27,6 +27,7 @@ struct worker_params worker_params;
 /* statics - all zero-initialized by default */
 static volatile int quit;  /**< general quit variable for all threads */
 static volatile int zero_quit; /**< var for when we just want thr0 to quit*/
+static volatile int zero_sleep; /**< thr0 has quit basic loop and is sleeping*/
 static volatile unsigned worker_idx;
 static volatile unsigned zero_idx;
 
@@ -376,8 +377,10 @@ handle_work_for_shutdown_test(void *arg)
/* for worker zero, allow it to restart to pick up last packet
 * when all workers are shutting down.
 */
+   __atomic_store_n(&zero_sleep, 1, __ATOMIC_RELEASE);
while (zero_quit)
usleep(100);
+   __atomic_store_n(&zero_sleep, 0, __ATOMIC_RELEASE);
 
num = rte_distributor_get_pkt(d, id, buf, NULL, 0);
 
@@ -445,7 +448,12 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 
/* flush the distributor */
rte_distributor_flush(d);
-   rte_delay_us(1);
+   while (!__atomic_load_n(&zero_sleep, __ATOMIC_ACQUIRE))
+   rte_distributor_flush(d);
+
+   zero_quit = 0;
+   while (__atomic_load_n(&zero_sleep, __ATOMIC_ACQUIRE))
+   rte_delay_us(100);
 
for (i = 0; i < rte_lcore_count() - 1; i++)
printf("Worker %u handled %u packets\n", i,
@@ -505,9 +513,14 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
/* flush the distributor */
rte_distributor_flush(d);
 
-   rte_delay_us(1);
+   while (!__atomic_load_n(&zero_sleep, __ATOMIC_ACQUIRE))
+   rte_distributor_flush(d);
 
zero_quit = 0;
+
+   while (__atomic_load_n(&zero_sleep, __ATOMIC_ACQUIRE))
+   rte_delay_us(100);
+
for (i = 0; i < rte_lcore_count() - 1; i++)
printf("Worker %u handled %u packets\n", i,
__atomic_load_n(&worker_stats[i].handled_packets,
@@ -615,6 +628,8 @@ quit_workers(struct worker_params *wp, struct rte_mempool 
*p)
quit = 0;
worker_idx = 0;
zero_idx = RTE_MAX_LCORE;
+   zero_quit = 0;
+   zero_sleep = 0;
 }
 
 static int
-- 
2.17.1



[dpdk-dev] [PATCH v7 14/16] distributor: fix flushing in flight packets

2020-10-10 Thread Lukasz Wojciechowski
rte_distributor_flush() uses the total_outstanding()
function to calculate whether it should still wait
for packets being processed. However, in burst mode
only backlog packets were counted.

This patch fixes that issue by also counting in-flight
packets. There are also some fixes to properly keep
count of in-flight packets for each worker in bufs[].count.

Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
Cc: david.h...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Lukasz Wojciechowski 
Acked-by: David Hunt 
---
 lib/librte_distributor/rte_distributor.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/lib/librte_distributor/rte_distributor.c 
b/lib/librte_distributor/rte_distributor.c
index 4bd23a990..2478de3b7 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -467,6 +467,7 @@ rte_distributor_process(struct rte_distributor *d,
/* Sync with worker on GET_BUF flag. */
if (__atomic_load_n(&(d->bufs[wid].bufptr64[0]),
__ATOMIC_ACQUIRE) & RTE_DISTRIB_GET_BUF) {
+   d->bufs[wid].count = 0;
release(d, wid);
handle_returns(d, wid);
}
@@ -481,11 +482,6 @@ rte_distributor_process(struct rte_distributor *d,
uint16_t matches[RTE_DIST_BURST_SIZE];
unsigned int pkts;
 
-   /* Sync with worker on GET_BUF flag. */
-   if (__atomic_load_n(&(d->bufs[wkr].bufptr64[0]),
-   __ATOMIC_ACQUIRE) & RTE_DISTRIB_GET_BUF)
-   d->bufs[wkr].count = 0;
-
if ((num_mbufs - next_idx) < RTE_DIST_BURST_SIZE)
pkts = num_mbufs - next_idx;
else
@@ -605,8 +601,10 @@ rte_distributor_process(struct rte_distributor *d,
for (wid = 0 ; wid < d->num_workers; wid++)
/* Sync with worker on GET_BUF flag. */
if ((__atomic_load_n(&(d->bufs[wid].bufptr64[0]),
-   __ATOMIC_ACQUIRE) & RTE_DISTRIB_GET_BUF))
+   __ATOMIC_ACQUIRE) & RTE_DISTRIB_GET_BUF)) {
+   d->bufs[wid].count = 0;
release(d, wid);
+   }
 
return num_mbufs;
 }
@@ -649,7 +647,7 @@ total_outstanding(const struct rte_distributor *d)
unsigned int wkr, total_outstanding = 0;
 
for (wkr = 0; wkr < d->num_workers; wkr++)
-   total_outstanding += d->backlog[wkr].count;
+   total_outstanding += d->backlog[wkr].count + d->bufs[wkr].count;
 
return total_outstanding;
 }
-- 
2.17.1



[dpdk-dev] [PATCH v7 15/16] distributor: fix clearing returns buffer

2020-10-10 Thread Lukasz Wojciechowski
The patch clears the distributor's returns buffer
in clear_returns() by setting start and count to 0.

Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
Cc: david.h...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Lukasz Wojciechowski 
Acked-by: David Hunt 
---
 lib/librte_distributor/rte_distributor.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_distributor/rte_distributor.c 
b/lib/librte_distributor/rte_distributor.c
index 2478de3b7..57240304a 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -704,6 +704,8 @@ rte_distributor_clear_returns(struct rte_distributor *d)
/* Sync with worker. Release retptrs. */
__atomic_store_n(&(d->bufs[wkr].retptr64[0]), 0,
__ATOMIC_RELEASE);
+
+   d->returns.start = d->returns.count = 0;
 }
 
 /* creates a distributor instance */
-- 
2.17.1



[dpdk-dev] [PATCH v7 16/16] test/distributor: ensure all packets are delivered

2020-10-10 Thread Lukasz Wojciechowski
In all distributor tests there is a chance that the tests
will send packets to the distributor with rte_distributor_process()
before the workers have started and requested packets.

This patch ensures that all packets are delivered to the workers
by calling rte_distributor_process() in a loop until the number
of successfully processed packets reaches the number required by the test.
The change is applied to the first such call in every test case.

Signed-off-by: Lukasz Wojciechowski 
---
 app/test/test_distributor.c | 32 +++-
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index 4fc10b3cc..3c56358d4 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -103,6 +103,7 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
struct rte_mbuf *returns[BURST*2];
unsigned int i, count;
unsigned int retries;
+   unsigned int processed;
 
printf("=== Basic distributor sanity tests ===\n");
clear_packet_count();
@@ -116,7 +117,11 @@ sanity_test(struct worker_params *wp, struct rte_mempool 
*p)
for (i = 0; i < BURST; i++)
bufs[i]->hash.usr = 0;
 
-   rte_distributor_process(db, bufs, BURST);
+   processed = 0;
+   while (processed < BURST)
+   processed += rte_distributor_process(db, &bufs[processed],
+   BURST - processed);
+
count = 0;
do {
 
@@ -304,6 +309,7 @@ sanity_test_with_mbuf_alloc(struct worker_params *wp, 
struct rte_mempool *p)
struct rte_distributor *d = wp->dist;
unsigned i;
struct rte_mbuf *bufs[BURST];
+   unsigned int processed;
 
printf("=== Sanity test with mbuf alloc/free (%s) ===\n", wp->name);
 
@@ -316,7 +322,10 @@ sanity_test_with_mbuf_alloc(struct worker_params *wp, 
struct rte_mempool *p)
bufs[j]->hash.usr = (i+j) << 1;
}
 
-   rte_distributor_process(d, bufs, BURST);
+   processed = 0;
+   while (processed < BURST)
+   processed += rte_distributor_process(d,
+   &bufs[processed], BURST - processed);
}
 
rte_distributor_flush(d);
@@ -409,6 +418,7 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
struct rte_mbuf *bufs2[BURST];
unsigned int i;
unsigned int failed = 0;
+   unsigned int processed = 0;
 
printf("=== Sanity test of worker shutdown ===\n");
 
@@ -426,7 +436,10 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
for (i = 0; i < BURST; i++)
bufs[i]->hash.usr = 1;
 
-   rte_distributor_process(d, bufs, BURST);
+   processed = 0;
+   while (processed < BURST)
+   processed += rte_distributor_process(d, &bufs[processed],
+   BURST - processed);
rte_distributor_flush(d);
 
/* at this point, we will have processed some packets and have a full
@@ -488,6 +501,7 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
struct rte_mbuf *bufs[BURST];
unsigned int i;
unsigned int failed = 0;
+   unsigned int processed;
 
printf("=== Test flush fn with worker shutdown (%s) ===\n", wp->name);
 
@@ -502,7 +516,10 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
for (i = 0; i < BURST; i++)
bufs[i]->hash.usr = 0;
 
-   rte_distributor_process(d, bufs, BURST);
+   processed = 0;
+   while (processed < BURST)
+   processed += rte_distributor_process(d, &bufs[processed],
+   BURST - processed);
/* at this point, we will have processed some packets and have a full
 * backlog for the other ones at worker 0.
 */
@@ -584,6 +601,7 @@ sanity_mark_test(struct worker_params *wp, struct 
rte_mempool *p)
unsigned int i, count, id;
unsigned int sorted[buf_count], seq;
unsigned int failed = 0;
+   unsigned int processed;
 
printf("=== Marked packets test ===\n");
clear_packet_count();
@@ -614,7 +632,11 @@ sanity_mark_test(struct worker_params *wp, struct 
rte_mempool *p)
 
count = 0;
for (i = 0; i < buf_count/burst; i++) {
-   rte_distributor_process(db, &bufs[i * burst], burst);
+   processed = 0;
+   while (processed < burst)
+   processed += rte_distributor_process(db,
+   &bufs[i * burst + processed],
+   burst - processed);
count += rte_distributor_returned_pkts(db, &returns[count],
buf_count - count);
}
-- 
2.17.1



Re: [dpdk-dev] [PATCH v5 00/15] fix distributor synchronization issues

2020-10-10 Thread Lukasz Wojciechowski


W dniu 10.10.2020 o 10:12, David Marchand pisze:
> Hello Lukasz,
>
> On Sat, Oct 10, 2020 at 1:26 AM Lukasz Wojciechowski
>  wrote:
>> W dniu 09.10.2020 o 23:41, Lukasz Wojciechowski pisze:
>> More bad news - same issue just appeared on travis for v6.
>> Good news we can reproduce it.
>>
>> Is there a way to delegate a job for travis other way than sending a new 
>> patch version?
> You just need to fork dpdk in github, then setup travis.
> Travis will get triggered on push.
> I can help offlist if needed.

Thank you

I managed to reproduce the issue by stressing my machine's CPUs and memory.

The issue was caused by the slow start of the worker threads, which didn't
reach the place where they request packets; because of that
they were treated as not activated. The distributor thread therefore didn't
send any packets, but waited in an infinite loop until
packets were returned from the workers.

I pushed v7 of the series with an additional patch fixing that by running
rte_distributor_process() in a loop until it manages to send all packets
to the workers.

>
>
-- 
Lukasz Wojciechowski
Principal Software Engineer

Samsung R&D Institute Poland
Samsung Electronics
Office +48 22 377 88 25
l.wojciec...@partner.samsung.com



Re: [dpdk-dev] [dpdk-dev v3 2/2] fips_validation: update GCM test

2020-10-10 Thread Akhil Goyal
> This patch updates the fips validation GCM test capabilities:
> 
> - In the NIST GCMVS spec, GMAC test vectors are the GCM ones with
> a plaintext length of 0 and use the AAD as input data. Originally the
> fips_validation tests treated them both as GCM test vectors.
> This patch introduces automatic test type recognition between
> the two: when the plaintext length is 0 the prepare_gmac_xform
> and prepare_auth_op functions are called, otherwise the
> prepare_gcm_xform and prepare_aead_op functions are called.
> 
> - NIST GCMVS also specifies externally or internally generated IVs.
> When the IV is to be generated internally, the IUT
> shall store the generated IV in the response file. This patch
> also adds support for that.
> 
> Signed-off-by: Fan Zhang 
> Signed-off-by: Weqaar Janjua 
> Acked-by: John Griffin 
> ---
>  doc/guides/rel_notes/release_20_11.rst|   5 +
>  examples/fips_validation/fips_validation.h|  26 
>  .../fips_validation/fips_validation_gcm.c | 118 --
>  examples/fips_validation/main.c   |  65 --
>  4 files changed, 189 insertions(+), 25 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_20_11.rst
> b/doc/guides/rel_notes/release_20_11.rst
> index 059ea5fca..7441e6ce4 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -159,6 +159,11 @@ New Features
>* Extern objects and functions can be plugged into the pipeline.
>* Transaction-oriented table updates.
> 
> +* **fips_validation sample application enhancement.**
> +
> + The fips_validation sample application adds SGL and NIST GCMVS compliant
> + GMAC test method support.
> +
> 
These release notes need to be split into two, one for each patch in the series.
I did that while applying the patches.

Series Applied to dpdk-next-crypto

Thanks.


Re: [dpdk-dev] [dpdk-dev v11 2/4] cryptodev: add raw crypto data-path APIs

2020-10-10 Thread Akhil Goyal
Hi Fan,
> 
> This patch adds raw data-path APIs for enqueue and dequeue
> operations to cryptodev. The APIs support flexible user-define
> enqueue and dequeue behaviors.
> 
> Signed-off-by: Fan Zhang 
> Signed-off-by: Piotr Bronowski 
> Acked-by: Adam Dybkowski 
> ---
>  doc/guides/cryptodevs/features/default.ini|   1 +
>  doc/guides/cryptodevs/features/qat.ini|   1 +
>  doc/guides/prog_guide/cryptodev_lib.rst   |  97 +
>  doc/guides/rel_notes/release_20_11.rst|   7 +
>  lib/librte_cryptodev/rte_cryptodev.c  |  80 
>  lib/librte_cryptodev/rte_cryptodev.h  | 367 +-
>  lib/librte_cryptodev/rte_cryptodev_pmd.h  |  51 ++-
>  .../rte_cryptodev_version.map |  10 +
>  8 files changed, 611 insertions(+), 3 deletions(-)
> 

The release notes should be updated just above the aesni_mb crypto PMD updates:

+* **Added raw data-path APIs for cryptodev library.**
+
+  Cryptodev is added with raw data-path APIs to accelerate external
+  libraries or applications that need fast cryptodev
+  enqueue/dequeue operations but do not necessarily depend on
+  mbufs and cryptodev operation mempools.
+

I have the following diff which should be incorporated in this patch.
The qat.ini file should be updated in the 3/4 patch.
The release notes update is also missing for QAT.

diff --git a/doc/guides/cryptodevs/features/qat.ini 
b/doc/guides/cryptodevs/features/qat.ini
index 9e82f2886..6cc09cde7 100644
--- a/doc/guides/cryptodevs/features/qat.ini
+++ b/doc/guides/cryptodevs/features/qat.ini
@@ -17,6 +17,7 @@ Digest encrypted   = Y
 Asymmetric sessionless = Y
 RSA PRIV OP KEY EXP= Y
 RSA PRIV OP KEY QT = Y
-Sym raw data path API  = Y

 ;
 ; Supported crypto algorithms of the 'qat' crypto driver.
diff --git a/doc/guides/prog_guide/cryptodev_lib.rst 
b/doc/guides/prog_guide/cryptodev_lib.rst
index 8ba800122..7fb3022bd 100644
--- a/doc/guides/prog_guide/cryptodev_lib.rst
+++ b/doc/guides/prog_guide/cryptodev_lib.rst
@@ -696,9 +696,9 @@ the status buffer provided by the user):
   are stored. The crypto device will then start enqueuing all of them at
   once.

-Calling ``rte_cryptodev_configure_raw_dp_context`` with the parameter
+Calling ``rte_cryptodev_configure_raw_dp_ctx`` with the parameter
 ``is_update`` set as 0 twice without the enqueue function returning status 1 or
-``rte_cryptodev_dp_enqueue_done`` function call in between will invalidate any
+``rte_cryptodev_raw_enqueue_done`` function call in between will invalidate any
 descriptors stored in the device queue but not enqueued. This feature is useful
 when the user wants to abandon partially enqueued data for a failed enqueue
 burst operation and try enqueuing in a whole later.
diff --git a/lib/librte_cryptodev/rte_cryptodev.c 
b/lib/librte_cryptodev/rte_cryptodev.c
index 7a143c4b9..3d95ac6ea 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -1833,13 +1833,6 @@ rte_cryptodev_raw_enqueue_done(struct 
rte_crypto_raw_dp_ctx *ctx,
return (*ctx->enqueue_done)(ctx->qp_data, ctx->drv_ctx_data, n);
 }

-int
-rte_cryptodev_raw_dequeue_done(struct rte_crypto_raw_dp_ctx *ctx,
-   uint32_t n)
-{
-   return (*ctx->dequeue_done)(ctx->qp_data, ctx->drv_ctx_data, n);
-}
-
 uint32_t
 rte_cryptodev_raw_dequeue_burst(struct rte_crypto_raw_dp_ctx *ctx,
rte_cryptodev_raw_get_dequeue_count_t get_dequeue_count,
@@ -1852,6 +1845,13 @@ rte_cryptodev_raw_dequeue_burst(struct 
rte_crypto_raw_dp_ctx *ctx,
is_user_data_array, n_success_jobs, status);
 }

+int
+rte_cryptodev_raw_dequeue_done(struct rte_crypto_raw_dp_ctx *ctx,
+   uint32_t n)
+{
+   return (*ctx->dequeue_done)(ctx->qp_data, ctx->drv_ctx_data, n);
+}
+
 /** Initialise rte_crypto_op mempool element */
 static void
 rte_crypto_op_init(struct rte_mempool *mempool,
diff --git a/lib/librte_cryptodev/rte_cryptodev.h 
b/lib/librte_cryptodev/rte_cryptodev.h
index 840a1c54c..79cfa46c8 100644
--- a/lib/librte_cryptodev/rte_cryptodev.h
+++ b/lib/librte_cryptodev/rte_cryptodev.h
@@ -459,7 +459,7 @@ rte_cryptodev_asym_get_xform_enum(enum 
rte_crypto_asym_xform_type *xform_enum,
 #define RTE_CRYPTODEV_FF_NON_BYTE_ALIGNED_DATA (1ULL << 23)
 /**< Support operations on data which is not byte aligned */
 #define RTE_CRYPTODEV_FF_SYM_RAW_DP(1ULL << 24)
-/**< Support accelerated specific raw data-path APIs */
+/**< Support accelerator specific symmetric raw data-path APIs */

 /**
  * Get the name of a crypto device feature flag
@@ -1344,7 +1344,7 @@ union rte_cryptodev_session_ctx {
 };

 /**
- * Enqueue a data vector into device queue but the driver will not start
+ * Enqueue a data vector into device queue but the driver may or may not start
  * processing until rte_cryptodev_raw_enqueue_done() is called.
  *
  * @param  qp  Driver specific queue pair data.
@@ -1357,7 +1357,7 @@ union rte_cryptodev_session_ctx {
  * @r

Re: [dpdk-dev] [dpdk-dev v11 4/4] test/crypto: add unit-test for cryptodev raw API test

2020-10-10 Thread Akhil Goyal
Hi Fan,

> +static uint32_t
> +get_raw_dp_dequeue_count(void *user_data __rte_unused)
> +{
> + return 1;
Why is this 1 always? There could be jobs >1 which are processed.

> +}
> +
> +static void
> +post_process_raw_dp_op(void *user_data,  uint32_t index __rte_unused,
> + uint8_t is_op_success)
> +{
> + struct rte_crypto_op *op = user_data;
> + op->status = is_op_success ? RTE_CRYPTO_OP_STATUS_SUCCESS :
> + RTE_CRYPTO_OP_STATUS_ERROR;
> +}
> +
> +void
> +process_sym_raw_dp_op(uint8_t dev_id, uint16_t qp_id,
> + struct rte_crypto_op *op, uint8_t is_cipher, uint8_t is_auth,
> + uint8_t len_in_bits, uint8_t cipher_iv_len)
> +{
> + struct rte_crypto_sym_op *sop = op->sym;
> + struct rte_crypto_op *ret_op = NULL;
> + struct rte_crypto_vec data_vec[UINT8_MAX];
> + struct rte_crypto_va_iova_ptr cipher_iv, digest, aad_auth_iv;
> + union rte_crypto_sym_ofs ofs;
> + struct rte_crypto_sym_vec vec;
> + struct rte_crypto_sgl sgl;
> + uint32_t max_len;
> + union rte_cryptodev_session_ctx sess;
> + uint32_t count = 0;
> + struct rte_crypto_raw_dp_ctx *ctx;
> + uint32_t cipher_offset = 0, cipher_len = 0, auth_offset = 0,
> + auth_len = 0;
> + int32_t n;
> + uint32_t n_success;
> + int ctx_service_size;
> + int32_t status = 0;
> +
> + ctx_service_size = rte_cryptodev_get_raw_dp_ctx_size(dev_id);
> + if (ctx_service_size < 0) {
> + op->status = RTE_CRYPTO_OP_STATUS_ERROR;
> + return;
> + }
> +
> + ctx = malloc(ctx_service_size);
> + if (!ctx) {
> + op->status = RTE_CRYPTO_OP_STATUS_ERROR;
> + return;
> + }
> +
> + /* Both are enums, setting crypto_sess will suit any session type */
> + sess.crypto_sess = op->sym->session;
> +
> + if (rte_cryptodev_configure_raw_dp_ctx(dev_id, qp_id, ctx,
> + op->sess_type, sess, 0) < 0) {
> + op->status = RTE_CRYPTO_OP_STATUS_ERROR;
> + goto exit;
> + }
> +
> + cipher_iv.iova = 0;
> + cipher_iv.va = NULL;
> + aad_auth_iv.iova = 0;
> + aad_auth_iv.va = NULL;
> + digest.iova = 0;
> + digest.va = NULL;
> + sgl.vec = data_vec;
> + vec.num = 1;
> + vec.sgl = &sgl;
> + vec.iv = &cipher_iv;
> + vec.digest = &digest;
> + vec.aad = &aad_auth_iv;
> + vec.status = &status;
> +
> + ofs.raw = 0;
> +
> + if (is_cipher && is_auth) {
> + cipher_offset = sop->cipher.data.offset;
> + cipher_len = sop->cipher.data.length;
> + auth_offset = sop->auth.data.offset;
> + auth_len = sop->auth.data.length;
> + max_len = RTE_MAX(cipher_offset + cipher_len,
> + auth_offset + auth_len);
> + if (len_in_bits) {
> + max_len = max_len >> 3;
> + cipher_offset = cipher_offset >> 3;
> + auth_offset = auth_offset >> 3;
> + cipher_len = cipher_len >> 3;
> + auth_len = auth_len >> 3;
> + }
> + ofs.ofs.cipher.head = cipher_offset;
> + ofs.ofs.cipher.tail = max_len - cipher_offset - cipher_len;
> + ofs.ofs.auth.head = auth_offset;
> + ofs.ofs.auth.tail = max_len - auth_offset - auth_len;
> + cipher_iv.va = rte_crypto_op_ctod_offset(op, void *,
> IV_OFFSET);
> + cipher_iv.iova = rte_crypto_op_ctophys_offset(op, IV_OFFSET);
> + aad_auth_iv.va = rte_crypto_op_ctod_offset(
> + op, void *, IV_OFFSET + cipher_iv_len);
> + aad_auth_iv.iova = rte_crypto_op_ctophys_offset(op,
> IV_OFFSET +
> + cipher_iv_len);
> + digest.va = (void *)sop->auth.digest.data;
> + digest.iova = sop->auth.digest.phys_addr;
> +
> + } else if (is_cipher) {
> + cipher_offset = sop->cipher.data.offset;
> + cipher_len = sop->cipher.data.length;
> + max_len = cipher_len + cipher_offset;
> + if (len_in_bits) {
> + max_len = max_len >> 3;
> + cipher_offset = cipher_offset >> 3;
> + cipher_len = cipher_len >> 3;
> + }
> + ofs.ofs.cipher.head = cipher_offset;
> + ofs.ofs.cipher.tail = max_len - cipher_offset - cipher_len;
> + cipher_iv.va = rte_crypto_op_ctod_offset(op, void *,
> IV_OFFSET);
> + cipher_iv.iova = rte_crypto_op_ctophys_offset(op, IV_OFFSET);
> +
> + } else if (is_auth) {
> + auth_offset = sop->auth.data.offset;
> + auth_len = sop->auth.data.length;
> + max_len = auth_len + auth_offset;
> + if (len_in_bits) {
> + max_len = max_len >> 3;
> + auth_offset = auth_offset >> 3;
> +  

Re: [dpdk-dev] [PATCH] ethdev: check if queue setupped in queue-related APIs

2020-10-10 Thread Kalesh Anakkur Purayil
On Sat, Oct 10, 2020 at 12:42 PM Wei Hu (Xavier) 
wrote:

> From: Chengchang Tang 
>
> This patch adds a check of whether the related Tx or Rx queue has been
> set up in the queue-related API functions, to avoid illegal address
> access. A validity check of the queue_id is also added in the API
> functions rte_eth_dev_rx_intr_enable and rte_eth_dev_rx_intr_disable.
>
> Signed-off-by: Chengchang Tang 
> Signed-off-by: Wei Hu (Xavier) 
> Signed-off-by: Chengwen Feng 
> ---
>  lib/librte_ethdev/rte_ethdev.c | 56
> ++
>  lib/librte_ethdev/rte_ethdev.h |  3 ++-
>  2 files changed, 58 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_ethdev/rte_ethdev.c
> b/lib/librte_ethdev/rte_ethdev.c
> index 892c246..31a8eb3 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -897,6 +897,13 @@ rte_eth_dev_rx_queue_start(uint16_t port_id, uint16_t
> rx_queue_id)
> return -EINVAL;
> }
>
> +   if (dev->data->rx_queues[rx_queue_id] == NULL) {
> +   RTE_ETHDEV_LOG(ERR, "Rx queue %"PRIu16" of device with
> port_id=%"
> +   PRIu16" has not been set up\n",
> +   rx_queue_id, port_id);
> +   return -EINVAL;
> +   }
> +
>

Hi Xavier,

How about having two common functions that validate the Rx/Tx queue id and
whether the queue has been set up, like below? This helps avoid a lot of
duplicate code:

static inline int
rte_eth_dev_validate_rx_queue(uint16_t port_id, uint16_t rx_queue_id)
{
struct rte_eth_dev *dev;

RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);

dev = &rte_eth_devices[port_id];

if (rx_queue_id >= dev->data->nb_rx_queues) {
RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
rx_queue_id);
return -EINVAL;
}

	if (dev->data->rx_queues[rx_queue_id] == NULL) {
		RTE_ETHDEV_LOG(ERR,
			"Queue %u of device with port_id=%u has not been set up\n",
			rx_queue_id, port_id);
		return -EINVAL;
	}

	return 0;
}
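The Tx-side counterpart of this helper would follow the same shape: bounds-check the queue id first, then reject queues that were never set up. A self-contained sketch, using a stand-in struct rather than the real rte_eth_dev_data:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the relevant rte_eth_dev_data fields; the real
 * helper would operate on &rte_eth_devices[port_id]->data. */
struct eth_dev_data {
	void **tx_queues;
	uint16_t nb_tx_queues;
};

/* Tx-side counterpart of the suggested Rx helper: reject out-of-range
 * queue ids first, then queues that were never set up. */
static int
validate_tx_queue(const struct eth_dev_data *data, uint16_t tx_queue_id)
{
	if (tx_queue_id >= data->nb_tx_queues)
		return -EINVAL;	/* invalid queue id */

	if (data->tx_queues[tx_queue_id] == NULL)
		return -EINVAL;	/* queue has not been set up */

	return 0;
}
```

Both callers (queue start/stop, intr enable/disable) could then reduce to a single call to this helper.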

Regards,
Kalesh

> --
> 2.9.5
>
>

-- 
Regards,
Kalesh A P


Re: [dpdk-dev] [dpdk-dev v11 2/4] cryptodev: add raw crypto data-path APIs

2020-10-10 Thread Zhang, Roy Fan
Hi Akhil,

Will update in v11.

Regards,
Fan


From: Akhil Goyal 
Sent: Saturday, October 10, 2020 8:38:39 PM
To: Zhang, Roy Fan ; dev@dpdk.org 
Cc: Bronowski, PiotrX ; Dybkowski, AdamX 
; Ananyev, Konstantin 
Subject: RE: [dpdk-dev v11 2/4] cryptodev: add raw crypto data-path APIs

Hi Fan,
>
> This patch adds raw data-path APIs for enqueue and dequeue
> operations to cryptodev. The APIs support flexible user-defined
> enqueue and dequeue behaviors.
>
> Signed-off-by: Fan Zhang 
> Signed-off-by: Piotr Bronowski 
> Acked-by: Adam Dybkowski 
> ---
>  doc/guides/cryptodevs/features/default.ini|   1 +
>  doc/guides/cryptodevs/features/qat.ini|   1 +
>  doc/guides/prog_guide/cryptodev_lib.rst   |  97 +
>  doc/guides/rel_notes/release_20_11.rst|   7 +
>  lib/librte_cryptodev/rte_cryptodev.c  |  80 
>  lib/librte_cryptodev/rte_cryptodev.h  | 367 +-
>  lib/librte_cryptodev/rte_cryptodev_pmd.h  |  51 ++-
>  .../rte_cryptodev_version.map |  10 +
>  8 files changed, 611 insertions(+), 3 deletions(-)
>

The release notes should be updated just above the aesni_mb crypto PMD updates:

+* **Added raw data-path APIs for cryptodev library.**
+
+  Cryptodev now has raw data-path APIs to accelerate external
+  libraries or applications that need fast cryptodev
+  enqueue/dequeue operations but do not necessarily depend on
+  mbufs and cryptodev operation mempools.
+

I have the following diff which should be incorporated in this patch.
The qat.ini file should be updated in the 3/4 patch.
The release notes update is also missing for QAT.

diff --git a/doc/guides/cryptodevs/features/qat.ini 
b/doc/guides/cryptodevs/features/qat.ini
index 9e82f2886..6cc09cde7 100644
--- a/doc/guides/cryptodevs/features/qat.ini
+++ b/doc/guides/cryptodevs/features/qat.ini
@@ -17,6 +17,7 @@ Digest encrypted   = Y
 Asymmetric sessionless = Y
 RSA PRIV OP KEY EXP= Y
 RSA PRIV OP KEY QT = Y
-Sym raw data path API  = Y

 ;
 ; Supported crypto algorithms of the 'qat' crypto driver.
diff --git a/doc/guides/prog_guide/cryptodev_lib.rst 
b/doc/guides/prog_guide/cryptodev_lib.rst
index 8ba800122..7fb3022bd 100644
--- a/doc/guides/prog_guide/cryptodev_lib.rst
+++ b/doc/guides/prog_guide/cryptodev_lib.rst
@@ -696,9 +696,9 @@ the status buffer provided by the user):
   are stored. The crypto device will then start enqueuing all of them at
   once.

-Calling ``rte_cryptodev_configure_raw_dp_context`` with the parameter
+Calling ``rte_cryptodev_configure_raw_dp_ctx`` with the parameter
 ``is_update`` set as 0 twice without the enqueue function returning status 1 or
-``rte_cryptodev_dp_enqueue_done`` function call in between will invalidate any
+``rte_cryptodev_raw_enqueue_done`` function call in between will invalidate any
 descriptors stored in the device queue but not enqueued. This feature is useful
 when the user wants to abandon partially enqueued data for a failed enqueue
 burst operation and try enqueuing in a whole later.
diff --git a/lib/librte_cryptodev/rte_cryptodev.c 
b/lib/librte_cryptodev/rte_cryptodev.c
index 7a143c4b9..3d95ac6ea 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -1833,13 +1833,6 @@ rte_cryptodev_raw_enqueue_done(struct 
rte_crypto_raw_dp_ctx *ctx,
return (*ctx->enqueue_done)(ctx->qp_data, ctx->drv_ctx_data, n);
 }

-int
-rte_cryptodev_raw_dequeue_done(struct rte_crypto_raw_dp_ctx *ctx,
-   uint32_t n)
-{
-   return (*ctx->dequeue_done)(ctx->qp_data, ctx->drv_ctx_data, n);
-}
-
 uint32_t
 rte_cryptodev_raw_dequeue_burst(struct rte_crypto_raw_dp_ctx *ctx,
rte_cryptodev_raw_get_dequeue_count_t get_dequeue_count,
@@ -1852,6 +1845,13 @@ rte_cryptodev_raw_dequeue_burst(struct 
rte_crypto_raw_dp_ctx *ctx,
is_user_data_array, n_success_jobs, status);
 }

+int
+rte_cryptodev_raw_dequeue_done(struct rte_crypto_raw_dp_ctx *ctx,
+   uint32_t n)
+{
+   return (*ctx->dequeue_done)(ctx->qp_data, ctx->drv_ctx_data, n);
+}
+
 /** Initialise rte_crypto_op mempool element */
 static void
 rte_crypto_op_init(struct rte_mempool *mempool,
diff --git a/lib/librte_cryptodev/rte_cryptodev.h 
b/lib/librte_cryptodev/rte_cryptodev.h
index 840a1c54c..79cfa46c8 100644
--- a/lib/librte_cryptodev/rte_cryptodev.h
+++ b/lib/librte_cryptodev/rte_cryptodev.h
@@ -459,7 +459,7 @@ rte_cryptodev_asym_get_xform_enum(enum 
rte_crypto_asym_xform_type *xform_enum,
 #define RTE_CRYPTODEV_FF_NON_BYTE_ALIGNED_DATA (1ULL << 23)
 /**< Support operations on data which is not byte aligned */
 #define RTE_CRYPTODEV_FF_SYM_RAW_DP(1ULL << 24)
-/**< Support accelerated specific raw data-path APIs */
+/**< Support accelerator specific symmetric raw data-path APIs */

 /**
  * Get the name of a crypto device feature flag
@@ -1344,7 +1344,7 @@ union rte_cryptodev_session_ctx {
 };

 /**
- * Enqueue a data vector into device queue

Re: [dpdk-dev] [dpdk-dev v11 4/4] test/crypto: add unit-test for cryptodev raw API test

2020-10-10 Thread Zhang, Roy Fan
Hi Akhil,

For your “always return 1” question:

The way the dequeue_burst API works, we may not know how many ops to dequeue
without parsing the first user data (e.g. a structure containing an n_ops field).
It is up to the user to provide a callback function that returns the number of
ops, either by parsing that data structure or by returning a constant. In our
unit test there is always exactly 1 op to process, so the callback returns 1.
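The two callback styles described above can be sketched as follows; the user-data layout is hypothetical, invented here for illustration, not part of the cryptodev API:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the first user data element: it carries the
 * number of ops in the burst. Illustrative only. */
struct burst_user_data {
	uint32_t n_ops;
};

/* Callback that parses the burst size out of the first user data. */
static uint32_t
get_dequeue_count_parsed(void *user_data)
{
	return ((struct burst_user_data *)user_data)->n_ops;
}

/* Callback returning a constant, as the unit test does (always 1 op). */
static uint32_t
get_dequeue_count_const(void *user_data)
{
	(void)user_data;
	return 1;
}
```

Either function could be passed as the get_dequeue_count callback to the dequeue burst call; the driver invokes it once on the first user data to learn how many ops to dequeue.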

For the 2nd and 3rd questions: enqueue_burst and dequeue_burst have to return
two values: the number of ops enqueued (or stored but not enqueued), and the
operation status (enqueued / stored but not enqueued / error). The changed API
returns the number of ops enqueued/dequeued, which is why I made the check here.
The operation status (0/1/error code) is stored in the "status" field by the
driver. This is explained in the header file comments

“

+ * @return
+ *   - The number of descriptors successfully enqueued.
+ *   - Possible enqueue status written by the driver:
+ * - 1: The descriptors are enqueued successfully.
+ * - 0: The descriptors are stored into device queue but are not processed
+ *  until rte_cryptodev_raw_enqueue_done() is called.
+ * - negative integer: failure.

If you think it is not clear, do you have any suggestions?
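The status convention quoted above could be folded into a small helper on the caller side; a sketch, with names and the enum entirely ours rather than DPDK's:

```c
/* Interpret the raw enqueue result: "n" is the burst return value and
 * "status" is written by the driver into the status field, following
 * the convention from the header comment. Illustrative names only. */
enum enq_state { ENQ_ERROR = -1, ENQ_STORED = 0, ENQ_DONE = 1 };

static enum enq_state
classify_enqueue(int n, int status)
{
	if (n < 0 || status < 0)
		return ENQ_ERROR;	/* driver reported failure */
	if (status == 1)
		return ENQ_DONE;	/* descriptors already enqueued */
	/* status == 0: stored in the device queue; the caller must call
	 * rte_cryptodev_raw_enqueue_done() to kick off processing. */
	return ENQ_STORED;
}
```

This keeps the "count" and "status" outputs distinct at the call site, which is the point of the two-value design.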

Regards,
Fan



From: Akhil Goyal 
Sent: Saturday, October 10, 2020 8:55:07 PM
To: Zhang, Roy Fan ; dev@dpdk.org 
Cc: Dybkowski, AdamX 
Subject: RE: [dpdk-dev v11 4/4] test/crypto: add unit-test for cryptodev raw API test

Hi Fan,

> +static uint32_t
> +get_raw_dp_dequeue_count(void *user_data __rte_unused)
> +{
> + return 1;
Why is this 1 always? There could be jobs >1 which are processed.

> +}
> +
> +static void
> +post_process_raw_dp_op(void *user_data,  uint32_t index __rte_unused,
> + uint8_t is_op_success)
> +{
> + struct rte_crypto_op *op = user_data;
> + op->status = is_op_success ? RTE_CRYPTO_OP_STATUS_SUCCESS :
> + RTE_CRYPTO_OP_STATUS_ERROR;
> +}
> +
> +void
> +process_sym_raw_dp_op(uint8_t dev_id, uint16_t qp_id,
> + struct rte_crypto_op *op, uint8_t is_cipher, uint8_t is_auth,
> + uint8_t len_in_bits, uint8_t cipher_iv_len)
> +{
> + struct rte_crypto_sym_op *sop = op->sym;
> + struct rte_crypto_op *ret_op = NULL;
> + struct rte_crypto_vec data_vec[UINT8_MAX];
> + struct rte_crypto_va_iova_ptr cipher_iv, digest, aad_auth_iv;
> + union rte_crypto_sym_ofs ofs;
> + struct rte_crypto_sym_vec vec;
> + struct rte_crypto_sgl sgl;
> + uint32_t max_len;
> + union rte_cryptodev_session_ctx sess;
> + uint32_t count = 0;
> + struct rte_crypto_raw_dp_ctx *ctx;
> + uint32_t cipher_offset = 0, cipher_len = 0, auth_offset = 0,
> + auth_len = 0;
> + int32_t n;
> + uint32_t n_success;
> + int ctx_service_size;
> + int32_t status = 0;
> +
> + ctx_service_size = rte_cryptodev_get_raw_dp_ctx_size(dev_id);
> + if (ctx_service_size < 0) {
> + op->status = RTE_CRYPTO_OP_STATUS_ERROR;
> + return;
> + }
> +
> + ctx = malloc(ctx_service_size);
> + if (!ctx) {
> + op->status = RTE_CRYPTO_OP_STATUS_ERROR;
> + return;
> + }
> +
> + /* Both are enums, setting crypto_sess will suit any session type */
> + sess.crypto_sess = op->sym->session;
> +
> + if (rte_cryptodev_configure_raw_dp_ctx(dev_id, qp_id, ctx,
> + op->sess_type, sess, 0) < 0) {
> + op->status = RTE_CRYPTO_OP_STATUS_ERROR;
> + goto exit;
> + }
> +
> + cipher_iv.iova = 0;
> + cipher_iv.va = NULL;
> + aad_auth_iv.iova = 0;
> + aad_auth_iv.va = NULL;
> + digest.iova = 0;
> + digest.va = NULL;
> + sgl.vec = data_vec;
> + vec.num = 1;
> + vec.sgl = &sgl;
> + vec.iv = &cipher_iv;
> + vec.digest = &digest;
> + vec.aad = &aad_auth_iv;
> + vec.status = &status;
> +
> + ofs.raw = 0;
> +
> + if (is_cipher && is_auth) {
> + cipher_offset = sop->cipher.data.offset;
> + cipher_len = sop->cipher.data.length;
> + auth_offset = sop->auth.data.offset;
> + auth_len = sop->auth.data.length;
> + max_len = RTE_MAX(cipher_offset + cipher_len,
> + auth_offset + auth_len);
> + if (len_in_bits) {
> + max_len = max_len >> 3;
> + cipher_offset = cipher_offset >> 3;
> + auth_offset = auth_offset >> 3;
> + cipher_len = cipher_len >> 3;
> + auth_len = auth_len >> 3;
> + }
> + ofs.ofs.cipher.head = cipher_offset;
> + ofs.ofs.cipher.tail = max_len - cipher_offset - cipher_len;
> + ofs.ofs.auth.head = auth_offset;
> + ofs.ofs.auth.tail = max_len - auth_offset - auth_len;
> + cipher_iv.va = rte_crypto_op_ctod_offset(op

Re: [dpdk-dev] [dpdk-dev v11 4/4] test/crypto: add unit-test for cryptodev raw API test

2020-10-10 Thread Akhil Goyal
Hi Fan,

OK, got your point. Please review the changes to the API documentation that I
suggested in the 2/4 patch in light of the explanation below.

Regards,
Akhil

From: Zhang, Roy Fan 
Sent: Sunday, October 11, 2020 2:21 AM
To: Akhil Goyal ; dev@dpdk.org
Cc: Dybkowski, AdamX 
Subject: Re: [dpdk-dev v11 4/4] test/crypto: add unit-test for cryptodev raw 
API test

Hi Akhil,

For your “always return 1” question:

The way the dequeue_burst API works, we may not know how many ops to dequeue
without parsing the first user data (e.g. a structure containing an n_ops field).
It is up to the user to provide a callback function that returns the number of
ops, either by parsing that data structure or by returning a constant. In our
unit test there is always exactly 1 op to process, so the callback returns 1.

For the 2nd and 3rd questions: enqueue_burst and dequeue_burst have to return
two values: the number of ops enqueued (or stored but not enqueued), and the
operation status (enqueued / stored but not enqueued / error). The changed API
returns the number of ops enqueued/dequeued, which is why I made the check here.
The operation status (0/1/error code) is stored in the "status" field by the
driver. This is explained in the header file comments

“

+ * @return
+ *   - The number of descriptors successfully enqueued.
+ *   - Possible enqueue status written by the driver:
+ * - 1: The descriptors are enqueued successfully.
+ * - 0: The descriptors are stored into device queue but are not processed
+ *  until rte_cryptodev_raw_enqueue_done() is called.
+ * - negative integer: failure.

If you think it is not clear, do you have any suggestions?

Regards,
Fan



From: Akhil Goyal 
Sent: Saturday, October 10, 2020 8:55:07 PM
To: Zhang, Roy Fan ; dev@dpdk.org 
Cc: Dybkowski, AdamX 
Subject: RE: [dpdk-dev v11 4/4] test/crypto: add unit-test for cryptodev raw API test

Hi Fan,

> +static uint32_t
> +get_raw_dp_dequeue_count(void *user_data __rte_unused)
> +{
> + return 1;
Why is this 1 always? There could be jobs >1 which are processed.

> +}
> +
> +static void
> +post_process_raw_dp_op(void *user_data,  uint32_t index __rte_unused,
> + uint8_t is_op_success)
> +{
> + struct rte_crypto_op *op = user_data;
> + op->status = is_op_success ? RTE_CRYPTO_OP_STATUS_SUCCESS :
> + RTE_CRYPTO_OP_STATUS_ERROR;
> +}
> +
> +void
> +process_sym_raw_dp_op(uint8_t dev_id, uint16_t qp_id,
> + struct rte_crypto_op *op, uint8_t is_cipher, uint8_t is_auth,
> + uint8_t len_in_bits, uint8_t cipher_iv_len)
> +{
> + struct rte_crypto_sym_op *sop = op->sym;
> + struct rte_crypto_op *ret_op = NULL;
> + struct rte_crypto_vec data_vec[UINT8_MAX];
> + struct rte_crypto_va_iova_ptr cipher_iv, digest, aad_auth_iv;
> + union rte_crypto_sym_ofs ofs;
> + struct rte_crypto_sym_vec vec;
> + struct rte_crypto_sgl sgl;
> + uint32_t max_len;
> + union rte_cryptodev_session_ctx sess;
> + uint32_t count = 0;
> + struct rte_crypto_raw_dp_ctx *ctx;
> + uint32_t cipher_offset = 0, cipher_len = 0, auth_offset = 0,
> + auth_len = 0;
> + int32_t n;
> + uint32_t n_success;
> + int ctx_service_size;
> + int32_t status = 0;
> +
> + ctx_service_size = rte_cryptodev_get_raw_dp_ctx_size(dev_id);
> + if (ctx_service_size < 0) {
> + op->status = RTE_CRYPTO_OP_STATUS_ERROR;
> + return;
> + }
> +
> + ctx = malloc(ctx_service_size);
> + if (!ctx) {
> + op->status = RTE_CRYPTO_OP_STATUS_ERROR;
> + return;
> + }
> +
> + /* Both are enums, setting crypto_sess will suit any session type */
> + sess.crypto_sess = op->sym->session;
> +
> + if (rte_cryptodev_configure_raw_dp_ctx(dev_id, qp_id, ctx,
> + op->sess_type, sess, 0) < 0) {
> + op->status = RTE_CRYPTO_OP_STATUS_ERROR;
> + goto exit;
> + }
> +
> + cipher_iv.iova = 0;
> + cipher_iv.va = NULL;
> + aad_auth_iv.iova = 0;
> + aad_auth_iv.va = NULL;
> + digest.iova = 0;
> + digest.va = NULL;
> + sgl.vec = data_vec;
> + vec.num = 1;
> + vec.sgl = &sgl;
> + vec.iv = &cipher_iv;
> + vec.digest = &digest;
> + vec.aad = &aad_auth_iv;
> + vec.status = &status;
> +
> + ofs.raw = 0;
> +
> + if (is_cipher && is_auth) {
> + cipher_offset = sop->cipher.data.offset;
> + cipher_len = sop->cipher.data.length;
> + auth_offset = sop->auth.data.offset;
> + auth_len = sop->auth.data.length;
> + max_len = RTE_MAX(cipher_offset + cipher_len,
> + auth_offset + auth_len);
> + if (len_in_bits) {
> + max_len = max_len >> 3;
> + cipher_offse

Re: [dpdk-dev] [PATCH] security: update session create API

2020-10-10 Thread Akhil Goyal
Hi David,
> Hi Akhil
> 
> > -Original Message-
> > From: akhil.go...@nxp.com 
> > Sent: Thursday, September 3, 2020 9:10 PM
> 
> 
> 
> > diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c index
> > 70bf6fe2c..6d7da1408 100644
> > --- a/app/test/test_cryptodev.c
> > +++ b/app/test/test_cryptodev.c
> > @@ -7219,7 +7219,8 @@ test_pdcp_proto(int i, int oop,
> >
> > /* Create security session */
> > ut_params->sec_session = rte_security_session_create(ctx,
> > -   &sess_conf, ts_params-
> > >session_priv_mpool);
> > +   &sess_conf, ts_params->session_mpool,
> > +   ts_params->session_priv_mpool);
> 
> [DC] ts_params->session_mpool is a cryptodev sym session pool. The
> assumption then in these security tests is that
> security sessions are smaller than cryptodev sym sessions. This is currently 
> true,
> but may not always be.
> 
> There should possibly be a new mempool created for security sessions.
> Or at least an assert somewhere to check a security session is smaller than a
> cryptodev sym session, so that this doesn't
> catch someone out in the future if security session grows in size.
> 
> The same comment applies to the crypto-perf-test and test_ipsec too

Fixed for test and crypto-perf. Test_ipsec is not exactly using a security 
session.
Fixing that is out of scope of this patch.

> 
> 
> 
> > diff --git a/app/test/test_security.c b/app/test/test_security.c index
> > 77fd5adc6..ed7de348f 100644
> > --- a/app/test/test_security.c
> > +++ b/app/test/test_security.c
> > @@ -237,6 +237,7 @@ static struct mock_session_create_data {
> > struct rte_security_session_conf *conf;
> > struct rte_security_session *sess;
> > struct rte_mempool *mp;
> > +   struct rte_mempool *priv_mp;
> >
> 
> 
> 
> > 790,7 +809,7 @@ test_session_create_inv_mempool(void)
> > struct rte_security_session *sess;
> >
> > sess = rte_security_session_create(&ut_params->ctx, &ut_params-
> > >conf,
> > -   NULL);
> > +   NULL, NULL);
> 
> [DC] This test test_session_create_inv_mempool() should have the priv_mp set
> to a valid
> value (i.e. ts_params->session_priv_mpool), and a new test function should be
> added where
> mp is valid, but priv_mp is NULL - this way we test for validity of both 
> mempools
> independently.

I would say that would be overkill with little gain.
Both mempools should be created before the session is created. That is quite
obvious, isn't it?

> 
> 
> 
> > a/doc/guides/prog_guide/rte_security.rst
> > b/doc/guides/prog_guide/rte_security.rst
> > index 127da2e4f..cff0653f5 100644
> > --- a/doc/guides/prog_guide/rte_security.rst
> > +++ b/doc/guides/prog_guide/rte_security.rst
> > @@ -533,8 +533,10 @@ and this allows further acceleration of the offload of
> > Crypto workloads.
> >
> >  The Security framework provides APIs to create and free sessions for
> > crypto/ethernet  devices, where sessions are mempool objects. It is the
> > application's responsibility -to create and manage the session mempools. The
> > mempool object size should be able to -accommodate the driver's private
> > data of security session.
> > +to create and manage two session mempools - one for session and other
> > +for session private data. The mempool object size should be able to
> > +accommodate the driver's private data of security session. The
> > +application can get the size of session private data using API
> > ``rte_security_session_get_size``.
> 
> [DC] This sentence should be updated to specify it's the private session data
> mempool that is being referred to
> 
> "The mempool object size should be able to accommodate the driver's private
> data of security session."
> =>
> "The private session data mempool object size should be able to accommodate
> the driver's private data of security
> session."
> 
> Also, a sentence about the required size of the session mempool should also be
> added.

Fixed in v2

> 
> 
> 
> > diff --git a/doc/guides/rel_notes/release_20_11.rst
> > b/doc/guides/rel_notes/release_20_11.rst
> > index df227a177..04c1a1b81 100644
> > --- a/doc/guides/rel_notes/release_20_11.rst
> > +++ b/doc/guides/rel_notes/release_20_11.rst
> > @@ -84,6 +84,12 @@ API Changes
> > Also, make sure to start the actual text at the margin.
> > ===
> >
> > +* security: The API ``rte_security_session_create`` is updated to take
> > +two
> > +  mempool objects one for session and other for session private data.
> > +  So the application need to create two mempools and get the size of
> > +session
> > +  private data using API ``rte_security_session_get_size`` for private
> > +session
> > +  mempool.
> > +
> 
> [DC]  Many of the PMDs which support security don't implement the
> session_get_size
> callback. There's probably a job here for each PMD owner to add support for 
> this
> callback.
> 
If a PMD is supporting rte_se

Re: [dpdk-dev] [PATCH] security: update session create API

2020-10-10 Thread Akhil Goyal
Hi Lukasz,

Thanks for the review.

> > diff --git a/app/test/test_security.c b/app/test/test_security.c
> > index 77fd5adc6..ed7de348f 100644
> > --- a/app/test/test_security.c
> > +++ b/app/test/test_security.c
> > @@ -237,6 +237,7 @@ static struct mock_session_create_data {
> > struct rte_security_session_conf *conf;
> > struct rte_security_session *sess;
> > struct rte_mempool *mp;
> > +   struct rte_mempool *priv_mp;
> >
> > int ret;
> >
> The session_create op is now called with the private mempool, so you also
> need to update the assert in mock session_create:

OK will be fixed in v2

> 
> @@ -248,13 +249,13 @@ static int
>   mock_session_create(void *device,
>      struct rte_security_session_conf *conf,
>      struct rte_security_session *sess,
> -   struct rte_mempool *mp)
> +   struct rte_mempool *priv_mp)
>   {
>      mock_session_create_exp.called++;
> 
>      MOCK_TEST_ASSERT_POINTER_PARAMETER(mock_session_create_exp,
> device);
>      MOCK_TEST_ASSERT_POINTER_PARAMETER(mock_session_create_exp,
> conf);
> -   MOCK_TEST_ASSERT_POINTER_PARAMETER(mock_session_create_exp,
> mp);
> +   MOCK_TEST_ASSERT_POINTER_PARAMETER(mock_session_create_exp,
> priv_mp);
> 
>      mock_session_create_exp.sess = sess;
> 
> 
> 
> > @@ -502,6 +503,7 @@ struct rte_security_ops mock_ops = {
> >*/
> >   static struct security_testsuite_params {
> > struct rte_mempool *session_mpool;
> > +   struct rte_mempool *session_priv_mpool;
> >   } testsuite_params = { NULL };
> >
> >   /**
> > @@ -525,6 +527,7 @@ static struct security_unittest_params {
> >   };
> >
> >   #define SECURITY_TEST_MEMPOOL_NAME "SecurityTestsMempoolName"
> > +#define SECURITY_TEST_PRIV_MEMPOOL_NAME
> "SecurityTestsPrivMempoolName"
> Please make the mempool name shorter, otherwise it causes tests to fail:
> 
> EAL: Test assert testsuite_setup line 558 failed: Cannot create priv
> mempool File name too long
> 

Fixed in v2

> >   #define SECURITY_TEST_MEMPOOL_SIZE 15
> >   #define SECURITY_TEST_SESSION_OBJECT_SIZE sizeof(struct
> rte_security_session)
> >
> > @@ -545,6 +548,17 @@ testsuite_setup(void)
> > SOCKET_ID_ANY, 0);
> > TEST_ASSERT_NOT_NULL(ts_params->session_mpool,
> > "Cannot create mempool %s\n",
> rte_strerror(rte_errno));
> > +
> > +   ts_params->session_priv_mpool = rte_mempool_create(
> > +   SECURITY_TEST_PRIV_MEMPOOL_NAME,
> > +   SECURITY_TEST_MEMPOOL_SIZE,
> > +   rte_security_session_get_size(&unittest_params.ctx),
> > +   0, 0, NULL, NULL, NULL, NULL,
> > +   SOCKET_ID_ANY, 0);
> > +   TEST_ASSERT_NOT_NULL(ts_params->session_priv_mpool,
> > +   "Cannot create priv mempool %s\n",
> > +   rte_strerror(rte_errno));
> > +
> If creation of the private data mempool fails, the primary mempool needs
> to be freed before the function returns a failure code.
This is an issue throughout the file.
However, I have fixed it for this particular case in v2.



[dpdk-dev] [PATCH v2] security: update session create API

2020-10-10 Thread Akhil Goyal
The API ``rte_security_session_create`` takes only a single
mempool for both the session and the session private data. The
application therefore needs to create a mempool with twice the
number of sessions needed, which also wastes memory because
session private data needs more memory than the session itself.
Hence the API is modified to take two mempool pointers
- one for the session and one for the private data.
This is very similar to the crypto based session create APIs.

Signed-off-by: Akhil Goyal 
---

Changes in V2:
Incorporated comments from Lukasz and David.

 app/test-crypto-perf/cperf_ops.c   |  4 +-
 app/test-crypto-perf/main.c| 12 +++--
 app/test/test_cryptodev.c  | 18 ++--
 app/test/test_ipsec.c  |  3 +-
 app/test/test_security.c   | 61 --
 doc/guides/prog_guide/rte_security.rst |  8 +++-
 doc/guides/rel_notes/deprecation.rst   |  7 ---
 doc/guides/rel_notes/release_20_11.rst |  6 +++
 examples/ipsec-secgw/ipsec-secgw.c | 12 +
 examples/ipsec-secgw/ipsec.c   |  9 ++--
 lib/librte_security/rte_security.c |  7 ++-
 lib/librte_security/rte_security.h |  4 +-
 12 files changed, 102 insertions(+), 49 deletions(-)

diff --git a/app/test-crypto-perf/cperf_ops.c b/app/test-crypto-perf/cperf_ops.c
index 3da835a9c..3a64a2c34 100644
--- a/app/test-crypto-perf/cperf_ops.c
+++ b/app/test-crypto-perf/cperf_ops.c
@@ -621,7 +621,7 @@ cperf_create_session(struct rte_mempool *sess_mp,
 
/* Create security session */
return (void *)rte_security_session_create(ctx,
-   &sess_conf, sess_mp);
+   &sess_conf, sess_mp, priv_mp);
}
if (options->op_type == CPERF_DOCSIS) {
enum rte_security_docsis_direction direction;
@@ -664,7 +664,7 @@ cperf_create_session(struct rte_mempool *sess_mp,
 
/* Create security session */
return (void *)rte_security_session_create(ctx,
-   &sess_conf, priv_mp);
+   &sess_conf, sess_mp, priv_mp);
}
 #endif
sess = rte_cryptodev_sym_session_create(sess_mp);
diff --git a/app/test-crypto-perf/main.c b/app/test-crypto-perf/main.c
index 62ae6048b..53864ffdd 100644
--- a/app/test-crypto-perf/main.c
+++ b/app/test-crypto-perf/main.c
@@ -156,7 +156,14 @@ cperf_initialize_cryptodev(struct cperf_options *opts, 
uint8_t *enabled_cdevs)
if (sess_size > max_sess_size)
max_sess_size = sess_size;
}
-
+#ifdef RTE_LIBRTE_SECURITY
+   for (cdev_id = 0; cdev_id < rte_cryptodev_count(); cdev_id++) {
+   sess_size = rte_security_session_get_size(
+   rte_cryptodev_get_sec_ctx(cdev_id));
+   if (sess_size > max_sess_size)
+   max_sess_size = sess_size;
+   }
+#endif
/*
 * Calculate number of needed queue pairs, based on the amount
 * of available number of logical cores and crypto devices.
@@ -247,8 +254,7 @@ cperf_initialize_cryptodev(struct cperf_options *opts, 
uint8_t *enabled_cdevs)
opts->nb_qps * nb_slaves;
 #endif
} else
-   sessions_needed = enabled_cdev_count *
-   opts->nb_qps * 2;
+   sessions_needed = enabled_cdev_count * opts->nb_qps;
 
/*
 * A single session is required per queue pair
diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index ac2a36bc2..4bd9d8aff 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -553,9 +553,15 @@ testsuite_setup(void)
unsigned int session_size =
rte_cryptodev_sym_get_private_session_size(dev_id);
 
+#ifdef RTE_LIBRTE_SECURITY
+   unsigned int security_session_size = rte_security_session_get_size(
+   rte_cryptodev_get_sec_ctx(dev_id));
+
+   if (session_size < security_session_size)
+   session_size = security_session_size;
+#endif
/*
-* Create mempool with maximum number of sessions * 2,
-* to include the session headers
+* Create mempool with maximum number of sessions.
 */
if (info.sym.max_nb_sessions != 0 &&
info.sym.max_nb_sessions < MAX_NB_SESSIONS) {
@@ -7219,7 +7225,8 @@ test_pdcp_proto(int i, int oop,
 
/* Create security session */
ut_params->sec_session = rte_security_session_create(ctx,
-   &sess_conf, ts_params->session_priv_mpool);
+   &sess_conf, ts_params->session_mpool,
+   ts_params->session_priv_mpool);
 
if (!ut_params->sec_session) {
printf("TestCase %s()-%d line %d failed %s: ",
@@ -7479,7 +7486,8 @@ test_pdcp_p

[dpdk-dev] [dpdk-dev v12 0/4] cryptodev: add raw data-path APIs

2020-10-10 Thread Fan Zhang
The Crypto Raw data-path APIs are a set of APIs designed to enable external
libraries/applications to leverage the cryptographic processing provided by
DPDK crypto PMDs through the cryptodev API but in a manner that is not
dependent on native DPDK data structures (e.g. rte_mbuf, rte_crypto_op, etc.)
in their data-path implementation.

The raw data-path APIs have the following advantages:
- External data structure friendly design. The new APIs use the operation
  descriptor ``struct rte_crypto_sym_vec``, which supports raw data pointers
  and IOVA addresses as input. Moreover, the APIs do not require the user to
  allocate the descriptor from a mempool, nor do they require mbufs to
  describe the input data's virtual and IOVA addresses. All these features
  make the translation from the user's own data structure into the descriptor
  easier and more efficient.
- Flexible enqueue and dequeue operation. The raw data-path APIs give the
  user more control over the enqueue and dequeue operations, including the
  capability of precise enqueue/dequeue counts, abandoning enqueue or dequeue
  at any time, and translating and setting the operation status on the fly.

v12:
- Fixed and updated documentation.
- Fixed typo.

v11:
- Rebased on top of latest master.
- API changed followed by the discussion results.
- Fixed a few grammar errors, thanks to Akhil.
- Reverted attach session API changes.
- Fixed QAT driver bugs.

v10:
- Changed rte_crypto_sym_vec structure to support both sync cpu_crypto and
  async raw data-path API.
- Changed documentation.
- Changed API names.
- Changed the way data-path context is initialized.
- Added new API to attach session or xform to existing context.
- Changed QAT PMD accordingly with new APIs.
- Changed unit test to use the device feature flag for the raw API tests.

v9:
- Changed return types of submit_done() and dequeue_done() APIs.
- Added release note update. 

v8:
- Updated following review comments.
- Fixed a few bugs.
- Fixed ARM build error.
- Updated the unit test covering all tests.

v7:
- Fixed a few typos.
- Fixed length calculation bugs.

v6:
- Rebased on top of DPDK 20.08.
- Changed to service ctx and added single job submit/dequeue.

v5:
- Changed to use rte_crypto_sym_vec as input.
- Changed to use public APIs instead of use function pointer.

v4:
- Added missed patch.

v3:
- Instead of QAT only API, moved the API to cryptodev.
- Added cryptodev feature flags.

v2:
- Used a structure to simplify parameters.
- Added unit tests.
- Added documentation.

Fan Zhang (4):
  cryptodev: change crypto symmetric vector structure
  cryptodev: add raw crypto data-path APIs
  crypto/qat: add raw crypto data-path API support
  test/crypto: add unit-test for cryptodev raw API test

 app/test/test_cryptodev.c | 812 ++-
 app/test/test_cryptodev.h |  12 +
 app/test/test_cryptodev_blockcipher.c |  58 +-
 doc/guides/cryptodevs/features/default.ini|   1 +
 doc/guides/cryptodevs/features/qat.ini|   1 +
 doc/guides/prog_guide/cryptodev_lib.rst   | 109 +-
 doc/guides/rel_notes/release_20_11.rst|  14 +
 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c  |  18 +-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c|   9 +-
 drivers/crypto/qat/meson.build|   1 +
 drivers/crypto/qat/qat_sym.h  |  11 +
 drivers/crypto/qat/qat_sym_hw_dp.c| 959 ++
 drivers/crypto/qat/qat_sym_pmd.c  |   9 +-
 lib/librte_cryptodev/rte_crypto_sym.h |  40 +-
 lib/librte_cryptodev/rte_cryptodev.c  |  80 ++
 lib/librte_cryptodev/rte_cryptodev.h  | 413 +++-
 lib/librte_cryptodev/rte_cryptodev_pmd.h  |  51 +-
 .../rte_cryptodev_version.map |  10 +
 lib/librte_ipsec/esp_inb.c|  12 +-
 lib/librte_ipsec/esp_outb.c   |  12 +-
 lib/librte_ipsec/misc.h   |   6 +-
 21 files changed, 2530 insertions(+), 108 deletions(-)
 create mode 100644 drivers/crypto/qat/qat_sym_hw_dp.c

-- 
2.20.1



[dpdk-dev] [dpdk-dev v12 1/4] cryptodev: change crypto symmetric vector structure

2020-10-10 Thread Fan Zhang
This patch updates the ``rte_crypto_sym_vec`` structure to add
support for both the cpu_crypto synchronous operation and the
asynchronous raw data-path APIs. The patch also includes
AESNI-MB and AESNI-GCM PMD changes, unit test changes and
documentation updates.

Signed-off-by: Fan Zhang 
Acked-by: Adam Dybkowski 
Acked-by: Konstantin Ananyev 
---
 app/test/test_cryptodev.c  | 25 --
 doc/guides/prog_guide/cryptodev_lib.rst|  3 +-
 doc/guides/rel_notes/release_20_11.rst |  3 ++
 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c   | 18 +-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c |  9 +++--
 lib/librte_cryptodev/rte_crypto_sym.h  | 40 --
 lib/librte_ipsec/esp_inb.c | 12 +++
 lib/librte_ipsec/esp_outb.c| 12 +++
 lib/librte_ipsec/misc.h|  6 ++--
 9 files changed, 79 insertions(+), 49 deletions(-)

diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index ac2a36bc2..62a265520 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -151,11 +151,11 @@ static void
 process_cpu_aead_op(uint8_t dev_id, struct rte_crypto_op *op)
 {
int32_t n, st;
-   void *iv;
struct rte_crypto_sym_op *sop;
union rte_crypto_sym_ofs ofs;
struct rte_crypto_sgl sgl;
struct rte_crypto_sym_vec symvec;
+   struct rte_crypto_va_iova_ptr iv_ptr, aad_ptr, digest_ptr;
struct rte_crypto_vec vec[UINT8_MAX];
 
sop = op->sym;
@@ -171,13 +171,17 @@ process_cpu_aead_op(uint8_t dev_id, struct rte_crypto_op 
*op)
sgl.vec = vec;
sgl.num = n;
symvec.sgl = &sgl;
-   iv = rte_crypto_op_ctod_offset(op, void *, IV_OFFSET);
-   symvec.iv = &iv;
-   symvec.aad = (void **)&sop->aead.aad.data;
-   symvec.digest = (void **)&sop->aead.digest.data;
+   symvec.iv = &iv_ptr;
+   symvec.digest = &digest_ptr;
+   symvec.aad = &aad_ptr;
symvec.status = &st;
symvec.num = 1;
 
+   /* for CPU crypto the IOVA address is not required */
+   iv_ptr.va = rte_crypto_op_ctod_offset(op, void *, IV_OFFSET);
+   digest_ptr.va = (void *)sop->aead.digest.data;
+   aad_ptr.va = (void *)sop->aead.aad.data;
+
ofs.raw = 0;
 
n = rte_cryptodev_sym_cpu_crypto_process(dev_id, sop->session, ofs,
@@ -193,11 +197,11 @@ static void
 process_cpu_crypt_auth_op(uint8_t dev_id, struct rte_crypto_op *op)
 {
int32_t n, st;
-   void *iv;
struct rte_crypto_sym_op *sop;
union rte_crypto_sym_ofs ofs;
struct rte_crypto_sgl sgl;
struct rte_crypto_sym_vec symvec;
+   struct rte_crypto_va_iova_ptr iv_ptr, digest_ptr;
struct rte_crypto_vec vec[UINT8_MAX];
 
sop = op->sym;
@@ -213,13 +217,14 @@ process_cpu_crypt_auth_op(uint8_t dev_id, struct 
rte_crypto_op *op)
sgl.vec = vec;
sgl.num = n;
symvec.sgl = &sgl;
-   iv = rte_crypto_op_ctod_offset(op, void *, IV_OFFSET);
-   symvec.iv = &iv;
-   symvec.aad = (void **)&sop->aead.aad.data;
-   symvec.digest = (void **)&sop->auth.digest.data;
+   symvec.iv = &iv_ptr;
+   symvec.digest = &digest_ptr;
symvec.status = &st;
symvec.num = 1;
 
+   iv_ptr.va = rte_crypto_op_ctod_offset(op, void *, IV_OFFSET);
+   digest_ptr.va = (void *)sop->auth.digest.data;
+
ofs.raw = 0;
ofs.ofs.cipher.head = sop->cipher.data.offset - sop->auth.data.offset;
ofs.ofs.cipher.tail = (sop->auth.data.offset + sop->auth.data.length) -
diff --git a/doc/guides/prog_guide/cryptodev_lib.rst 
b/doc/guides/prog_guide/cryptodev_lib.rst
index c14f750fa..e7ba35c2d 100644
--- a/doc/guides/prog_guide/cryptodev_lib.rst
+++ b/doc/guides/prog_guide/cryptodev_lib.rst
@@ -620,7 +620,8 @@ operation descriptor (``struct rte_crypto_sym_vec``) 
containing:
   descriptors of performed operations (``struct rte_crypto_sgl``). Each 
instance
   of ``struct rte_crypto_sgl`` consists of a number of segments and a pointer 
to
   an array of segment descriptors ``struct rte_crypto_vec``;
-- pointers to arrays of size ``num`` containing IV, AAD and digest information,
+- pointers to arrays of size ``num`` containing IV, AAD and digest information
+  in the ``cpu_crypto`` sub-structure,
 - pointer to an array of size ``num`` where status information will be stored
   for each operation.
 
diff --git a/doc/guides/rel_notes/release_20_11.rst 
b/doc/guides/rel_notes/release_20_11.rst
index 8b911488c..2973b2a33 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -302,6 +302,9 @@ API Changes
   ``rte_fpga_lte_fec_configure`` and structure ``fpga_lte_fec_conf`` to
   ``rte_fpga_lte_fec_conf``.
 
+* The structure ``rte_crypto_sym_vec`` is updated to support both
+  cpu_crypto synchronous operation and asynchronous raw data-path APIs.
+
 
 ABI Changes
 ---
diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c 
b/drive

[dpdk-dev] [dpdk-dev v12 3/4] crypto/qat: add raw crypto data-path API support

2020-10-10 Thread Fan Zhang
This patch updates the QAT PMD to add raw data-path API support.

Signed-off-by: Fan Zhang 
Acked-by: Adam Dybkowski 
---
 doc/guides/cryptodevs/features/qat.ini |   1 +
 doc/guides/rel_notes/release_20_11.rst |   4 +
 drivers/crypto/qat/meson.build |   1 +
 drivers/crypto/qat/qat_sym.h   |  11 +
 drivers/crypto/qat/qat_sym_hw_dp.c | 959 +
 drivers/crypto/qat/qat_sym_pmd.c   |   9 +-
 6 files changed, 983 insertions(+), 2 deletions(-)
 create mode 100644 drivers/crypto/qat/qat_sym_hw_dp.c

diff --git a/doc/guides/cryptodevs/features/qat.ini 
b/doc/guides/cryptodevs/features/qat.ini
index 9e82f2886..6cc09cde7 100644
--- a/doc/guides/cryptodevs/features/qat.ini
+++ b/doc/guides/cryptodevs/features/qat.ini
@@ -17,6 +17,7 @@ Digest encrypted   = Y
 Asymmetric sessionless = Y
 RSA PRIV OP KEY EXP= Y
 RSA PRIV OP KEY QT = Y
+Sym raw data path API  = Y
 
 ;
 ; Supported crypto algorithms of the 'qat' crypto driver.
diff --git a/doc/guides/rel_notes/release_20_11.rst 
b/doc/guides/rel_notes/release_20_11.rst
index 85a07d86e..008f4eedc 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -104,6 +104,10 @@ New Features
   * Added support for non-HMAC auth algorithms
 (MD5, SHA1, SHA224, SHA256, SHA384, SHA512).
 
+* **Update QAT crypto PMD.**
+
+  * Added Raw Data-path APIs support.
+
 * **Added Intel ACC100 bbdev PMD.**
 
   Added a new ``acc100`` bbdev driver for the Intel\ |reg| ACC100 accelerator
diff --git a/drivers/crypto/qat/meson.build b/drivers/crypto/qat/meson.build
index a225f374a..bc90ec44c 100644
--- a/drivers/crypto/qat/meson.build
+++ b/drivers/crypto/qat/meson.build
@@ -15,6 +15,7 @@ if dep.found()
qat_sources += files('qat_sym_pmd.c',
 'qat_sym.c',
 'qat_sym_session.c',
+'qat_sym_hw_dp.c',
 'qat_asym_pmd.c',
 'qat_asym.c')
qat_ext_deps += dep
diff --git a/drivers/crypto/qat/qat_sym.h b/drivers/crypto/qat/qat_sym.h
index 1a9748849..7254f5e3c 100644
--- a/drivers/crypto/qat/qat_sym.h
+++ b/drivers/crypto/qat/qat_sym.h
@@ -264,6 +264,16 @@ qat_sym_process_response(void **op, uint8_t *resp)
}
*op = (void *)rx_op;
 }
+
+int
+qat_sym_configure_dp_ctx(struct rte_cryptodev *dev, uint16_t qp_id,
+   struct rte_crypto_raw_dp_ctx *raw_dp_ctx,
+   enum rte_crypto_op_sess_type sess_type,
+   union rte_cryptodev_session_ctx session_ctx, uint8_t is_update);
+
+int
+qat_sym_get_dp_ctx_size(struct rte_cryptodev *dev);
+
 #else
 
 static inline void
@@ -276,5 +286,6 @@ static inline void
 qat_sym_process_response(void **op __rte_unused, uint8_t *resp __rte_unused)
 {
 }
+
 #endif
 #endif /* _QAT_SYM_H_ */
diff --git a/drivers/crypto/qat/qat_sym_hw_dp.c 
b/drivers/crypto/qat/qat_sym_hw_dp.c
new file mode 100644
index 0..dfbbad59b
--- /dev/null
+++ b/drivers/crypto/qat/qat_sym_hw_dp.c
@@ -0,0 +1,959 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include 
+
+#include "adf_transport_access_macros.h"
+#include "icp_qat_fw.h"
+#include "icp_qat_fw_la.h"
+
+#include "qat_sym.h"
+#include "qat_sym_pmd.h"
+#include "qat_sym_session.h"
+#include "qat_qp.h"
+
+struct qat_sym_dp_ctx {
+   struct qat_sym_session *session;
+   uint32_t tail;
+   uint32_t head;
+   uint16_t cached_enqueue;
+   uint16_t cached_dequeue;
+};
+
+static __rte_always_inline int32_t
+qat_sym_dp_parse_data_vec(struct qat_qp *qp, struct icp_qat_fw_la_bulk_req 
*req,
+   struct rte_crypto_vec *data, uint16_t n_data_vecs)
+{
+   struct qat_queue *tx_queue;
+   struct qat_sym_op_cookie *cookie;
+   struct qat_sgl *list;
+   uint32_t i;
+   uint32_t total_len;
+
+   if (likely(n_data_vecs == 1)) {
+   req->comn_mid.src_data_addr = req->comn_mid.dest_data_addr =
+   data[0].iova;
+   req->comn_mid.src_length = req->comn_mid.dst_length =
+   data[0].len;
+   return data[0].len;
+   }
+
+   if (n_data_vecs == 0 || n_data_vecs > QAT_SYM_SGL_MAX_NUMBER)
+   return -1;
+
+   total_len = 0;
+   tx_queue = &qp->tx_q;
+
+   ICP_QAT_FW_COMN_PTR_TYPE_SET(req->comn_hdr.comn_req_flags,
+   QAT_COMN_PTR_TYPE_SGL);
+   cookie = qp->op_cookies[tx_queue->tail >> tx_queue->trailz];
+   list = (struct qat_sgl *)&cookie->qat_sgl_src;
+
+   for (i = 0; i < n_data_vecs; i++) {
+   list->buffers[i].len = data[i].len;
+   list->buffers[i].resrvd = 0;
+   list->buffers[i].addr = data[i].iova;
+   if (total_len + data[i].len > UINT32_MAX) {
+   QAT_DP_LOG(ERR, "Message too long");
+   return -1;
+   }
+   total_len += data[i].len;
+   }
+
+ 

[dpdk-dev] [dpdk-dev v12 2/4] cryptodev: add raw crypto data-path APIs

2020-10-10 Thread Fan Zhang
This patch adds raw data-path APIs for enqueue and dequeue
operations to cryptodev. The APIs support flexible, user-defined
enqueue and dequeue behaviors.

Signed-off-by: Fan Zhang 
Signed-off-by: Piotr Bronowski 
Acked-by: Adam Dybkowski 
---
 doc/guides/cryptodevs/features/default.ini|   1 +
 doc/guides/prog_guide/cryptodev_lib.rst   | 106 +
 doc/guides/rel_notes/release_20_11.rst|   7 +
 lib/librte_cryptodev/rte_cryptodev.c  |  80 
 lib/librte_cryptodev/rte_cryptodev.h  | 413 +-
 lib/librte_cryptodev/rte_cryptodev_pmd.h  |  51 ++-
 .../rte_cryptodev_version.map |  10 +
 7 files changed, 665 insertions(+), 3 deletions(-)

diff --git a/doc/guides/cryptodevs/features/default.ini 
b/doc/guides/cryptodevs/features/default.ini
index 133a246ee..17b177fc4 100644
--- a/doc/guides/cryptodevs/features/default.ini
+++ b/doc/guides/cryptodevs/features/default.ini
@@ -30,6 +30,7 @@ Asymmetric sessionless =
 CPU crypto =
 Symmetric sessionless  =
 Non-Byte aligned data  =
+Sym raw data path API  =
 
 ;
 ; Supported crypto algorithms of a default crypto driver.
diff --git a/doc/guides/prog_guide/cryptodev_lib.rst 
b/doc/guides/prog_guide/cryptodev_lib.rst
index e7ba35c2d..0c018b982 100644
--- a/doc/guides/prog_guide/cryptodev_lib.rst
+++ b/doc/guides/prog_guide/cryptodev_lib.rst
@@ -632,6 +632,112 @@ a call argument. Status different than zero must be 
treated as error.
 For more details, e.g. how to convert an mbuf to an SGL, please refer to an
 example usage in the IPsec library implementation.
 
+Cryptodev Raw Data-path APIs
+
+
+The Crypto Raw data-path APIs are a set of APIs designed to enable external
+libraries/applications to leverage the cryptographic processing provided by
+DPDK crypto PMDs through the cryptodev API but in a manner that is not
+dependent on native DPDK data structures (e.g. rte_mbuf, rte_crypto_op, etc.)
+in their data-path implementation.
+
+The raw data-path APIs have the following advantages:
+- External data structure friendly design. The new APIs use the operation
+  descriptor ``struct rte_crypto_sym_vec``, which supports raw data pointers
+  and IOVA addresses as input. Moreover, the APIs do not require the user to
+  allocate the descriptor from a mempool, nor do they require mbufs to
+  describe the input data's virtual and IOVA addresses. All these features
+  make the translation from the user's own data structure into the
+  descriptor easier and more efficient.
+- Flexible enqueue and dequeue operation. The raw data-path APIs give the
+  user more control over the enqueue and dequeue operations, including the
+  capability of precise enqueue/dequeue counts, abandoning enqueue or dequeue
+  at any time, and translating and setting the operation status on the fly.
+
+Cryptodev PMDs which support the raw data-path APIs will have the
+``RTE_CRYPTODEV_FF_SYM_RAW_DP`` feature flag set. To use this feature,
+the user shall create a local ``struct rte_crypto_raw_dp_ctx`` buffer,
+extended to at least the length returned by a call to
+``rte_cryptodev_get_raw_dp_ctx_size``. The created buffer is then
+initialized using the ``rte_cryptodev_configure_raw_dp_ctx`` function with
+the ``is_update`` parameter set to 0. The library and the crypto device
+driver will then set up the buffer, attach either the cryptodev sym session,
+the rte_security session, or the cryptodev xform for session-less operation
+to the ctx buffer, and set the corresponding enqueue and dequeue function
+handlers based on the algorithm information stored in the session or xform.
+When the ``is_update`` parameter passed into
+``rte_cryptodev_configure_raw_dp_ctx`` is 1, the driver will not initialize
+the buffer but only update the session or xform and the function handlers
+accordingly.
+
+After the ``struct rte_crypto_raw_dp_ctx`` buffer is initialized, it is
+ready for enqueue and dequeue operations. There are two different enqueue
+functions: ``rte_cryptodev_raw_enqueue`` to enqueue a single raw data
+operation, and ``rte_cryptodev_raw_enqueue_burst`` to enqueue a descriptor
+with multiple operations. If the application uses an approach similar to
+``struct rte_crypto_sym_vec`` to manage its data bursts but with a different
+data structure, using the ``rte_cryptodev_raw_enqueue_burst`` function may
+be less efficient, as the application has to loop over all crypto operations
+to assemble the ``struct rte_crypto_sym_vec`` descriptor from its own data
+structure, and then the driver will loop over them again to translate every
+operation in the descriptor to the driver's specific queue data.
+``rte_cryptodev_raw_enqueue`` should be used instead to save one loop per
+data burst.
+
+The ``rte_cryptodev_raw_enqueue`` and ``rte_cryptodev_raw_enqueue_burst``
+functions will return or set the enqueue status. ``rte_cryptodev_raw_enqueue``
+will return the status directly, ``rte_cryptodev_raw_enqueue_bur

[dpdk-dev] [dpdk-dev v12 4/4] test/crypto: add unit-test for cryptodev raw API test

2020-10-10 Thread Fan Zhang
This patch adds cryptodev raw API test support to the unit tests.
In addition, a new test case for the QAT PMD is enabled for this
test type.

Signed-off-by: Fan Zhang 
Acked-by: Adam Dybkowski 
---
 app/test/test_cryptodev.c | 787 --
 app/test/test_cryptodev.h |  12 +
 app/test/test_cryptodev_blockcipher.c |  58 +-
 3 files changed, 803 insertions(+), 54 deletions(-)

diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index 62a265520..219373e10 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -49,6 +49,10 @@
 #define VDEV_ARGS_SIZE 100
 #define MAX_NB_SESSIONS 4
 
+#define MAX_DRV_SERVICE_CTX_SIZE 256
+
+#define MAX_RAW_DEQUEUE_COUNT  65535
+
 #define IN_PLACE 0
 #define OUT_OF_PLACE 1
 
@@ -57,6 +61,8 @@ static int gbl_driver_id;
 static enum rte_security_session_action_type gbl_action_type =
RTE_SECURITY_ACTION_TYPE_NONE;
 
+enum cryptodev_api_test_type global_api_test_type = CRYPTODEV_API_TEST;
+
 struct crypto_testsuite_params {
struct rte_mempool *mbuf_pool;
struct rte_mempool *large_mbuf_pool;
@@ -147,6 +153,215 @@ ceil_byte_length(uint32_t num_bits)
return (num_bits >> 3);
 }
 
+static uint32_t
+get_raw_dp_dequeue_count(void *user_data __rte_unused)
+{
+   return 1;
+}
+
+static void
+post_process_raw_dp_op(void *user_data,uint32_t index __rte_unused,
+   uint8_t is_op_success)
+{
+   struct rte_crypto_op *op = user_data;
+   op->status = is_op_success ? RTE_CRYPTO_OP_STATUS_SUCCESS :
+   RTE_CRYPTO_OP_STATUS_ERROR;
+}
+
+void
+process_sym_raw_dp_op(uint8_t dev_id, uint16_t qp_id,
+   struct rte_crypto_op *op, uint8_t is_cipher, uint8_t is_auth,
+   uint8_t len_in_bits, uint8_t cipher_iv_len)
+{
+   struct rte_crypto_sym_op *sop = op->sym;
+   struct rte_crypto_op *ret_op = NULL;
+   struct rte_crypto_vec data_vec[UINT8_MAX];
+   struct rte_crypto_va_iova_ptr cipher_iv, digest, aad_auth_iv;
+   union rte_crypto_sym_ofs ofs;
+   struct rte_crypto_sym_vec vec;
+   struct rte_crypto_sgl sgl;
+   uint32_t max_len;
+   union rte_cryptodev_session_ctx sess;
+   uint32_t count = 0;
+   struct rte_crypto_raw_dp_ctx *ctx;
+   uint32_t cipher_offset = 0, cipher_len = 0, auth_offset = 0,
+   auth_len = 0;
+   int32_t n;
+   uint32_t n_success;
+   int ctx_service_size;
+   int32_t status = 0;
+   int enqueue_status, dequeue_status;
+
+   ctx_service_size = rte_cryptodev_get_raw_dp_ctx_size(dev_id);
+   if (ctx_service_size < 0) {
+   op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+   return;
+   }
+
+   ctx = malloc(ctx_service_size);
+   if (!ctx) {
+   op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+   return;
+   }
+
+   /* Both are enums, setting crypto_sess will suit any session type */
+   sess.crypto_sess = op->sym->session;
+
+   if (rte_cryptodev_configure_raw_dp_ctx(dev_id, qp_id, ctx,
+   op->sess_type, sess, 0) < 0) {
+   op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+   goto exit;
+   }
+
+   cipher_iv.iova = 0;
+   cipher_iv.va = NULL;
+   aad_auth_iv.iova = 0;
+   aad_auth_iv.va = NULL;
+   digest.iova = 0;
+   digest.va = NULL;
+   sgl.vec = data_vec;
+   vec.num = 1;
+   vec.sgl = &sgl;
+   vec.iv = &cipher_iv;
+   vec.digest = &digest;
+   vec.aad = &aad_auth_iv;
+   vec.status = &status;
+
+   ofs.raw = 0;
+
+   if (is_cipher && is_auth) {
+   cipher_offset = sop->cipher.data.offset;
+   cipher_len = sop->cipher.data.length;
+   auth_offset = sop->auth.data.offset;
+   auth_len = sop->auth.data.length;
+   max_len = RTE_MAX(cipher_offset + cipher_len,
+   auth_offset + auth_len);
+   if (len_in_bits) {
+   max_len = max_len >> 3;
+   cipher_offset = cipher_offset >> 3;
+   auth_offset = auth_offset >> 3;
+   cipher_len = cipher_len >> 3;
+   auth_len = auth_len >> 3;
+   }
+   ofs.ofs.cipher.head = cipher_offset;
+   ofs.ofs.cipher.tail = max_len - cipher_offset - cipher_len;
+   ofs.ofs.auth.head = auth_offset;
+   ofs.ofs.auth.tail = max_len - auth_offset - auth_len;
+   cipher_iv.va = rte_crypto_op_ctod_offset(op, void *, IV_OFFSET);
+   cipher_iv.iova = rte_crypto_op_ctophys_offset(op, IV_OFFSET);
+   aad_auth_iv.va = rte_crypto_op_ctod_offset(
+   op, void *, IV_OFFSET + cipher_iv_len);
+   aad_auth_iv.iova = rte_crypto_op_ctophys_offset(op, IV_OFFSET +
+   cipher_iv_len);
+  

[dpdk-dev] [dpdk-dev v13 0/4] cryptodev: add raw data-path APIs

2020-10-10 Thread Fan Zhang
The Crypto Raw data-path APIs are a set of APIs designed to enable external
libraries/applications to leverage the cryptographic processing provided by
DPDK crypto PMDs through the cryptodev API but in a manner that is not
dependent on native DPDK data structures (e.g. rte_mbuf, rte_crypto_op, etc.)
in their data-path implementation.

The raw data-path APIs have the following advantages:
- External data structure friendly design. The new APIs use the operation
  descriptor ``struct rte_crypto_sym_vec``, which supports raw data pointers
  and IOVA addresses as input. Moreover, the APIs do not require the user to
  allocate the descriptor from a mempool, nor do they require mbufs to
  describe the input data's virtual and IOVA addresses. All these features
  make the translation from the user's own data structure into the descriptor
  easier and more efficient.
- Flexible enqueue and dequeue operation. The raw data-path APIs give the
  user more control over the enqueue and dequeue operations, including the
  capability of precise enqueue/dequeue counts, abandoning enqueue or dequeue
  at any time, and translating and setting the operation status on the fly.

v13:
- Fixed a typo.

v12:
- Fixed and updated documentation.
- Fixed typo.

v11:
- Rebased on top of latest master.
- API changed followed by the discussion results.
- Fixed a few grammar errors, thanks to Akhil.
- Reverted attach session API changes.
- Fixed QAT driver bugs.

v10:
- Changed rte_crypto_sym_vec structure to support both sync cpu_crypto and
  async raw data-path API.
- Changed documentation.
- Changed API names.
- Changed the way data-path context is initialized.
- Added new API to attach session or xform to existing context.
- Changed QAT PMD accordingly with new APIs.
- Changed unit test to use the device feature flag for the raw API tests.

v9:
- Changed return types of submit_done() and dequeue_done() APIs.
- Added release note update. 

v8:
- Updated following review comments.
- Fixed a few bugs.
- Fixed ARM build error.
- Updated the unit test covering all tests.

v7:
- Fixed a few typos.
- Fixed length calculation bugs.

v6:
- Rebased on top of DPDK 20.08.
- Changed to service ctx and added single job submit/dequeue.

v5:
- Changed to use rte_crypto_sym_vec as input.
- Changed to use public APIs instead of use function pointer.

v4:
- Added missed patch.

v3:
- Instead of QAT only API, moved the API to cryptodev.
- Added cryptodev feature flags.

v2:
- Used a structure to simplify parameters.
- Added unit tests.
- Added documentation.

Fan Zhang (4):
  cryptodev: change crypto symmetric vector structure
  cryptodev: add raw crypto data-path APIs
  crypto/qat: add raw crypto data-path API support
  test/crypto: add unit-test for cryptodev raw API test

 app/test/test_cryptodev.c | 812 ++-
 app/test/test_cryptodev.h |  12 +
 app/test/test_cryptodev_blockcipher.c |  58 +-
 doc/guides/cryptodevs/features/default.ini|   1 +
 doc/guides/cryptodevs/features/qat.ini|   1 +
 doc/guides/prog_guide/cryptodev_lib.rst   | 109 +-
 doc/guides/rel_notes/release_20_11.rst|  14 +
 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c  |  18 +-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c|   9 +-
 drivers/crypto/qat/meson.build|   1 +
 drivers/crypto/qat/qat_sym.h  |  11 +
 drivers/crypto/qat/qat_sym_hw_dp.c| 959 ++
 drivers/crypto/qat/qat_sym_pmd.c  |   9 +-
 lib/librte_cryptodev/rte_crypto_sym.h |  40 +-
 lib/librte_cryptodev/rte_cryptodev.c  |  80 ++
 lib/librte_cryptodev/rte_cryptodev.h  | 413 +++-
 lib/librte_cryptodev/rte_cryptodev_pmd.h  |  51 +-
 .../rte_cryptodev_version.map |  10 +
 lib/librte_ipsec/esp_inb.c|  12 +-
 lib/librte_ipsec/esp_outb.c   |  12 +-
 lib/librte_ipsec/misc.h   |   6 +-
 21 files changed, 2530 insertions(+), 108 deletions(-)
 create mode 100644 drivers/crypto/qat/qat_sym_hw_dp.c

-- 
2.20.1



[dpdk-dev] [dpdk-dev v13 1/4] cryptodev: change crypto symmetric vector structure

2020-10-10 Thread Fan Zhang
This patch updates the ``rte_crypto_sym_vec`` structure to add
support for both the cpu_crypto synchronous operation and the
asynchronous raw data-path APIs. The patch also includes
AESNI-MB and AESNI-GCM PMD changes, unit test changes and
documentation updates.

Signed-off-by: Fan Zhang 
---
 app/test/test_cryptodev.c  | 25 --
 doc/guides/prog_guide/cryptodev_lib.rst|  3 +-
 doc/guides/rel_notes/release_20_11.rst |  3 ++
 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c   | 18 +-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c |  9 +++--
 lib/librte_cryptodev/rte_crypto_sym.h  | 40 --
 lib/librte_ipsec/esp_inb.c | 12 +++
 lib/librte_ipsec/esp_outb.c| 12 +++
 lib/librte_ipsec/misc.h|  6 ++--
 9 files changed, 79 insertions(+), 49 deletions(-)

diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index ac2a36bc2..62a265520 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -151,11 +151,11 @@ static void
 process_cpu_aead_op(uint8_t dev_id, struct rte_crypto_op *op)
 {
int32_t n, st;
-   void *iv;
struct rte_crypto_sym_op *sop;
union rte_crypto_sym_ofs ofs;
struct rte_crypto_sgl sgl;
struct rte_crypto_sym_vec symvec;
+   struct rte_crypto_va_iova_ptr iv_ptr, aad_ptr, digest_ptr;
struct rte_crypto_vec vec[UINT8_MAX];
 
sop = op->sym;
@@ -171,13 +171,17 @@ process_cpu_aead_op(uint8_t dev_id, struct rte_crypto_op 
*op)
sgl.vec = vec;
sgl.num = n;
symvec.sgl = &sgl;
-   iv = rte_crypto_op_ctod_offset(op, void *, IV_OFFSET);
-   symvec.iv = &iv;
-   symvec.aad = (void **)&sop->aead.aad.data;
-   symvec.digest = (void **)&sop->aead.digest.data;
+   symvec.iv = &iv_ptr;
+   symvec.digest = &digest_ptr;
+   symvec.aad = &aad_ptr;
symvec.status = &st;
symvec.num = 1;
 
+   /* for CPU crypto the IOVA address is not required */
+   iv_ptr.va = rte_crypto_op_ctod_offset(op, void *, IV_OFFSET);
+   digest_ptr.va = (void *)sop->aead.digest.data;
+   aad_ptr.va = (void *)sop->aead.aad.data;
+
ofs.raw = 0;
 
n = rte_cryptodev_sym_cpu_crypto_process(dev_id, sop->session, ofs,
@@ -193,11 +197,11 @@ static void
 process_cpu_crypt_auth_op(uint8_t dev_id, struct rte_crypto_op *op)
 {
int32_t n, st;
-   void *iv;
struct rte_crypto_sym_op *sop;
union rte_crypto_sym_ofs ofs;
struct rte_crypto_sgl sgl;
struct rte_crypto_sym_vec symvec;
+   struct rte_crypto_va_iova_ptr iv_ptr, digest_ptr;
struct rte_crypto_vec vec[UINT8_MAX];
 
sop = op->sym;
@@ -213,13 +217,14 @@ process_cpu_crypt_auth_op(uint8_t dev_id, struct rte_crypto_op *op)
sgl.vec = vec;
sgl.num = n;
symvec.sgl = &sgl;
-   iv = rte_crypto_op_ctod_offset(op, void *, IV_OFFSET);
-   symvec.iv = &iv;
-   symvec.aad = (void **)&sop->aead.aad.data;
-   symvec.digest = (void **)&sop->auth.digest.data;
+   symvec.iv = &iv_ptr;
+   symvec.digest = &digest_ptr;
symvec.status = &st;
symvec.num = 1;
 
+   iv_ptr.va = rte_crypto_op_ctod_offset(op, void *, IV_OFFSET);
+   digest_ptr.va = (void *)sop->auth.digest.data;
+
ofs.raw = 0;
ofs.ofs.cipher.head = sop->cipher.data.offset - sop->auth.data.offset;
ofs.ofs.cipher.tail = (sop->auth.data.offset + sop->auth.data.length) -
diff --git a/doc/guides/prog_guide/cryptodev_lib.rst b/doc/guides/prog_guide/cryptodev_lib.rst
index c14f750fa..e7ba35c2d 100644
--- a/doc/guides/prog_guide/cryptodev_lib.rst
+++ b/doc/guides/prog_guide/cryptodev_lib.rst
@@ -620,7 +620,8 @@ operation descriptor (``struct rte_crypto_sym_vec``) containing:
   descriptors of performed operations (``struct rte_crypto_sgl``). Each instance
   of ``struct rte_crypto_sgl`` consists of a number of segments and a pointer to
   an array of segment descriptors ``struct rte_crypto_vec``;
-- pointers to arrays of size ``num`` containing IV, AAD and digest information,
+- pointers to arrays of size ``num`` containing IV, AAD and digest information
+  in the ``cpu_crypto`` sub-structure,
 - pointer to an array of size ``num`` where status information will be stored
   for each operation.
 
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 8b911488c..2973b2a33 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -302,6 +302,9 @@ API Changes
   ``rte_fpga_lte_fec_configure`` and structure ``fpga_lte_fec_conf`` to
   ``rte_fpga_lte_fec_conf``.
 
+* The structure ``rte_crypto_sym_vec`` is updated to support both
+  cpu_crypto synchronous operation and asynchronous raw data-path APIs.
+
 
 ABI Changes
 -----------
diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
index 1d2a0ce00..973

[dpdk-dev] [dpdk-dev v13 2/4] cryptodev: add raw crypto data-path APIs

2020-10-10 Thread Fan Zhang
This patch adds raw data-path APIs for enqueue and dequeue
operations to cryptodev. The APIs support flexible user-defined
enqueue and dequeue behaviors.

Signed-off-by: Fan Zhang 
Signed-off-by: Piotr Bronowski 
Acked-by: Adam Dybkowski 
---
 doc/guides/cryptodevs/features/default.ini|   1 +
 doc/guides/prog_guide/cryptodev_lib.rst   | 106 +
 doc/guides/rel_notes/release_20_11.rst|   7 +
 lib/librte_cryptodev/rte_cryptodev.c  |  80 
 lib/librte_cryptodev/rte_cryptodev.h  | 413 +-
 lib/librte_cryptodev/rte_cryptodev_pmd.h  |  51 ++-
 .../rte_cryptodev_version.map |  10 +
 7 files changed, 665 insertions(+), 3 deletions(-)

diff --git a/doc/guides/cryptodevs/features/default.ini b/doc/guides/cryptodevs/features/default.ini
index 133a246ee..17b177fc4 100644
--- a/doc/guides/cryptodevs/features/default.ini
+++ b/doc/guides/cryptodevs/features/default.ini
@@ -30,6 +30,7 @@ Asymmetric sessionless =
 CPU crypto =
 Symmetric sessionless  =
 Non-Byte aligned data  =
+Sym raw data path API  =
 
 ;
 ; Supported crypto algorithms of a default crypto driver.
diff --git a/doc/guides/prog_guide/cryptodev_lib.rst b/doc/guides/prog_guide/cryptodev_lib.rst
index e7ba35c2d..bcf071326 100644
--- a/doc/guides/prog_guide/cryptodev_lib.rst
+++ b/doc/guides/prog_guide/cryptodev_lib.rst
@@ -632,6 +632,112 @@ a call argument. Status different than zero must be treated as error.
 For more details, e.g. how to convert an mbuf to an SGL, please refer to an
 example usage in the IPsec library implementation.
 
+Cryptodev Raw Data-path APIs
+----------------------------
+
+The Crypto Raw data-path APIs are a set of APIs designed to enable external
+libraries/applications to leverage the cryptographic processing provided by
+DPDK crypto PMDs through the cryptodev API, but in a manner that is not
+dependent on native DPDK data structures (e.g. rte_mbuf, rte_crypto_op, etc.)
+in their data-path implementation.
+
+The raw data-path APIs have the following advantages:
+
+- External data structure friendly design. The new APIs use the operation
+  descriptor ``struct rte_crypto_sym_vec`` that supports raw data pointers and
+  IOVA addresses as input. Moreover, the APIs do not require the user to
+  allocate the descriptor from a mempool, nor do they require mbufs to describe
+  the input data's virtual and IOVA addresses. All these features make the
+  translation from the user's own data structure into the descriptor easier
+  and more efficient.
+- Flexible enqueue and dequeue operation. The raw data-path APIs give the
+  user more control over the enqueue and dequeue operations, including the
+  capability of precise enqueue/dequeue counts, abandoning an enqueue or
+  dequeue at any time, and translating and setting the operation status on
+  the fly.
+
+Cryptodev PMDs which support the raw data-path APIs will have the
+``RTE_CRYPTODEV_FF_SYM_RAW_DP`` feature flag set. To use this feature, the
+user shall create a local ``struct rte_crypto_raw_dp_ctx`` buffer extended to
+at least the length returned by the ``rte_cryptodev_get_raw_dp_ctx_size``
+function call. The created buffer is then initialized using the
+``rte_cryptodev_configure_raw_dp_ctx`` function with the ``is_update``
+parameter set to 0. The library and the crypto device driver will then set up
+the buffer, attach either the cryptodev sym session, the rte_security session,
+or the cryptodev xform for session-less operation to the ctx buffer, and
+set the corresponding enqueue and dequeue function handlers based on the
+algorithm information stored in the session or xform. When the ``is_update``
+parameter passed into ``rte_cryptodev_configure_raw_dp_ctx`` is 1, the driver
+will not initialize the buffer but will only update the session or xform and
+the function handlers accordingly.
+
+After the ``struct rte_crypto_raw_dp_ctx`` buffer is initialized, it is
+ready for enqueue and dequeue operations. There are two different enqueue
+functions: ``rte_cryptodev_raw_enqueue`` to enqueue a single raw data
+operation, and ``rte_cryptodev_raw_enqueue_burst`` to enqueue a descriptor
+with multiple operations. If the application uses a similar approach to
+``struct rte_crypto_sym_vec`` to manage its data bursts but with a different
+data structure, using the ``rte_cryptodev_raw_enqueue_burst`` function may be
+less efficient: the application has to loop over all crypto operations to
+assemble the ``struct rte_crypto_sym_vec`` descriptor from its own data
+structure, and then the driver will loop over them again to translate every
+operation in the descriptor to the driver's specific queue data. In this
+situation, ``rte_cryptodev_raw_enqueue`` should be used instead to save one
+loop for each data burst.
+
+The ``rte_cryptodev_raw_enqueue`` and ``rte_cryptodev_raw_enqueue_burst``
+functions will return or set the enqueue status. ``rte_cryptodev_raw_enqueue``
+will return the status directly, ``rte_cryptodev_raw_enqueue_bur

[dpdk-dev] [dpdk-dev v13 3/4] crypto/qat: add raw crypto data-path API support

2020-10-10 Thread Fan Zhang
This patch updates QAT PMD to add raw data-path API support.

Signed-off-by: Fan Zhang 
Acked-by: Adam Dybkowski 
---
 doc/guides/cryptodevs/features/qat.ini |   1 +
 doc/guides/rel_notes/release_20_11.rst |   4 +
 drivers/crypto/qat/meson.build |   1 +
 drivers/crypto/qat/qat_sym.h   |  11 +
 drivers/crypto/qat/qat_sym_hw_dp.c | 959 +
 drivers/crypto/qat/qat_sym_pmd.c   |   9 +-
 6 files changed, 983 insertions(+), 2 deletions(-)
 create mode 100644 drivers/crypto/qat/qat_sym_hw_dp.c

diff --git a/doc/guides/cryptodevs/features/qat.ini b/doc/guides/cryptodevs/features/qat.ini
index 9e82f2886..6cc09cde7 100644
--- a/doc/guides/cryptodevs/features/qat.ini
+++ b/doc/guides/cryptodevs/features/qat.ini
@@ -17,6 +17,7 @@ Digest encrypted   = Y
 Asymmetric sessionless = Y
 RSA PRIV OP KEY EXP= Y
 RSA PRIV OP KEY QT = Y
+Sym raw data path API  = Y
 
 ;
 ; Supported crypto algorithms of the 'qat' crypto driver.
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 85a07d86e..008f4eedc 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -104,6 +104,10 @@ New Features
   * Added support for non-HMAC auth algorithms
 (MD5, SHA1, SHA224, SHA256, SHA384, SHA512).
 
+* **Update QAT crypto PMD.**
+
+  * Added Raw Data-path APIs support.
+
 * **Added Intel ACC100 bbdev PMD.**
 
   Added a new ``acc100`` bbdev driver for the Intel\ |reg| ACC100 accelerator
diff --git a/drivers/crypto/qat/meson.build b/drivers/crypto/qat/meson.build
index a225f374a..bc90ec44c 100644
--- a/drivers/crypto/qat/meson.build
+++ b/drivers/crypto/qat/meson.build
@@ -15,6 +15,7 @@ if dep.found()
qat_sources += files('qat_sym_pmd.c',
 'qat_sym.c',
 'qat_sym_session.c',
+'qat_sym_hw_dp.c',
 'qat_asym_pmd.c',
 'qat_asym.c')
qat_ext_deps += dep
diff --git a/drivers/crypto/qat/qat_sym.h b/drivers/crypto/qat/qat_sym.h
index 1a9748849..7254f5e3c 100644
--- a/drivers/crypto/qat/qat_sym.h
+++ b/drivers/crypto/qat/qat_sym.h
@@ -264,6 +264,16 @@ qat_sym_process_response(void **op, uint8_t *resp)
}
*op = (void *)rx_op;
 }
+
+int
+qat_sym_configure_dp_ctx(struct rte_cryptodev *dev, uint16_t qp_id,
+   struct rte_crypto_raw_dp_ctx *raw_dp_ctx,
+   enum rte_crypto_op_sess_type sess_type,
+   union rte_cryptodev_session_ctx session_ctx, uint8_t is_update);
+
+int
+qat_sym_get_dp_ctx_size(struct rte_cryptodev *dev);
+
 #else
 
 static inline void
@@ -276,5 +286,6 @@ static inline void
 qat_sym_process_response(void **op __rte_unused, uint8_t *resp __rte_unused)
 {
 }
+
 #endif
 #endif /* _QAT_SYM_H_ */
diff --git a/drivers/crypto/qat/qat_sym_hw_dp.c b/drivers/crypto/qat/qat_sym_hw_dp.c
new file mode 100644
index 0..dfbbad59b
--- /dev/null
+++ b/drivers/crypto/qat/qat_sym_hw_dp.c
@@ -0,0 +1,959 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include 
+
+#include "adf_transport_access_macros.h"
+#include "icp_qat_fw.h"
+#include "icp_qat_fw_la.h"
+
+#include "qat_sym.h"
+#include "qat_sym_pmd.h"
+#include "qat_sym_session.h"
+#include "qat_qp.h"
+
+struct qat_sym_dp_ctx {
+   struct qat_sym_session *session;
+   uint32_t tail;
+   uint32_t head;
+   uint16_t cached_enqueue;
+   uint16_t cached_dequeue;
+};
+
+static __rte_always_inline int32_t
qat_sym_dp_parse_data_vec(struct qat_qp *qp, struct icp_qat_fw_la_bulk_req *req,
   struct rte_crypto_vec *data, uint16_t n_data_vecs)
+{
+   struct qat_queue *tx_queue;
+   struct qat_sym_op_cookie *cookie;
+   struct qat_sgl *list;
+   uint32_t i;
+   uint32_t total_len;
+
+   if (likely(n_data_vecs == 1)) {
+   req->comn_mid.src_data_addr = req->comn_mid.dest_data_addr =
+   data[0].iova;
+   req->comn_mid.src_length = req->comn_mid.dst_length =
+   data[0].len;
+   return data[0].len;
+   }
+
+   if (n_data_vecs == 0 || n_data_vecs > QAT_SYM_SGL_MAX_NUMBER)
+   return -1;
+
+   total_len = 0;
+   tx_queue = &qp->tx_q;
+
+   ICP_QAT_FW_COMN_PTR_TYPE_SET(req->comn_hdr.comn_req_flags,
+   QAT_COMN_PTR_TYPE_SGL);
+   cookie = qp->op_cookies[tx_queue->tail >> tx_queue->trailz];
+   list = (struct qat_sgl *)&cookie->qat_sgl_src;
+
+   for (i = 0; i < n_data_vecs; i++) {
+   list->buffers[i].len = data[i].len;
+   list->buffers[i].resrvd = 0;
+   list->buffers[i].addr = data[i].iova;
+   if (total_len + data[i].len > UINT32_MAX) {
+   QAT_DP_LOG(ERR, "Message too long");
+   return -1;
+   }
+   total_len += data[i].len;
+   }
+
+ 

[dpdk-dev] [dpdk-dev v13 4/4] test/crypto: add unit-test for cryptodev raw API test

2020-10-10 Thread Fan Zhang
This patch adds cryptodev raw API test support to the unit tests.
In addition, a new test case of this test type is enabled for the
QAT PMD.

Signed-off-by: Fan Zhang 
Acked-by: Adam Dybkowski 
---
 app/test/test_cryptodev.c | 787 --
 app/test/test_cryptodev.h |  12 +
 app/test/test_cryptodev_blockcipher.c |  58 +-
 3 files changed, 803 insertions(+), 54 deletions(-)

diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index 62a265520..219373e10 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -49,6 +49,10 @@
 #define VDEV_ARGS_SIZE 100
 #define MAX_NB_SESSIONS 4
 
+#define MAX_DRV_SERVICE_CTX_SIZE 256
+
+#define MAX_RAW_DEQUEUE_COUNT  65535
+
 #define IN_PLACE 0
 #define OUT_OF_PLACE 1
 
@@ -57,6 +61,8 @@ static int gbl_driver_id;
 static enum rte_security_session_action_type gbl_action_type =
RTE_SECURITY_ACTION_TYPE_NONE;
 
+enum cryptodev_api_test_type global_api_test_type = CRYPTODEV_API_TEST;
+
 struct crypto_testsuite_params {
struct rte_mempool *mbuf_pool;
struct rte_mempool *large_mbuf_pool;
@@ -147,6 +153,215 @@ ceil_byte_length(uint32_t num_bits)
return (num_bits >> 3);
 }
 
+static uint32_t
+get_raw_dp_dequeue_count(void *user_data __rte_unused)
+{
+   return 1;
+}
+
+static void
+post_process_raw_dp_op(void *user_data, uint32_t index __rte_unused,
+   uint8_t is_op_success)
+{
+   struct rte_crypto_op *op = user_data;
+   op->status = is_op_success ? RTE_CRYPTO_OP_STATUS_SUCCESS :
+   RTE_CRYPTO_OP_STATUS_ERROR;
+}
+
+void
+process_sym_raw_dp_op(uint8_t dev_id, uint16_t qp_id,
+   struct rte_crypto_op *op, uint8_t is_cipher, uint8_t is_auth,
+   uint8_t len_in_bits, uint8_t cipher_iv_len)
+{
+   struct rte_crypto_sym_op *sop = op->sym;
+   struct rte_crypto_op *ret_op = NULL;
+   struct rte_crypto_vec data_vec[UINT8_MAX];
+   struct rte_crypto_va_iova_ptr cipher_iv, digest, aad_auth_iv;
+   union rte_crypto_sym_ofs ofs;
+   struct rte_crypto_sym_vec vec;
+   struct rte_crypto_sgl sgl;
+   uint32_t max_len;
+   union rte_cryptodev_session_ctx sess;
+   uint32_t count = 0;
+   struct rte_crypto_raw_dp_ctx *ctx;
+   uint32_t cipher_offset = 0, cipher_len = 0, auth_offset = 0,
+   auth_len = 0;
+   int32_t n;
+   uint32_t n_success;
+   int ctx_service_size;
+   int32_t status = 0;
+   int enqueue_status, dequeue_status;
+
+   ctx_service_size = rte_cryptodev_get_raw_dp_ctx_size(dev_id);
+   if (ctx_service_size < 0) {
+   op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+   return;
+   }
+
+   ctx = malloc(ctx_service_size);
+   if (!ctx) {
+   op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+   return;
+   }
+
+   /* Both are enums, setting crypto_sess will suit any session type */
+   sess.crypto_sess = op->sym->session;
+
+   if (rte_cryptodev_configure_raw_dp_ctx(dev_id, qp_id, ctx,
+   op->sess_type, sess, 0) < 0) {
+   op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+   goto exit;
+   }
+
+   cipher_iv.iova = 0;
+   cipher_iv.va = NULL;
+   aad_auth_iv.iova = 0;
+   aad_auth_iv.va = NULL;
+   digest.iova = 0;
+   digest.va = NULL;
+   sgl.vec = data_vec;
+   vec.num = 1;
+   vec.sgl = &sgl;
+   vec.iv = &cipher_iv;
+   vec.digest = &digest;
+   vec.aad = &aad_auth_iv;
+   vec.status = &status;
+
+   ofs.raw = 0;
+
+   if (is_cipher && is_auth) {
+   cipher_offset = sop->cipher.data.offset;
+   cipher_len = sop->cipher.data.length;
+   auth_offset = sop->auth.data.offset;
+   auth_len = sop->auth.data.length;
+   max_len = RTE_MAX(cipher_offset + cipher_len,
+   auth_offset + auth_len);
+   if (len_in_bits) {
+   max_len = max_len >> 3;
+   cipher_offset = cipher_offset >> 3;
+   auth_offset = auth_offset >> 3;
+   cipher_len = cipher_len >> 3;
+   auth_len = auth_len >> 3;
+   }
+   ofs.ofs.cipher.head = cipher_offset;
+   ofs.ofs.cipher.tail = max_len - cipher_offset - cipher_len;
+   ofs.ofs.auth.head = auth_offset;
+   ofs.ofs.auth.tail = max_len - auth_offset - auth_len;
+   cipher_iv.va = rte_crypto_op_ctod_offset(op, void *, IV_OFFSET);
+   cipher_iv.iova = rte_crypto_op_ctophys_offset(op, IV_OFFSET);
+   aad_auth_iv.va = rte_crypto_op_ctod_offset(
+   op, void *, IV_OFFSET + cipher_iv_len);
+   aad_auth_iv.iova = rte_crypto_op_ctophys_offset(op, IV_OFFSET +
+   cipher_iv_len);
+  

Re: [dpdk-dev] [PATCH v6 04/14] doc: remove references to make from vdpadevs guides

2020-10-10 Thread Matan Azrad
Hi Ciara,

Looks good.

From: Ciara Power 
> Make is no longer supported for compiling DPDK; references to it are now
> removed from the documentation.
> 
> Signed-off-by: Ciara Power 
Acked-by: Matan Azrad