Re: [dpdk-dev] [PATCH v10 0/8] Introduce event vectorization

2021-04-03 Thread Jerin Jacob
On Wed, Mar 31, 2021 at 3:00 PM  wrote:
>
> From: Pavan Nikhilesh 
>
> In the traditional event programming model, events are identified by a
> flow-id and a uintptr_t. The flow-id uniquely identifies a given event
> and determines the order of scheduling based on schedule type, and the
> uintptr_t holds a single object.
>
> Event devices also support burst mode with configurable dequeue depth,
> i.e. each dequeue call would return multiple events and each event
> might be at a different stage of the pipeline.
> Having a burst of events belonging to different stages in a dequeue
> burst is not only difficult to vectorize but also increases the scheduler
> overhead and application overhead of pipelining events further.
> Using event vectors we see a performance gain of ~742.3% as shown in [1].
>
> By introducing event vectorization, each event will be capable of holding
> multiple uintptr_t of the same flow thereby allowing applications
> to vectorize their pipeline and reduce the complexity of pipelining
> events across multiple stages. This also reduces the complexity of handling
> enqueue and dequeue on an event device.
>
> Since event devices are transparent to the events they schedule, the
> event producers such as eth_rx_adapter, crypto_adapter, etc. are
> responsible for vectorizing the buffers of the same flow into a single
> event.
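As a rough illustration of what "vectorizing the buffers of the same flow into a single event" can mean for a producer, here is a simplified sketch. The `vec_event` struct and `vec_add()` helper are hypothetical. They do not reproduce the real rte_event_vector layout or the Rx adapter code, and a real producer would also flush partial vectors on a timeout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an event vector: one flow id and
 * up to VEC_MAX object pointers. Not the real rte_event_vector layout. */
#define VEC_MAX 4

struct vec_event {
	uint32_t flow_id;
	uint16_t nb_elem;
	void *ptrs[VEC_MAX];
};

/* Append one object to the vector for its flow. Returns 1 when the vector
 * has become full and should be enqueued as a single event, else 0. */
static int
vec_add(struct vec_event *v, uint32_t flow_id, void *obj)
{
	assert(v->nb_elem < VEC_MAX);
	if (v->nb_elem == 0)
		v->flow_id = flow_id;
	/* Only objects of the same flow may share one vector. */
	assert(v->flow_id == flow_id);
	v->ptrs[v->nb_elem++] = obj;
	return v->nb_elem == VEC_MAX;
}
```

Once `vec_add()` reports a full vector, the producer would enqueue one event that points at the vector instead of four separate events.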
>
> The series also breaks ABI in the patch [8/8] which is targeted to the
> v21.11 release.
>
> The dpdk-test-eventdev application has been updated with options to test
> multiple vector sizes and timeouts.
>
> [1]
> As for performance improvement, with an ARM Cortex-A72 equivalent processor,
> software event device (--vdev=event_sw0), single worker core, single stage
> and one service core for the Rx adapter, Tx adapter, and scheduling.
>
> Without this patchset applied:
> ./build/app/dpdk-test-eventdev -l 7-23 -s 0x700 --vdev="event_sw0" --
>  --prod_type_ethdev --nb_pkts=0 --verbose 2 --test=pipeline_queue
>  --stlist=a --wlcores=20
> Port[0] using Rx adapter[0] configured
> Port[0] using Tx adapter[0] Configured
> 5.071 mpps
>
> With the patchset applied and without event vectorization:
> ./build/app/dpdk-test-eventdev -l 7-23 -s 0x700 --vdev="event_sw0" --
>  --prod_type_ethdev --nb_pkts=0 --verbose 2 --test=pipeline_queue
>  --stlist=a --wlcores=20
> Port[0] using Rx adapter[0] configured
> Port[0] using Tx adapter[0] Configured
> 5.123 mpps
>
> With event vectorization:
> ./build/app/dpdk-test-eventdev -l 7-23 -s 0x700 --vdev="event_sw0" --
> --prod_type_ethdev --nb_pkts=0 --verbose 2 --test=pipeline_queue
> --stlist=a --wlcores=20 --enable_vector --nb_eth_queues 1
> --vector_size 256
> Port[0] using Rx adapter[0] configured
> Port[0] using Tx adapter[0] Configured
> 42.715 mpps
>
> Having dedicated service cores for each Rx queue and tweaking the vector
> and dequeue burst sizes would further improve performance.
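For reference, the ~742.3% figure appears to be measured relative to the 5.071 mpps baseline without the patchset. The small helper below was written only for this note and is not part of the series. It checks that arithmetic.

```c
#include <assert.h>

/* Relative throughput gain of `after` over `before`, in percent. */
static double
gain_pct(double before_mpps, double after_mpps)
{
	assert(before_mpps > 0.0);
	return (after_mpps - before_mpps) / before_mpps * 100.0;
}
```

`gain_pct(5.071, 42.715)` evaluates to roughly 742.3, about 8.4x the baseline. Measured against the 5.123 mpps run (patchset applied, vectorization off), the gain is roughly 733.8%.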
>
> API usage is shown below:
>
> Configuration:
>
> struct rte_event_eth_rx_adapter_event_vector_config vec_conf;
>
> /* nb_elem vectors, cache size 0, up to vector_size objects each. */
> vector_pool = rte_event_vector_pool_create("vector_pool",
> 		nb_elem, 0, vector_size, socket_id);
>
> rte_event_eth_rx_adapter_create(id, event_id, &adptr_conf);
> /* Rx queue id -1: add all Rx queues of eth_id. */
> rte_event_eth_rx_adapter_queue_add(id, eth_id, -1, &queue_conf);
> if (cap & RTE_EVENT_ETH_RX_ADAPTER_CAP_EVENT_VECTOR) {
> 	vec_conf.vector_sz = vector_size;
> 	vec_conf.vector_timeout_ns = vector_tmo_nsec;
> 	vec_conf.vector_mp = vector_pool;
> 	rte_event_eth_rx_adapter_queue_event_vector_config(id,
> 			eth_id, -1, &vec_conf);
> }
>
> Fastpath:
>
> num = rte_event_dequeue_burst(event_id, port_id, &ev, 1, 0);
> if (!num)
> 	continue;
>
> if (ev.event_type & RTE_EVENT_TYPE_VECTOR) {
> 	switch (ev.event_type) {
> 	case RTE_EVENT_TYPE_ETHDEV_VECTOR:
> 	case RTE_EVENT_TYPE_ETH_RX_ADAPTER_VECTOR: {
> 		/* A declaration after a case label needs its own block. */
> 		struct rte_mbuf **mbufs;
>
> 		mbufs = ev.vector_ev->mbufs;
> 		for (i = 0; i < ev.vector_ev->nb_elem; i++) {
> 			/* Process mbufs[i]. */
> 		}
> 		break;
> 	}
> 	case ...
> 	}
> }
> ...
>




Series applied to dpdk-next-net-eventdev/for-main. Thanks



> v10 Changes:
> - Update Rx adapter documentation with flow identifier bitfield format. (Jay)
>
> v9 Changes:
> - Update Rx adapter documentation w.r.t SW event vectorizations. (Jay)
> - Push partial vectors to event device on queue delete. (Jay)
>
> v8 Changes:
> - Fix incorrect shift for vector timeout interval.(Jay)
> - Code reallocation.(Jay)
>
> v7 Changes:
> - More doxygen fixes.(Jay)
> - Reduce code duplication in 4/8.(Jay)
>
> v6 Changes:
> - Ma

Re: [dpdk-dev] [PATCH v2 00/27] Add DLB V2.5

2021-04-03 Thread Jerin Jacob
On Wed, Mar 31, 2021 at 1:06 AM Timothy McDaniel
 wrote:
>
> This patch series adds support for DLB v2.5 to
> the current DLB V2.0 PMD. The resulting PMD supports
> both hardware versions.
>
> The main differences between the DLB v2.5 and v2.0 hardware
> are:
> - Number of queues/ports
> - DLB v2.5 uses a combined credit pool, whereas DLB v2.0
>   splits credits into 2 pools, a directed credit pool and a
>   load balanced credit pool.
> - Different register maps, with different bit names and offsets

Please fix the following issues

[for-main]dell[dpdk-next-eventdev] $ ./devtools/check-git-log.sh -n 27
Wrong headline format:
event/dlb2: add v2.5 get_resources
event/dlb2: delete old dlb2_resource.c file
event/dlb2: move dlb_resource_new.c to dlb_resource.c
event/dlb2: remove temporary file, dlb_hw_types.h
event/dlb2: move dlb2_hw_type_new.h to dlb2_hw_types.h
event/dlb2: delete old register map file, dlb2_regs.h
event/dlb2: rename dlb2_regs_new.h to dlb2_regs.h
event/dlb2: Change device name to dlb_event
Wrong headline uppercase:
event/dlb2: Change device name to dlb_event

./devtools/checkpatches.sh -n 27

### event/dlb2: add v2.5 sparse cq mode

WARNING:EMAIL_SUBJECT: A patch subject line should describe the change
not the tool that found it
#4:
Subject: [PATCH] event/dlb2: add v2.5 sparse cq mode

WARNING:REPEATED_WORD: Possible repeated word: 'mode'
#6:
Update sparse cq mode mode functions for DLB v2.5, accounting for new

total: 0 errors, 2 warnings, 70 lines checked

### event/dlb2: Change device name to dlb_event

WARNING:REPEATED_WORD: Possible repeated word: 'the'
#9:
to the the directory name that contains the PMD, as well

total: 0 errors, 1 warnings, 666 lines checked

22/27 valid patches



>
> In order to support both hardware versions with the same PMD,
> and avoid code duplication, the file dlb2_resource.c required a
> complete rewrite. This required some creative staging of the changes
> in order to keep the individual patches relatively small, while
> also meeting the requirement that all individual patches in the set
> compile cleanly.
>
> To accomplish this, a few temporary files are used:
>
> dlb2_hw_types_new.h
> dlb2_resource_new.h
> dlb2_resource_new.c
>
> As dlb2_resource_new.c is populated with the new combined v2.0/v2.5
> low level logic, the corresponding old code is removed from
> dlb2_resource.c, thus allowing both the original and new code to
> continue to compile and link cleanly. Once all of the code has been
> migrated to the new model, the old versions of the files are removed,
> and the new versions are renamed, effectively replacing the original
> files.
>
> As you review the code, you can ignore the code deletions from
> dlb2_resource.c, as that file continues to shrink as the new
> corresponding logic is added to dlb2_resource_new.c.
>
> Changes since V1
> 1) Simplified subject text for all patches
> 2) correct typos/spelling
> 3) remove FPGA references
> 4) remove stale sysconf() references
> 5) fixed patches that had compilation issues
> 6) updated release notes
> 7) renamed dlb device from dlb2_event to dlb_event
> 8) moved dlb2 directory to dlb, to match the name change
> 9) fixed other cases where "dlb2" was being used externally
>
> Timothy McDaniel (27):
>   event/dlb2: add v2.5 probe
>   event/dlb2: add v2.5 HW init
>   event/dlb2: add v2.5 get_resources
>   event/dlb2: add v2.5 create sched domain
>   event/dlb2: add v2.5 domain reset
>   event/dlb2: add V2.5 create ldb queue
>   event/dlb2: add v2.5 create ldb port
>   event/dlb2: add v2.5 create dir port
>   event/dlb2: add v2.5 create dir queue
>   event/dlb2: add v2.5 map qid
>   event/dlb2: add v2.5 unmap queue
>   event/dlb2: add v2.5 start domain
>   event/dlb2: add v2.5 credit scheme
>   event/dlb2: add v2.5 queue depth functions
>   event/dlb2: add v2.5 finish map/unmap
>   event/dlb2: add v2.5 sparse cq mode
>   event/dlb2: add v2.5 sequence number management
>   event/dlb2: consolidate resource header files into one file
>   event/dlb2: delete old dlb2_resource.c file
>   event/dlb2: move dlb_resource_new.c to dlb_resource.c
>   event/dlb2: remove temporary file, dlb_hw_types.h
>   event/dlb2: move dlb2_hw_type_new.h to dlb2_hw_types.h
>   event/dlb2: delete old register map file, dlb2_regs.h
>   event/dlb2: rename dlb2_regs_new.h to dlb2_regs.h
>   event/dlb2: update xstats for v2.5
>   doc/dlb2: update documentation for v2.5
>   event/dlb2: Change device name to dlb_event
>
>  MAINTAINERS                   |    6 +-
>  app/test/test_eventdev.c      |    6 +-
>  config/rte_config.h           |   11 +-
>  doc/api/doxy-api-index.md     |    2 +-
>  doc/api/doxy-api.conf.in      |    2 +-
>  doc/guides/eventdevs/dlb.rst  |  390 ++
>  doc/guides/eventdevs/dlb2.rst |   75 +-
>  doc/guides/eventdevs/index.rst 

Re: [dpdk-dev] [PATCH v2 02/27] event/dlb2: add v2.5 HW init

2021-04-03 Thread Jerin Jacob
On Wed, Mar 31, 2021 at 1:07 AM Timothy McDaniel
 wrote:
>
> This commit adds support for DLB v2.5 probe-time hardware init,
> and sets up a framework for incorporating the remaining
> changes required to support DLB v2.5.
>
> DLB v2.0 and DLB v2.5 are similar in many respects, but their
> register offsets and definitions are different. As a result of these
> differences, the low-level hardware functions must take the device
> version into consideration. This requires that the hardware version be
> passed to many of the low level functions, so that the PMD can
> take the appropriate action based on the device version.
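Here is a minimal sketch of what passing the hardware version down to low-level helpers can look like. The enum mirrors `enum dlb2_hw_ver` added later in this patch. The register bases, stride and count are made up for illustration and are not taken from dlb2_regs_new.h.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the enum added to dlb2_priv.h in this patch. */
enum dlb2_hw_ver {
	DLB2_HW_VER_2,
	DLB2_HW_VER_2_5,
};

/* Hypothetical per-version base address, stride and instance count for
 * one register array; NOT taken from dlb2_regs_new.h. */
#define EX_REG_BASE_V2   0x1000u
#define EX_REG_BASE_V2_5 0x2000u
#define EX_REG_STRIDE    0x4u
#define EX_REG_COUNT     64u

/* The helper takes the version, so one code path serves both devices. */
static uint32_t
ex_reg_offset(enum dlb2_hw_ver ver, uint32_t id)
{
	assert(id < EX_REG_COUNT);
	return (ver == DLB2_HW_VER_2 ? EX_REG_BASE_V2 : EX_REG_BASE_V2_5) +
	       id * EX_REG_STRIDE;
}
```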
>
> To ease the transition and keep the individual patches small, three
> temporary files are added in this commit. These files have "new"
> in their names.  The files with "new" contain changes specific to a
> consolidated PMD that supports both DLB v2.0 and DLB v2.5. Their sister
> files of the same name (minus "new") contain the old DLB v2.0 specific
> code. The intent is to remove code from the original files as that code
> is ported to the combined DLB 2.0/2.5 PMD model and added to the "new"
> files in a series of commits. At the end of the patch series, the old files
> will be empty and the "new" files will have the logic needed
> to implement a single PMD that supports both DLB v2.0 and DLB v2.5.
> At that time, the original DLB v2.0 specific files will be deleted,
> and the "new" files will be renamed and replace them.
>
> Signed-off-by: Timothy McDaniel 
> ---
>  drivers/event/dlb2/dlb2_priv.h                |    5 +
>  drivers/event/dlb2/meson.build                |    1 +
>  .../event/dlb2/pf/base/dlb2_hw_types_new.h    |  362 ++
>  drivers/event/dlb2/pf/base/dlb2_osdep.h       |    4 +
>  drivers/event/dlb2/pf/base/dlb2_regs_new.h    | 4412 +
>  drivers/event/dlb2/pf/base/dlb2_resource.c    |  180 +-
>  drivers/event/dlb2/pf/base/dlb2_resource.h    |   36 -
>  .../event/dlb2/pf/base/dlb2_resource_new.c    |  259 +
>  .../event/dlb2/pf/base/dlb2_resource_new.h    |   73 +
>  drivers/event/dlb2/pf/dlb2_main.c             |   41 +-
>  drivers/event/dlb2/pf/dlb2_main.h             |    4 +
>  drivers/event/dlb2/pf/dlb2_pf.c               |    6 +-
>  12 files changed, 5153 insertions(+), 230 deletions(-)
>  create mode 100644 drivers/event/dlb2/pf/base/dlb2_hw_types_new.h
>  create mode 100644 drivers/event/dlb2/pf/base/dlb2_regs_new.h
>  create mode 100644 drivers/event/dlb2/pf/base/dlb2_resource_new.c
>  create mode 100644 drivers/event/dlb2/pf/base/dlb2_resource_new.h
>
> diff --git a/drivers/event/dlb2/dlb2_priv.h b/drivers/event/dlb2/dlb2_priv.h
> index 1cd78ad94..f3a9fe0aa 100644
> --- a/drivers/event/dlb2/dlb2_priv.h
> +++ b/drivers/event/dlb2/dlb2_priv.h
> @@ -114,6 +114,11 @@
>  #define EV_TO_DLB2_PRIO(x) ((x) >> 5)
>  #define DLB2_TO_EV_PRIO(x) ((x) << 5)
>
> +enum dlb2_hw_ver {
> +   DLB2_HW_VER_2,
> +   DLB2_HW_VER_2_5,
> +};
> +
>  enum dlb2_hw_port_types {
> DLB2_LDB_PORT,
> DLB2_DIR_PORT,
> diff --git a/drivers/event/dlb2/meson.build b/drivers/event/dlb2/meson.build
> index f22638b8e..bded07e06 100644
> --- a/drivers/event/dlb2/meson.build
> +++ b/drivers/event/dlb2/meson.build
> @@ -14,6 +14,7 @@ sources = files('dlb2.c',
> 'pf/dlb2_main.c',
> 'pf/dlb2_pf.c',
> 'pf/base/dlb2_resource.c',
> +   'pf/base/dlb2_resource_new.c',
> 'rte_pmd_dlb2.c',
> 'dlb2_selftest.c'
>  )
> diff --git a/drivers/event/dlb2/pf/base/dlb2_hw_types_new.h b/drivers/event/dlb2/pf/base/dlb2_hw_types_new.h
> new file mode 100644
> index 0..d58aa94ad
> --- /dev/null
> +++ b/drivers/event/dlb2/pf/base/dlb2_hw_types_new.h
> @@ -0,0 +1,362 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2016-2020 Intel Corporation
> + */
> +
> +#ifndef __DLB2_HW_TYPES_NEW_H
> +#define __DLB2_HW_TYPES_NEW_H
> +
> +#include "../../dlb2_priv.h"
> +#include "dlb2_user.h"
> +
> +#include "dlb2_osdep_list.h"
> +#include "dlb2_osdep_types.h"
> +#include "dlb2_regs_new.h"
> +
> +#define DLB2_BITS_SET(x, val, mask)  (x = ((x) & ~(mask)) \
> +                                      | (((val) << (mask##_LOC)) & (mask)))
> +#define DLB2_BITS_CLR(x, mask)       (x &= ~(mask))
> +#define DLB2_BIT_SET(x, mask)        ((x) |= (mask))
> +#define DLB2_BITS_GET(x, mask)       (((x) & (mask)) >> (mask##_LOC))
> +
> +#define DLB2_MAX_NUM_VDEVS                   16
> +#define DLB2_MAX_NUM_SEQUENCE_NUMBER_GROUPS  2
> +#define DLB2_NUM_ARB_WEIGHTS                 8
> +#define DLB2_MAX_NUM_AQED_ENTRIES            2048
> +#define DLB2_MAX_WEIGHT                      255
> +#define DLB2_NUM_COS_DOMAINS                 4
> +#define DLB2_MAX_NUM_SEQUENCE_NUMBER_GROUPS  2
> +#define DLB2_MAX_NUM_SEQUENCE_NUMBER_MODES   5
> +#define DLB2_MAX_CQ_COMP_CHECK_LOOPS         409600
> +#define DLB2_MAX_QID_EMPTY_CHECK_LOOPS       (32 * 64 * 1024 * (800 / 30))
> +
> +#define DLB2_FUNC_BAR  
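The DLB2_BITS_* helpers above rely on token pasting. Each field mask `M` is expected to have a companion `M_LOC` macro that gives the field's bit position. The sketch below copies the helper definitions from the patch and uses a hypothetical field (EXAMPLE_FIELD, 4 bits wide at bit 8) to show how they compose.

```c
#include <assert.h>
#include <stdint.h>

/* Helper definitions as in the patch. */
#define DLB2_BITS_SET(x, val, mask) (x = ((x) & ~(mask)) \
	| (((val) << (mask##_LOC)) & (mask)))
#define DLB2_BITS_CLR(x, mask) (x &= ~(mask))
#define DLB2_BIT_SET(x, mask)  ((x) |= (mask))
#define DLB2_BITS_GET(x, mask) (((x) & (mask)) >> (mask##_LOC))

/* Hypothetical 4-bit field at bits 8..11 and its _LOC companion. */
#define EXAMPLE_FIELD     0x00000F00u
#define EXAMPLE_FIELD_LOC 8
```

The mask argument must be the macro name itself, not an expression. Otherwise `mask##_LOC` does not paste into a valid identifier.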

Re: [dpdk-dev] [PATCH v4 2/3] event/octeontx2: support crypto adapter forward mode

2021-04-03 Thread Gujjar, Abhinandan S



> -----Original Message-----
> From: Shijith Thotton 
> Sent: Friday, April 2, 2021 10:31 PM
> To: dev@dpdk.org
> Cc: Shijith Thotton ; tho...@monjalon.net;
> jer...@marvell.com; Gujjar, Abhinandan S ;
> hemant.agra...@nxp.com; nipun.gu...@nxp.com;
> sachin.sax...@oss.nxp.com; ano...@marvell.com; ma...@nvidia.com;
> Zhang, Roy Fan ; g.si...@nxp.com; Carrillo, Erik
> G ; Jayatheerthan, Jay
> ; pbhagavat...@marvell.com; Van Haaren,
> Harry ; Akhil Goyal 
> Subject: [PATCH v4 2/3] event/octeontx2: support crypto adapter forward mode
> 
> Advertise crypto adapter forward mode capability and set crypto adapter
> enqueue function in driver.
> 
> Signed-off-by: Shijith Thotton 
> ---
>  drivers/crypto/octeontx2/otx2_cryptodev_ops.c | 42 ++
>  drivers/event/octeontx2/otx2_evdev.c          |  5 +-
>  .../event/octeontx2/otx2_evdev_crypto_adptr.c |  3 +-
>  ...dptr_dp.h => otx2_evdev_crypto_adptr_rx.h} |  6 +-
>  .../octeontx2/otx2_evdev_crypto_adptr_tx.h    | 82 +++
>  drivers/event/octeontx2/otx2_worker.h         |  2 +-
>  drivers/event/octeontx2/otx2_worker_dual.h    |  2 +-
>  7 files changed, 121 insertions(+), 21 deletions(-)
>  rename drivers/event/octeontx2/{otx2_evdev_crypto_adptr_dp.h => otx2_evdev_crypto_adptr_rx.h} (93%)
>  create mode 100644 drivers/event/octeontx2/otx2_evdev_crypto_adptr_tx.h
> 
> diff --git a/drivers/crypto/octeontx2/otx2_cryptodev_ops.c
> b/drivers/crypto/octeontx2/otx2_cryptodev_ops.c
> index cec20b5c6..4808dca64 100644
> --- a/drivers/crypto/octeontx2/otx2_cryptodev_ops.c
> +++ b/drivers/crypto/octeontx2/otx2_cryptodev_ops.c
> @@ -7,6 +7,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "otx2_cryptodev.h"
>  #include "otx2_cryptodev_capabilities.h"
> @@ -434,15 +435,28 @@ sym_session_configure(int driver_id, struct rte_crypto_sym_xform *xform,
>   return -ENOTSUP;
>  }
> 
> -static __rte_always_inline void __rte_hot
> +static __rte_always_inline int32_t __rte_hot
>  otx2_ca_enqueue_req(const struct otx2_cpt_qp *qp,
>   struct cpt_request_info *req,
>   void *lmtline,
> + struct rte_crypto_op *op,
>   uint64_t cpt_inst_w7)
>  {
> + union rte_event_crypto_metadata *m_data;
>   union cpt_inst_s inst;
>   uint64_t lmt_status;
> 
> + if (op->sess_type == RTE_CRYPTO_OP_WITH_SESSION)
> + m_data = rte_cryptodev_sym_session_get_user_data(
> + op->sym->session);
> + else if (op->sess_type == RTE_CRYPTO_OP_SESSIONLESS &&
> +  op->private_data_offset)
> + m_data = (union rte_event_crypto_metadata *)
> +  ((uint8_t *)op +
> +   op->private_data_offset);
> + else
> + return -EINVAL;
> +
>   inst.u[0] = 0;
>   inst.s9x.res_addr = req->comp_baddr;
>   inst.u[2] = 0;
> @@ -453,12 +467,11 @@ otx2_ca_enqueue_req(const struct otx2_cpt_qp *qp,
>   inst.s9x.ei2 = req->ist.ei2;
>   inst.s9x.ei3 = cpt_inst_w7;
> 
> - inst.s9x.qord = 1;
> - inst.s9x.grp = qp->ev.queue_id;
> - inst.s9x.tt = qp->ev.sched_type;
> - inst.s9x.tag = (RTE_EVENT_TYPE_CRYPTODEV << 28) |
> - qp->ev.flow_id;
> - inst.s9x.wq_ptr = (uint64_t)req >> 3;
> + inst.u[2] = (((RTE_EVENT_TYPE_CRYPTODEV << 28) |
> +   m_data->response_info.flow_id) |
> +  ((uint64_t)m_data->response_info.sched_type << 32) |
> +  ((uint64_t)m_data->response_info.queue_id << 34));
> + inst.u[3] = 1 | (((uint64_t)req >> 3) << 3);
>   req->qp = qp;
> 
>   do {
> @@ -475,22 +488,22 @@ otx2_ca_enqueue_req(const struct otx2_cpt_qp *qp,
>   lmt_status = otx2_lmt_submit(qp->lf_nq_reg);
>   } while (lmt_status == 0);
> 
> + return 0;
>  }
> 
>  static __rte_always_inline int32_t __rte_hot
>  otx2_cpt_enqueue_req(const struct otx2_cpt_qp *qp,
>struct pending_queue *pend_q,
>struct cpt_request_info *req,
> +  struct rte_crypto_op *op,
>uint64_t cpt_inst_w7)
>  {
>   void *lmtline = qp->lmtline;
>   union cpt_inst_s inst;
>   uint64_t lmt_status;
> 
> - if (qp->ca_enable) {
> - otx2_ca_enqueue_req(qp, req, lmtline, cpt_inst_w7);
> - return 0;
> - }
> + if (qp->ca_enable)
> + return otx2_ca_enqueue_req(qp, req, lmtline, op, cpt_inst_w7);
> 
>   if (unlikely(pend_q->pending_count >= OTX2_CPT_DEFAULT_CMD_QLEN))
>   return -EAGAIN;
> @@ -594,7 +607,8 @@ otx2_cpt_enqueue_asym(struct otx2_cpt_qp *qp,
>   goto req_fail;
>   }
> 
> - ret = otx2_cpt_enqueue_req(qp, pend_q, params.req, sess->cpt_inst_w7);
> + ret = otx2_cpt_enqueue_req(qp, pend_q, params.req, op,
> +sess->cpt_inst_w7);
> 
>   if (unlikely(ret)) {
>   CPT_LOG_DP_ERR("Co

Re: [dpdk-dev] [PATCH 04/25] event/dlb2: add DLB v2.5 support to create sched domain

2021-04-03 Thread Jerin Jacob
On Wed, Mar 17, 2021 at 3:50 AM Timothy McDaniel
 wrote:
>
> Update domain creation logic to account for DLB v2.5
> credit scheme, new register map, and new register access
> macros.
>
> Signed-off-by: Timothy McDaniel 

> ---
>  drivers/event/dlb2/dlb2_user.h                |  13 +-
>  drivers/event/dlb2/pf/base/dlb2_resource.c    | 645
>  .../event/dlb2/pf/base/dlb2_resource_new.c    | 696 ++

Please use git mv foo bar to avoid creating such a big diff.
Wherever possible, use git mv to reduce the diff in the patch.




>  3 files changed, 707 insertions(+), 647 deletions(-)
>
> diff --git a/drivers/event/dlb2/dlb2_user.h b/drivers/event/dlb2/dlb2_user.h
> index b7d125dec..9760e9bda 100644
> --- a/drivers/event/dlb2/dlb2_user.h
> +++ b/drivers/event/dlb2/dlb2_user.h
> @@ -18,6 +18,7 @@ enum dlb2_error {
> DLB2_ST_LDB_QUEUES_UNAVAILABLE,
> DLB2_ST_LDB_CREDITS_UNAVAILABLE,
> DLB2_ST_DIR_CREDITS_UNAVAILABLE,
> +   DLB2_ST_CREDITS_UNAVAILABLE,
> DLB2_ST_SEQUENCE_NUMBERS_UNAVAILABLE,
> DLB2_ST_INVALID_DOMAIN_ID,
> DLB2_ST_INVALID_QID_INFLIGHT_ALLOCATION,
> @@ -57,6 +58,7 @@ static const char dlb2_error_strings[][128] = {
> "DLB2_ST_LDB_QUEUES_UNAVAILABLE",
> "DLB2_ST_LDB_CREDITS_UNAVAILABLE",
> "DLB2_ST_DIR_CREDITS_UNAVAILABLE",
> +   "DLB2_ST_CREDITS_UNAVAILABLE",
> "DLB2_ST_SEQUENCE_NUMBERS_UNAVAILABLE",
> "DLB2_ST_INVALID_DOMAIN_ID",
> "DLB2_ST_INVALID_QID_INFLIGHT_ALLOCATION",
> @@ -170,8 +172,15 @@ struct dlb2_create_sched_domain_args {
> __u32 num_dir_ports;
> __u32 num_atomic_inflights;
> __u32 num_hist_list_entries;
> -   __u32 num_ldb_credits;
> -   __u32 num_dir_credits;
> +   union {
> +   struct {
> +   __u32 num_ldb_credits;
> +   __u32 num_dir_credits;
> +   };
> +   struct {
> +   __u32 num_credits;
> +   };
> +   };
> __u8 cos_strict;
> __u8 padding1[3];
>  };
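The anonymous union keeps the v2.0 field names intact while letting v2.5 callers use a single `num_credits`, which shares storage with `num_ldb_credits`. Below is a trimmed sketch with a hypothetical struct name and only the relevant fields kept. Anonymous structs and unions require C11.

```c
#include <assert.h>
#include <stdint.h>

struct example_domain_args {
	uint32_t num_hist_list_entries;
	union {
		struct {           /* DLB v2.0: separate credit pools */
			uint32_t num_ldb_credits;
			uint32_t num_dir_credits;
		};
		struct {           /* DLB v2.5: combined credit pool */
			uint32_t num_credits;
		};
	};
	uint8_t cos_strict;
};
```

The union is exactly as large as the two-field struct, so the overall size of the args struct is unaffected by this change.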
> diff --git a/drivers/event/dlb2/pf/base/dlb2_resource.c b/drivers/event/dlb2/pf/base/dlb2_resource.c
> index 5b8723aaf..5d296f725 100644
> --- a/drivers/event/dlb2/pf/base/dlb2_resource.c
> +++ b/drivers/event/dlb2/pf/base/dlb2_resource.c
> @@ -33,21 +33,6 @@
>  #define DLB2_FUNC_LIST_FOR_SAFE(head, ptr, ptr_tmp, it, it_tmp) \
> DLB2_LIST_FOR_EACH_SAFE((head), ptr, ptr_tmp, func_list, it, it_tmp)
>
> -static void dlb2_init_domain_rsrc_lists(struct dlb2_hw_domain *domain)
> -{
> -   int i;
> -
> -   dlb2_list_init_head(&domain->used_ldb_queues);
> -   dlb2_list_init_head(&domain->used_dir_pq_pairs);
> -   dlb2_list_init_head(&domain->avail_ldb_queues);
> -   dlb2_list_init_head(&domain->avail_dir_pq_pairs);
> -
> -   for (i = 0; i < DLB2_NUM_COS_DOMAINS; i++)
> -   dlb2_list_init_head(&domain->used_ldb_ports[i]);
> -   for (i = 0; i < DLB2_NUM_COS_DOMAINS; i++)
> -   dlb2_list_init_head(&domain->avail_ldb_ports[i]);
> -}
> -
>  void dlb2_hw_enable_sparse_dir_cq_mode(struct dlb2_hw *hw)
>  {
> union dlb2_chp_cfg_chp_csr_ctrl r0;
> @@ -70,636 +55,6 @@ void dlb2_hw_enable_sparse_ldb_cq_mode(struct dlb2_hw *hw)
> DLB2_CSR_WR(hw, DLB2_CHP_CFG_CHP_CSR_CTRL, r0.val);
>  }
>
> -static void dlb2_configure_domain_credits(struct dlb2_hw *hw,
> - struct dlb2_hw_domain *domain)
> -{
> -   union dlb2_chp_cfg_ldb_vas_crd r0 = { {0} };
> -   union dlb2_chp_cfg_dir_vas_crd r1 = { {0} };
> -
> -   r0.field.count = domain->num_ldb_credits;
> -
> -   DLB2_CSR_WR(hw, DLB2_CHP_CFG_LDB_VAS_CRD(domain->id.phys_id), r0.val);
> -
> -   r1.field.count = domain->num_dir_credits;
> -
> -   DLB2_CSR_WR(hw, DLB2_CHP_CFG_DIR_VAS_CRD(domain->id.phys_id), r1.val);
> -}
> -
> -static struct dlb2_ldb_port *
> -dlb2_get_next_ldb_port(struct dlb2_hw *hw,
> -  struct dlb2_function_resources *rsrcs,
> -  u32 domain_id,
> -  u32 cos_id)
> -{
> -   struct dlb2_list_entry *iter;
> -   struct dlb2_ldb_port *port;
> -   RTE_SET_USED(iter);
> -   /*
> -* To reduce the odds of consecutive load-balanced ports mapping to the
> -* same queue(s), the driver attempts to allocate ports whose neighbors
> -* are owned by a different domain.
> -*/
> -   DLB2_FUNC_LIST_FOR(rsrcs->avail_ldb_ports[cos_id], port, iter) {
> -   u32 next, prev;
> -   u32 phys_id;
> -
> -   phys_id = port->id.phys_id;
> -   next = phys_id + 1;
> -   prev = phys_id - 1;
> -
> -   if (phys_id == DLB2_MAX_NUM_LDB_PORTS - 1)
> -   next = 0;
> -   if (phys_id == 0)
> -   prev = DLB2_MAX_NUM_LDB_PORTS - 1;
> -
> -   if (!hw->rsrcs.ldb_ports
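The heuristic in the removed dlb2_get_next_ldb_port() prefers ports whose physical neighbors belong to another domain, with wrap-around at both ends of the port range. Below is a simplified sketch of that idea. These details are assumptions for illustration, not the driver's exact multi-pass logic:

- the port count of 64
- the `owner` array, where -1 means unallocated
- the single fallback to the first free port

```c
#include <assert.h>
#include <stddef.h>

#define EX_NUM_LDB_PORTS 64 /* assumed port count for this sketch */

/* owner[i] is the domain id owning port i, or -1 if the port is free.
 * Returns a free port whose wrap-around neighbors are both owned by some
 * other domain, else the first free port, else -1. */
static int
pick_ldb_port(const int owner[EX_NUM_LDB_PORTS], int domain_id)
{
	int first_free = -1;

	assert(owner != NULL);
	for (int id = 0; id < EX_NUM_LDB_PORTS; id++) {
		int next = (id + 1) % EX_NUM_LDB_PORTS;
		int prev = (id + EX_NUM_LDB_PORTS - 1) % EX_NUM_LDB_PORTS;

		if (owner[id] != -1)
			continue;
		if (first_free == -1)
			first_free = id;
		if (owner[next] != -1 && owner[next] != domain_id &&
		    owner[prev] != -1 && owner[prev] != domain_id)
			return id;
	}
	return first_free;
}
```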

Re: [dpdk-dev] [PATCH v2 09/27] event/dlb2: add v2.5 create dir queue

2021-04-03 Thread Jerin Jacob
On Wed, Mar 31, 2021 at 1:08 AM Timothy McDaniel
 wrote:
>
> Updated low level hardware functions to account for new
> register map and hardware access macros.
>
> Signed-off-by: Timothy McDaniel 
> ---
>  drivers/event/dlb2/pf/base/dlb2_resource.c    | 213 --
>  .../event/dlb2/pf/base/dlb2_resource_new.c    | 201 +

For all changes to this file, please take the git rename path to reduce the diff.


>  2 files changed, 201 insertions(+), 213 deletions(-)
>
> diff --git a/drivers/event/dlb2/pf/base/dlb2_resource.c b/drivers/event/dlb2/pf/base/dlb2_resource.c
> index 70c52e908..362deadfe 100644
> --- a/drivers/event/dlb2/pf/base/dlb2_resource.c
> +++ b/drivers/event/dlb2/pf/base/dlb2_resource.c
> @@ -1225,219 +1225,6 @@ dlb2_get_domain_used_dir_pq(struct dlb2_hw *hw,
> return NULL;
>  }
>
> -static void dlb2_configure_dir_queue(struct dlb2_hw *hw,
> -struct dlb2_hw_domain *domain,
> -struct dlb2_dir_pq_pair *queue,
> -struct dlb2_create_dir_queue_args *args,
> -bool vdev_req,
> -unsigned int vdev_id)
> -{
> -   union dlb2_sys_dir_vasqid_v r0 = { {0} };
> -   union dlb2_sys_dir_qid_its r1 = { {0} };
> -   union dlb2_lsp_qid_dir_depth_thrsh r2 = { {0} };
> -   union dlb2_sys_dir_qid_v r5 = { {0} };
> -
> -   unsigned int offs;
> -
> -   /* QID write permissions are turned on when the domain is started */
> -   r0.field.vasqid_v = 0;
> -
> -   offs = domain->id.phys_id * DLB2_MAX_NUM_DIR_QUEUES(hw->ver) +
> -   queue->id.phys_id;
> -
> -   DLB2_CSR_WR(hw, DLB2_SYS_DIR_VASQID_V(offs), r0.val);
> -
> -   /* Don't timestamp QEs that pass through this queue */
> -   r1.field.qid_its = 0;
> -
> -   DLB2_CSR_WR(hw,
> -   DLB2_SYS_DIR_QID_ITS(queue->id.phys_id),
> -   r1.val);
> -
> -   r2.field.thresh = args->depth_threshold;
> -
> -   DLB2_CSR_WR(hw,
> -   DLB2_LSP_QID_DIR_DEPTH_THRSH(queue->id.phys_id),
> -   r2.val);
> -
> -   if (vdev_req) {
> -   union dlb2_sys_vf_dir_vqid_v r3 = { {0} };
> -   union dlb2_sys_vf_dir_vqid2qid r4 = { {0} };
> -
> -   offs = vdev_id * DLB2_MAX_NUM_DIR_QUEUES(hw->ver)
> -   + queue->id.virt_id;
> -
> -   r3.field.vqid_v = 1;
> -
> -   DLB2_CSR_WR(hw, DLB2_SYS_VF_DIR_VQID_V(offs), r3.val);
> -
> -   r4.field.qid = queue->id.phys_id;
> -
> -   DLB2_CSR_WR(hw, DLB2_SYS_VF_DIR_VQID2QID(offs), r4.val);
> -   }
> -
> -   r5.field.qid_v = 1;
> -
> -   DLB2_CSR_WR(hw, DLB2_SYS_DIR_QID_V(queue->id.phys_id), r5.val);
> -
> -   queue->queue_configured = true;
> -}
> -
> -static void
> -dlb2_log_create_dir_queue_args(struct dlb2_hw *hw,
> -  u32 domain_id,
> -  struct dlb2_create_dir_queue_args *args,
> -  bool vdev_req,
> -  unsigned int vdev_id)
> -{
> -   DLB2_HW_DBG(hw, "DLB2 create directed queue arguments:\n");
> -   if (vdev_req)
> -   DLB2_HW_DBG(hw, "(Request from vdev %d)\n", vdev_id);
> -   DLB2_HW_DBG(hw, "\tDomain ID: %d\n", domain_id);
> -   DLB2_HW_DBG(hw, "\tPort ID:   %d\n", args->port_id);
> -}
> -
> -static int
> -dlb2_verify_create_dir_queue_args(struct dlb2_hw *hw,
> - u32 domain_id,
> - struct dlb2_create_dir_queue_args *args,
> - struct dlb2_cmd_response *resp,
> - bool vdev_req,
> - unsigned int vdev_id)
> -{
> -   struct dlb2_hw_domain *domain;
> -
> -   domain = dlb2_get_domain_from_id(hw, domain_id, vdev_req, vdev_id);
> -
> -   if (domain == NULL) {
> -   resp->status = DLB2_ST_INVALID_DOMAIN_ID;
> -   return -EINVAL;
> -   }
> -
> -   if (!domain->configured) {
> -   resp->status = DLB2_ST_DOMAIN_NOT_CONFIGURED;
> -   return -EINVAL;
> -   }
> -
> -   if (domain->started) {
> -   resp->status = DLB2_ST_DOMAIN_STARTED;
> -   return -EINVAL;
> -   }
> -
> -   /*
> -* If the user claims the port is already configured, validate the port
> -* ID, its domain, and whether the port is configured.
> -*/
> -   if (args->port_id != -1) {
> -   struct dlb2_dir_pq_pair *port;
> -
> -   port = dlb2_get_domain_used_dir_pq(hw,
> -  args->port_id,
> -  vdev_req,
> -  domain);
> -
> -   if (port == NULL || port

Re: [dpdk-dev] [PATCH v2 20/27] event/dlb2: move dlb_resource_new.c to dlb_resource.c

2021-04-03 Thread Jerin Jacob
On Wed, Mar 31, 2021 at 1:09 AM Timothy McDaniel
 wrote:
>
> The file dlb2_resource_new.c now contains all of the low level
> functions required to support both DLB v2.0 and DLB v2.5, and
> the original file (dlb2_resource.c) was removed in the previous
> commit, so rename dlb2_resource_new.c to dlb2_resource.c, and
> update the meson build file so that the new file is built.
>
> Signed-off-by: Timothy McDaniel 

Please squash 19 and 20 and use a subject like "event/dlb2: switch
over to new implementation" or so.


> ---
>  drivers/event/dlb2/meson.build  | 2 +-
>  .../event/dlb2/pf/base/{dlb2_resource_new.c => dlb2_resource.c} | 0
>  2 files changed, 1 insertion(+), 1 deletion(-)
>  rename drivers/event/dlb2/pf/base/{dlb2_resource_new.c => dlb2_resource.c} (100%)
>
> diff --git a/drivers/event/dlb2/meson.build b/drivers/event/dlb2/meson.build
> index d8cfd377f..f22638b8e 100644
> --- a/drivers/event/dlb2/meson.build
> +++ b/drivers/event/dlb2/meson.build
> @@ -13,7 +13,7 @@ sources = files('dlb2.c',
> 'dlb2_xstats.c',
> 'pf/dlb2_main.c',
> 'pf/dlb2_pf.c',
> -   'pf/base/dlb2_resource_new.c',
> +   'pf/base/dlb2_resource.c',
> 'rte_pmd_dlb2.c',
> 'dlb2_selftest.c'
>  )
> diff --git a/drivers/event/dlb2/pf/base/dlb2_resource_new.c b/drivers/event/dlb2/pf/base/dlb2_resource.c
> similarity index 100%
> rename from drivers/event/dlb2/pf/base/dlb2_resource_new.c
> rename to drivers/event/dlb2/pf/base/dlb2_resource.c
> --
> 2.23.0
>


[dpdk-dev] [PATCH] maintainer: email update maintainer

2021-04-03 Thread Liang Ma
I would like to change my email to my personal email address.

Signed-off-by: Liang Ma 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0ec558854..bca79c52b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1237,7 +1237,7 @@ F: drivers/event/dsw/
 F: doc/guides/eventdevs/dsw.rst
 
 Software OPDL Eventdev PMD
-M: Liang Ma 
+M: Liang Ma 
 M: Peter Mccarthy 
 F: drivers/event/opdl/
 F: doc/guides/eventdevs/opdl.rst
-- 
2.17.1



Re: [dpdk-dev] [PATCH v2 27/27] event/dlb2: Change device name to dlb_event

2021-04-03 Thread Jerin Jacob
On Wed, Mar 31, 2021 at 1:09 AM Timothy McDaniel
 wrote:
>
> Updated eventdev device name to be dlb_event instead of
> dlb2_event.  The new name will be used for all versions
> of the DLB hardware. This change required corresponding changes
> to the the directory name that contains the PMD, as well
> as the documentation files, build infrastructure, and PMD
> specific APIs.
>
> Signed-off-by: Timothy McDaniel 


# Change the patch subject to event/dlb:
# Also, I can still see [1] both doc/guides/eventdevs/dlb.rst and
doc/guides/eventdevs/dlb2.rst.
Let's have only one .rst file per driver.
# Please check the documentation carefully; I see the vdev argument
examples still mix up dlb2 and dlb1.
Please check and correct as needed.


[1]
[for-main]dell[dpdk-next-eventdev] $ git diff HEAD~27 --stat
 MAINTAINERS                              |    6 +-
 app/test/test_eventdev.c                 |    6 +-
 config/rte_config.h                      |   11 +-
 doc/api/doxy-api-index.md                |    2 +-
 doc/api/doxy-api.conf.in                 |    2 +-
 doc/guides/eventdevs/dlb.rst             |  390
 doc/guides/eventdevs/dlb2.rst            |   75 ++-
 doc/guides/eventdevs/index.rst           |    2 +-
 doc/guides/rel_notes/release_21_05.rst   |    5 +
 drivers/event/{dlb2 => dlb}/dlb2.c       |  451 -

> +/* DLB defines */
> +#define RTE_LIBRTE_PMD_DLB_POLL_INTERVAL 1000
> +#undef RTE_LIBRTE_PMD_DLB_QUELL_STATS
> +#define RTE_LIBRTE_PMD_DLB_SW_CREDIT_QUANTA 32
> +#define RTE_LIBRTE_PMD_DLB_DEFAULT_DEPTH_THRESH 256


PLEASE MOVE ALL OF THIS TO RUNTIME, if it is not used in the fastpath.
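As an illustration of the runtime alternative, a tunable such as the poll interval could be parsed from a device argument at probe time instead of being fixed by a #define. DPDK drivers normally use the rte_kvargs helpers for this. The standalone sketch below uses plain libc so that it runs on its own, and the key name `poll_interval` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Default used when no devarg overrides it. */
#define DEFAULT_POLL_INTERVAL 1000

/* Parse "poll_interval=<n>" from one devarg token. *out is set to the
 * default first and only overwritten when the token parses cleanly. */
static int
parse_poll_interval(const char *token, unsigned long *out)
{
	static const char key[] = "poll_interval=";
	const char *val = token + sizeof(key) - 1;
	char *end;
	unsigned long v;

	*out = DEFAULT_POLL_INTERVAL;
	if (strncmp(token, key, sizeof(key) - 1) != 0)
		return -EINVAL;
	errno = 0;
	v = strtoul(val, &end, 0);
	if (errno != 0 || end == val || *end != '\0')
		return -EINVAL;
	*out = v;
	return 0;
}
```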


> +Deferred Scheduling
> +~~~~~~~~~~~~~~~~~~~
> +
> +The DLB2 PMD's default behavior for managing a CQ is to "pop" the CQ once per
> +dequeued event before returning from rte_event_dequeue_burst(). This frees the
> +corresponding entries in the CQ, which enables the DLB2 to schedule more events
> +to it.
> +to it.
> +
> +To support applications seeking finer-grained scheduling control -- for example
> +deferring scheduling to get the best possible priority scheduling and
> +load-balancing -- the PMD supports a deferred scheduling mode. In this mode,
> +the CQ entry is not popped until the *subsequent* rte_event_dequeue_burst()
> +call. This mode only applies to load-balanced event ports with dequeue depth of
> +1.
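Below is a toy model of default versus deferred CQ pops, purely to illustrate the ordering described above. The struct and function are hypothetical and do not reflect the PMD's internals. In default mode, the entries are released before dequeue returns. In deferred mode, the entry received by one call stays occupied until the next call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one load-balanced port's CQ bookkeeping. */
struct toy_port {
	bool deferred;            /* deferred scheduling enabled? */
	unsigned int outstanding; /* CQ entries received but not yet popped */
};

/* Simulate one dequeue call returning n events (0 or 1 at depth 1). */
static void
toy_dequeue(struct toy_port *p, unsigned int n)
{
	assert(n <= 1);
	if (!p->deferred) {
		/* Default: entries are popped before the call returns. */
		p->outstanding = 0;
		return;
	}
	/* Deferred: pop what the previous call received, and hold this
	 * call's events until the next call. */
	p->outstanding = n;
}
```

In deferred mode, the occupied CQ slot keeps the device from scheduling another event to that port while the application still processes the current one.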
> +
> +To enable deferred scheduling, use the defer_sched vdev argument like so:
> +
> +.. code-block:: console
> +
> +   --vdev=dlb1_event,defer_sched=on

It should be dlb_event. Right?

> +
> +Atomic Inflights Allocation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +In the last stage prior to scheduling an atomic event to a CQ, DLB2 holds the
> +inflight event in a temporary buffer that is divided among load-balanced
> +queues. If a queue's atomic buffer storage fills up, this can result in
> +head-of-line blocking. For example:
> +
> +- An LDB queue allocated N atomic buffer entries
> +- All N entries are filled with events from flow X, which is pinned to CQ 0.
> +
> +Until CQ 0 releases 1+ events, no other atomic flows for that LDB queue can be
> +scheduled. The likelihood of this case depends on the eventdev configuration,
> +traffic behavior, event processing latency, potential for a worker to be
> +interrupted or otherwise delayed, etc.
> +
> +By default, the PMD allocates 16 buffer entries for each load-balanced queue,
> +which provides an even division across all 128 queues but potentially wastes
> +buffer space (e.g. if not all queues are used, or aren't used for atomic
> +scheduling).
> +
> +The PMD provides a dev arg to override the default per-queue allocation. To
> +increase a vdev's per-queue atomic-inflight allocation to (for example) 64:
> +
> +.. code-block:: console
> +
> +   --vdev=dlb1_event,atm_inflights=64

It should be dlb_event. Right?
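Assume the shared buffer holds exactly the 16 * 128 = 2048 entries implied by "an even division across all 128 queues". Under that assumption, the effect of `atm_inflights` is simple arithmetic. The helper below is hypothetical and only illustrates the trade-off. The PMD's actual allocation policy is not shown in this patch.

```c
/* Pool size implied by "16 entries for each of 128 queues"; an assumption,
 * not a value read from the DLB2 PMD. */
#define TOY_LDB_QUEUES		128u
#define TOY_DEFAULT_ATM_PER_Q	16u
#define TOY_ATM_POOL		(TOY_LDB_QUEUES * TOY_DEFAULT_ATM_PER_Q)

/* Number of LB queues that can each receive per_queue entries. */
static unsigned int
toy_atm_queues_served(unsigned int per_queue)
{
	unsigned int n;

	if (per_queue == 0)
		return TOY_LDB_QUEUES;
	n = TOY_ATM_POOL / per_queue;
	return n < TOY_LDB_QUEUES ? n : TOY_LDB_QUEUES;
}
```

With `atm_inflights=64`, at most 32 queues could receive the full allocation from such a pool. This is why the override mainly makes sense when only a few queues carry atomic traffic.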

> +
> +QID Depth Threshold
> +~~~~~~~~~~~~~~~~~~~
> +
> +DLB2 supports setting and tracking queue depth thresholds. Hardware uses
> +the thresholds to track how full a queue is compared to its threshold.
> +Four buckets are used:
> +
> +- Less than or equal to 50% of queue depth threshold
> +- Greater than 50%, but less than or equal to 75% of depth threshold
> +- Greater than 75%, but less than or equal to 100% of depth threshold
> +- Greater than 100% of depth threshold
> +
> +Per queue threshold metrics are tracked in the DLB2 xstats, and are also
> +returned in the impl_opaque field of each received event.
> +
> +The per qid threshold can be specified as part of the device args, and
> +can be applied to all queues, a range of queues, or a single queue, as
> +shown below.
> +
> +.. code-block:: console
> +
> +   --vdev=dlb2_event,qid_depth_thresh=all:
> +   --vdev=dlb2_event,qid_depth_thresh=qidA-qidB:
> +   --vdev=dlb2_event,qid_depth_t
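The four QID depth buckets described in the quoted doc reduce to integer comparisons against the threshold. The sketch below assumes buckets are numbered 0 to 3 in the order listed. The actual encoding reported in xstats and in `impl_opaque` is not given in the patch, and the function name is hypothetical.

```c
/* Map a queue depth to one of the four documented buckets (0..3).
 * Comparing depth * 4 against thresh * {2, 3, 4} avoids floating point.
 * A threshold of 0 puts any non-empty queue in bucket 3. */
static unsigned int
toy_qid_depth_bucket(unsigned int depth, unsigned int thresh)
{
	unsigned long long d = (unsigned long long)depth * 4;
	unsigned long long t = thresh;

	if (d <= 2 * t)
		return 0;	/* <= 50% */
	if (d <= 3 * t)
		return 1;	/* > 50%, <= 75% */
	if (d <= 4 * t)
		return 2;	/* > 75%, <= 100% */
	return 3;		/* > 100% */
}
```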

Re: [dpdk-dev] [PATCH v4 3/3] test/event_crypto: use crypto adapter enqueue API

2021-04-03 Thread Gujjar, Abhinandan S



> -----Original Message-----
> From: Shijith Thotton 
> Sent: Friday, April 2, 2021 10:31 PM
> To: dev@dpdk.org
> Cc: Shijith Thotton ; tho...@monjalon.net;
> jer...@marvell.com; Gujjar, Abhinandan S ;
> hemant.agra...@nxp.com; nipun.gu...@nxp.com;
> sachin.sax...@oss.nxp.com; ano...@marvell.com; ma...@nvidia.com;
> Zhang, Roy Fan ; g.si...@nxp.com; Carrillo, Erik
> G ; Jayatheerthan, Jay
> ; pbhagavat...@marvell.com; Van Haaren,
> Harry ; Akhil Goyal 
> Subject: [PATCH v4 3/3] test/event_crypto: use crypto adapter enqueue API
> 
> Use rte_event_crypto_adapter_enqueue() API to enqueue events to crypto
> adapter if forward mode is supported in driver.
> 
> Signed-off-by: Shijith Thotton 
> ---
>  app/test/test_event_crypto_adapter.c | 29 +++-
>  1 file changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/app/test/test_event_crypto_adapter.c b/app/test/test_event_crypto_adapter.c
> index 335211cd8..2b07f1582 100644
> --- a/app/test/test_event_crypto_adapter.c
> +++ b/app/test/test_event_crypto_adapter.c
> @@ -64,6 +64,7 @@ struct event_crypto_adapter_test_params {
>   struct rte_mempool *session_priv_mpool;
>   struct rte_cryptodev_config *config;
>   uint8_t crypto_event_port_id;
> + uint8_t internal_port_op_fwd;
>  };
> 
>  struct rte_event response_info = {
> @@ -110,9 +111,12 @@ send_recv_ev(struct rte_event *ev)
>   struct rte_event recv_ev;
>   int ret;
> 
> - ret = rte_event_enqueue_burst(evdev, TEST_APP_PORT_ID, ev, NUM);
> - TEST_ASSERT_EQUAL(ret, NUM,
> -   "Failed to send event to crypto adapter\n");
> + if (params.internal_port_op_fwd)
> + ret = rte_event_crypto_adapter_enqueue(evdev, TEST_APP_PORT_ID,
> +ev, NUM);
> + else
> + ret = rte_event_enqueue_burst(evdev, TEST_APP_PORT_ID, ev, NUM);
> + TEST_ASSERT_EQUAL(ret, NUM, "Failed to send event to crypto adapter\n");
> 
>   while (rte_event_dequeue_burst(evdev,
>   TEST_APP_PORT_ID, &recv_ev, NUM, 0) == 0)
> @@ -741,6 +745,11 @@ configure_event_crypto_adapter(enum rte_event_crypto_adapter_mode mode)
>   ret = rte_event_crypto_adapter_caps_get(evdev, TEST_CDEV_ID, &cap);
>   TEST_ASSERT_SUCCESS(ret, "Failed to get adapter capabilities\n");
> 
> + if (cap & RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_FWD)
> + params.internal_port_op_fwd = 1;
> + else
> + params.internal_port_op_fwd = 0;
> +
There is a check at line 760 for FWD mode. Can't this be set there?

>   /* Skip mode and capability mismatch check for SW eventdev */
>   if (!(cap & RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_NEW) &&
>   !(cap & RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_FWD) &&
> @@ -771,9 +780,11 @@ configure_event_crypto_adapter(enum rte_event_crypto_adapter_mode mode)
> 
>   TEST_ASSERT_SUCCESS(ret, "Failed to add queue pair\n");
> 
> - ret = rte_event_crypto_adapter_event_port_get(TEST_ADAPTER_ID,
> - &params.crypto_event_port_id);
> - TEST_ASSERT_SUCCESS(ret, "Failed to get event port\n");
> + if (!params.internal_port_op_fwd) {
> + ret = rte_event_crypto_adapter_event_port_get(TEST_ADAPTER_ID,
> + &params.crypto_event_port_id);
> + TEST_ASSERT_SUCCESS(ret, "Failed to get event port\n");
> + }
> 
>   return TEST_SUCCESS;
>  }
> @@ -809,15 +820,15 @@ test_crypto_adapter_conf(enum rte_event_crypto_adapter_mode mode)
> 
>   if (!crypto_adapter_setup_done) {
>   ret = configure_event_crypto_adapter(mode);
> - if (!ret) {
> + if (ret)
> + return ret;
> + if (!params.internal_port_op_fwd) {
>   qid = TEST_CRYPTO_EV_QUEUE_ID;
>   ret = rte_event_port_link(evdev,
>   params.crypto_event_port_id, &qid, NULL, 1);
>   TEST_ASSERT(ret >= 0, "Failed to link queue %d "
>   "port=%u\n", qid,
>   params.crypto_event_port_id);
> - } else {
> - return ret;
>   }
>   crypto_adapter_setup_done = 1;
>   }
> --
> 2.25.1
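The selection logic in this test boils down to one step: choose one of two enqueue paths from the capability bits, once, at configuration time. The self-contained sketch below uses hypothetical stand-ins. `TOY_CAP_INTERNAL_PORT_OP_FWD` and the `toy_*` functions replace the real `RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_FWD`, `rte_event_crypto_adapter_enqueue()` and `rte_event_enqueue_burst()`, whose signatures differ.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the real capability flag and enqueue calls. */
#define TOY_CAP_INTERNAL_PORT_OP_FWD (1u << 0)

typedef uint16_t (*toy_enq_fn_t)(const void *evs, uint16_t nb);

/* Models sending events straight to the adapter's internal port. */
static uint16_t
toy_adapter_enqueue(const void *evs, uint16_t nb)
{
	(void)evs;
	return nb;
}

/* Models enqueuing to the event device; the adapter later dequeues them
 * from its own event port. */
static uint16_t
toy_burst_enqueue(const void *evs, uint16_t nb)
{
	(void)evs;
	return nb;
}

/* Decide once, when capabilities are read, which path sends will use. */
static toy_enq_fn_t
toy_select_enqueue(uint32_t caps)
{
	if (caps & TOY_CAP_INTERNAL_PORT_OP_FWD)
		return toy_adapter_enqueue;
	return toy_burst_enqueue;
}
```

Storing the selected path, or a flag as the patch does, keeps the per-send code free of capability checks.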



Re: [dpdk-dev] [PATCH v3 00/52] Add Marvell CNXK common driver

2021-04-03 Thread Jerin Jacob
On Thu, Apr 1, 2021 at 6:08 PM Nithin Dabilpuram
 wrote:
>
> This patchset adds initial support for common code for
> Marvell CN10K SoC. Based on this common 'cnxk' driver, new PMD's
> such as 'net/cnxk', 'mempool/cnxk', 'event/cnxk' etc, will be added
> later on.
>
> Initially 'cnxk' drivers will only support Marvell CN106XX SoC. In future,
> when code is ready, CN9K/octeontx2 will also be supported by the same set
> of drivers and 'common/octeontx2' and its associated drivers will be
> deprecated.


# Add a new item in doc/guides/rel_notes/release_21_05.rst to say
"added support for Marvell CN10K SoC drivers"
- Add one line to describe the CN10K/Octeon 10 family.
- Add a sentence along the lines of "the following drivers are added" and
include common/cnxk there. Subsequent drivers (ethdev, mempool, eventdev,
etc.) can update this line as and when their patches are added.

# Please fix the following valid issues.

[for-dpdk-main]dell[dpdk-next-net-for-dpdk-main] $ ./devtools/check-git-log.sh -n 52
Wrong headline format:
common/cnxk: add support for rss action in rte_flow
Wrong headline prefix:
common/cnxk: add build infrastructre and HW definition
Wrong headline case:
"common/cnxk: add support for rss action in rte_flow": rss --> RSS

Invalid patch(es) found - checked 52 patches
[for-dpdk-main]dell[dpdk-next-net-for-dpdk-main] $


[for-dpdk-main]dell[dpdk-next-net-for-dpdk-main] $ ./devtools/checkpatches.sh -n 52

### doc: add Marvell CNXK platform guide

WARNING:REPEATED_WORD: Possible repeated word: 'to'
#429: FILE: doc/guides/platform/cnxk.rst:393:
+Packets sent to to unicast DMAC: 0

total: 0 errors, 1 warnings, 5823 lines checked


### common/cnxk: add base nix support

ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#583: FILE: drivers/common/cnxk/roc_nix_priv.h:11:
+#define NIX_CQ_ALIGN (uint16_t)512

ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#584: FILE: drivers/common/cnxk/roc_nix_priv.h:12:
+#define NIX_MAX_SQB (uint16_t)512

ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#587: FILE: drivers/common/cnxk/roc_nix_priv.h:15:
+#define NIX_SQB_LIST_SPACE   (uint16_t)2

ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#588: FILE: drivers/common/cnxk/roc_nix_priv.h:16:
+#define NIX_SQB_LOWER_THRESH (uint16_t)70

WARNING:UNNECESSARY_BREAK: break is not useful after a break
#752: FILE: drivers/common/cnxk/roc_utils.c:44:
+   break;
+   break;

total: 4 errors, 1 warnings, 731 lines checked
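The COMPLEX_MACRO errors go away once the whole cast is parenthesized. The short sketch below shows why the rule matters for a cast-valued macro. Only the name `NIX_MAX_SQB` is taken from the report; the rest is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed form: the cast is wrapped in an outer pair of parentheses. */
#define NIX_MAX_SQB ((uint16_t)512)

/* With the unparenthesized form, this line would expand to
 * `sizeof (uint16_t)512`, which C parses as sizeof(uint16_t) followed by
 * a stray 512: a syntax error. The parenthesized form compiles. */
static const size_t nix_max_sqb_size = sizeof NIX_MAX_SQB;
```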

### common/cnxk: add support for nix extended stats

WARNING:UNNECESSARY_INT: Prefer 'unsigned long' over 'unsigned long int' as the int is unnecessary
#175: FILE: drivers/common/cnxk/roc_nix_stats.c:348:
+   unsigned long int i, count = 0;

WARNING:UNNECESSARY_INT: Prefer 'unsigned long' over 'unsigned long int' as the int is unnecessary
#431: FILE: drivers/common/cnxk/roc_nix_xstats.h:187:
+static inline unsigned long int

WARNING:UNNECESSARY_INT: Prefer 'unsigned long' over 'unsigned long int' as the int is unnecessary
#440: FILE: drivers/common/cnxk/roc_nix_xstats.h:196:
+static inline unsigned long int

>
> Ashwin Sekhar T K (8):
>   common/cnxk: add base npa device support
>   common/cnxk: add npa irq support
>   common/cnxk: add npa debug support
>   common/cnxk: add npa pool HW ops
>   common/cnxk: add npa bulk alloc/free support
>   common/cnxk: add npa performance counter support
>   common/cnxk: add npa batch alloc/free support
>   common/cnxk: add npa lf init/fini callback support
>
> Jerin Jacob (14):
>   common/cnxk: add build infrastructre and HW definition
>   common/cnxk: add model init and IO handling API
>   common/cnxk: add interrupt helper API
>   common/cnxk: add mbox request and response definitions
>   common/cnxk: add mailbox base infra
>   common/cnxk: add base device class
>   common/cnxk: add VF support to base device class
>   common/cnxk: add base nix support
>   common/cnxk: add nix irq support
>   common/cnxk: add nix Rx queue management API
>   common/cnxk: add nix Tx queue management API
>   common/cnxk: add nix RSS support
>   common/cnxk: add nix stats support
>   common/cnxk: add nix debug dump support
>
> Kiran Kumar K (5):
>   common/cnxk: add npc support
>   common/cnxk: add npc helper API
>   common/cnxk: add mcam utility API
>   common/cnxk: add npc parsing API
>   common/cnxk: add npc init and fini support
>
> Nithin Dabilpuram (8):
>   doc: add Marvell CNXK platform guide
>   common/cnxk: add nix traffic management base support
>   common/cnxk: add nix tm support to add/delete node
>   common/cnxk: add nix tm helper to alloc and free resource
>   common/cnxk: add nix tm hierarchy enable/disable
>   common/cnxk: add nix tm support for internal hierarchy
>   common/cnxk: add nix tm dynamic update support
>   common/cnxk: add nix tm

Re: [dpdk-dev] [PATCH v4 1/3] eventdev: introduce crypto adapter enqueue API

2021-04-03 Thread Gujjar, Abhinandan S



> -----Original Message-----
> From: Shijith Thotton 
> Sent: Friday, April 2, 2021 10:31 PM
> To: dev@dpdk.org
> Cc: Akhil Goyal ; tho...@monjalon.net;
> jer...@marvell.com; Gujjar, Abhinandan S ;
> hemant.agra...@nxp.com; nipun.gu...@nxp.com;
> sachin.sax...@oss.nxp.com; ano...@marvell.com; ma...@nvidia.com;
> Zhang, Roy Fan ; g.si...@nxp.com; Carrillo, Erik
> G ; Jayatheerthan, Jay
> ; pbhagavat...@marvell.com; Van Haaren,
> Harry ; Shijith Thotton
> 
> Subject: [PATCH v4 1/3] eventdev: introduce crypto adapter enqueue API
> 
> From: Akhil Goyal 
> 
> In case an event from a previous stage is required to be forwarded to a
> crypto adapter and PMD supports internal event port in crypto adapter,
> exposed via capability
> RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_FWD, we do not
> have a way to check in the API rte_event_enqueue_burst(), whether it is for
> crypto adapter or for eth tx adapter.
I may be missing something here. The crypto adapter is an atomic stage that has
a port, which is set up during the adapter configuration.
So an application enqueuing events will end up sending them to the crypto
adapter (as the adapter dequeues from a specific port).
I am still wondering why there is a requirement for a new API.

> 
> Hence we need a new API similar to rte_event_eth_tx_adapter_enqueue(),
> which can send to a crypto adapter.
> 
> Note that RTE_EVENT_TYPE_* cannot be used to make that decision, as it is
> meant for event source and not event destination.
> And event port designated for crypto adapter is designed to be used for
> OP_NEW mode.
> 
> Hence, in order to support an event PMD which has an internal event port in
> crypto adapter (RTE_EVENT_CRYPTO_ADAPTER_OP_FORWARD mode),
> exposed via capability
> RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_FWD,
> application should use rte_event_crypto_adapter_enqueue() API to
> enqueue events.
> 
> When internal port is not available(RTE_EVENT_CRYPTO_ADAPTER_OP_NEW
> mode), application can use API rte_event_enqueue_burst() as it was doing
> earlier, i.e. retrieve event port used by crypto adapter and bind its event
> queues to that port and enqueue events using the API
> rte_event_enqueue_burst().
> 
> Signed-off-by: Akhil Goyal 
> ---
>  .../prog_guide/event_crypto_adapter.rst   | 69 ---
>  doc/guides/rel_notes/release_21_05.rst|  6 ++
>  lib/librte_eventdev/eventdev_trace_points.c   |  3 +
>  .../rte_event_crypto_adapter.h| 63 +
>  lib/librte_eventdev/rte_eventdev.c| 10 +++
>  lib/librte_eventdev/rte_eventdev.h|  8 ++-
>  lib/librte_eventdev/rte_eventdev_trace_fp.h   | 10 +++
>  lib/librte_eventdev/version.map   |  3 +
>  8 files changed, 145 insertions(+), 27 deletions(-)
> 
> diff --git a/doc/guides/prog_guide/event_crypto_adapter.rst b/doc/guides/prog_guide/event_crypto_adapter.rst
> index 1e3eb7139..4fb5c688e 100644
> --- a/doc/guides/prog_guide/event_crypto_adapter.rst
> +++ b/doc/guides/prog_guide/event_crypto_adapter.rst
> @@ -55,21 +55,22 @@ which is needed to enqueue an event after the crypto operation is completed.
>  RTE_EVENT_CRYPTO_ADAPTER_OP_FORWARD mode
> 
> 
> -In the RTE_EVENT_CRYPTO_ADAPTER_OP_FORWARD mode, if HW supports
> -RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_FWD capability the application
> -can directly submit the crypto operations to the cryptodev.
> -If not, application retrieves crypto adapter's event port using
> -rte_event_crypto_adapter_event_port_get() API. Then, links its event
> -queue to this port and starts enqueuing crypto operations as events
> -to the eventdev. The adapter then dequeues the events and submits the
> -crypto operations to the cryptodev. After the crypto completions, the
> -adapter enqueues events to the event device.
> -Application can use this mode, when ingress packet ordering is needed.
> -In this mode, events dequeued from the adapter will be treated as
> -forwarded events. The application needs to specify the cryptodev ID
> -and queue pair ID (request information) needed to enqueue a crypto
> -operation in addition to the event information (response information)
> -needed to enqueue an event after the crypto operation has completed.
> +In the ``RTE_EVENT_CRYPTO_ADAPTER_OP_FORWARD`` mode, if the event PMD
> +and crypto PMD supports internal event port
> +(``RTE_EVENT_CRYPTO_ADAPTER_CAP_INTERNAL_PORT_OP_FWD``), the
> +application should use ``rte_event_crypto_adapter_enqueue()`` API to
> +enqueue crypto operations as events to crypto adapter. If not,
> +application retrieves crypto adapter's event port using
> +``rte_event_crypto_adapter_event_port_get()`` API, links its event
> +queue to this port and starts enqueuing crypto operations as events to
> +eventdev using ``rte_event_enqueue_burst()``. The adapter then dequeues
> +the events and submits the crypto operations to the cryptodev. After
> +the crypto operation is complete, the adapter enqueues events to the

[dpdk-dev] [PATCH v2 00/11] Add Marvell CNXK mempool driver

2021-04-03 Thread Ashwin Sekhar T K
This patchset adds the mempool/cnxk driver, which provides support for the
integrated mempool device found in Marvell CN10K SoC.

The code includes mempool driver functionality for Marvell CN9K SoC as well,
but right now it is not enabled. The future plan is to deprecate the existing
mempool/octeontx2 driver once the 'CNXK' drivers are feature complete for
Marvell CN9K SoC.

Depends-on: series-16059 ("Add Marvell CNXK common driver")

v2:
 - Addressed Jerin's comments in v1.
 - Split mempool ops for cn10k/cn9k into multiple commits.
 - Added more description in the commit messages.
 - Moved MAINTAINERS and doc change to first commit.
 - Moved doc changes into respective commits implementing the change.

Ashwin Sekhar T K (11):
  mempool/cnxk: add build infra and doc
  mempool/cnxk: add device probe/remove
  mempool/cnxk: add generic ops
  mempool/cnxk: register lf init/fini callbacks
  mempool/cnxk: add cn9k mempool ops
  mempool/cnxk: add cn9k optimized mempool enqueue/dequeue
  mempool/cnxk: add cn10k mempool ops
  mempool/cnxk: add batch op init
  mempool/cnxk: add cn10k batch enqueue op
  mempool/cnxk: add cn10k get count op
  mempool/cnxk: add cn10k batch dequeue op

 MAINTAINERS  |   6 +
 doc/guides/mempool/cnxk.rst  |  91 +++
 doc/guides/mempool/index.rst |   1 +
 doc/guides/platform/cnxk.rst |   3 +
 drivers/mempool/cnxk/cn10k_mempool_ops.c | 294 +++
 drivers/mempool/cnxk/cn9k_mempool_ops.c  |  89 +++
 drivers/mempool/cnxk/cnxk_mempool.c  | 201 
 drivers/mempool/cnxk/cnxk_mempool.h  |  29 +++
 drivers/mempool/cnxk/cnxk_mempool_ops.c  | 199 +++
 drivers/mempool/cnxk/meson.build |  16 ++
 drivers/mempool/cnxk/version.map |   3 +
 drivers/mempool/meson.build  |   3 +-
 12 files changed, 934 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/mempool/cnxk.rst
 create mode 100644 drivers/mempool/cnxk/cn10k_mempool_ops.c
 create mode 100644 drivers/mempool/cnxk/cn9k_mempool_ops.c
 create mode 100644 drivers/mempool/cnxk/cnxk_mempool.c
 create mode 100644 drivers/mempool/cnxk/cnxk_mempool.h
 create mode 100644 drivers/mempool/cnxk/cnxk_mempool_ops.c
 create mode 100644 drivers/mempool/cnxk/meson.build
 create mode 100644 drivers/mempool/cnxk/version.map

-- 
2.31.0



[dpdk-dev] [PATCH v2 02/11] mempool/cnxk: add device probe/remove

2021-04-03 Thread Ashwin Sekhar T K
Add the implementation for CNXk mempool device
probe and remove.

Signed-off-by: Pavan Nikhilesh 
Signed-off-by: Ashwin Sekhar T K 
---
 doc/guides/mempool/cnxk.rst |  23 +
 drivers/mempool/cnxk/cnxk_mempool.c | 131 +++-
 2 files changed, 150 insertions(+), 4 deletions(-)

diff --git a/doc/guides/mempool/cnxk.rst b/doc/guides/mempool/cnxk.rst
index e72a77c361..907c19c841 100644
--- a/doc/guides/mempool/cnxk.rst
+++ b/doc/guides/mempool/cnxk.rst
@@ -30,6 +30,29 @@ Pre-Installation Configuration
 ------------------------------
 
 
+Runtime Config Options
+~~~~~~~~~~~~~~~~~~~~~~
+
+- ``Maximum number of mempools per application`` (default ``128``)
+
+  The maximum number of mempools per application needs to be configured on
+  HW during mempool driver initialization. HW can support up to 1M mempools.
+  Since each mempool costs a set of HW resources, the ``max_pools`` ``devargs``
+  parameter is introduced to configure the number of mempools required
+  for the application.
+  For example::
+
+      -a 0002:02:00.0,max_pools=512
+
+  With the above configuration, the driver will set up only 512 mempools for
+  the given application to save HW resources.
+
+.. note::
+
+   Since this configuration is per application, the end user needs to
+   provide the ``max_pools`` parameter to the first PCIe device probed by the given
+   application.
+
 Debugging Options
 ~~~~~~~~~~~~~~~~~
 
diff --git a/drivers/mempool/cnxk/cnxk_mempool.c b/drivers/mempool/cnxk/cnxk_mempool.c
index 947078c052..703d15be42 100644
--- a/drivers/mempool/cnxk/cnxk_mempool.c
+++ b/drivers/mempool/cnxk/cnxk_mempool.c
@@ -15,21 +15,142 @@
 
 #include "roc_api.h"
 
+#define CNXK_NPA_DEV_NAME   RTE_STR(cnxk_npa_dev_)
+#define CNXK_NPA_DEV_NAME_LEN   (sizeof(CNXK_NPA_DEV_NAME) + PCI_PRI_STR_SIZE)
+#define CNXK_NPA_MAX_POOLS_PARAM "max_pools"
+
+static inline uint32_t
+npa_aura_size_to_u32(uint8_t val)
+{
+   if (val == NPA_AURA_SZ_0)
+   return 128;
+   if (val >= NPA_AURA_SZ_MAX)
+   return BIT_ULL(20);
+
+   return 1 << (val + 6);
+}
+
 static int
-npa_remove(struct rte_pci_device *pci_dev)
+parse_max_pools(const char *key, const char *value, void *extra_args)
 {
-   RTE_SET_USED(pci_dev);
+   RTE_SET_USED(key);
+   uint32_t val;
 
+   val = atoi(value);
+   if (val < npa_aura_size_to_u32(NPA_AURA_SZ_128))
+   val = 128;
+   if (val > npa_aura_size_to_u32(NPA_AURA_SZ_1M))
+   val = BIT_ULL(20);
+
+   *(uint8_t *)extra_args = rte_log2_u32(val) - 6;
return 0;
 }
 
+static inline uint8_t
+parse_aura_size(struct rte_devargs *devargs)
+{
+   uint8_t aura_sz = NPA_AURA_SZ_128;
+   struct rte_kvargs *kvlist;
+
+   if (devargs == NULL)
+   goto exit;
+   kvlist = rte_kvargs_parse(devargs->args, NULL);
+   if (kvlist == NULL)
+   goto exit;
+
+   rte_kvargs_process(kvlist, CNXK_NPA_MAX_POOLS_PARAM, &parse_max_pools,
+  &aura_sz);
+   rte_kvargs_free(kvlist);
+exit:
+   return aura_sz;
+}
+
+static inline char *
+npa_dev_to_name(struct rte_pci_device *pci_dev, char *name)
+{
+   snprintf(name, CNXK_NPA_DEV_NAME_LEN, CNXK_NPA_DEV_NAME PCI_PRI_FMT,
+pci_dev->addr.domain, pci_dev->addr.bus, pci_dev->addr.devid,
+pci_dev->addr.function);
+
+   return name;
+}
+
+static int
+npa_init(struct rte_pci_device *pci_dev)
+{
+   char name[CNXK_NPA_DEV_NAME_LEN];
+   const struct rte_memzone *mz;
+   struct roc_npa *dev;
+   int rc;
+
+   rc = roc_plt_init();
+   if (rc < 0)
+   goto error;
+
+   rc = -ENOMEM;
+   mz = rte_memzone_reserve_aligned(npa_dev_to_name(pci_dev, name),
+sizeof(*dev), SOCKET_ID_ANY, 0,
+RTE_CACHE_LINE_SIZE);
+   if (mz == NULL)
+   goto error;
+
+   dev = mz->addr;
+   dev->pci_dev = pci_dev;
+
+   roc_idev_npa_maxpools_set(parse_aura_size(pci_dev->device.devargs));
+   rc = roc_npa_dev_init(dev);
+   if (rc)
+   goto mz_free;
+
+   return 0;
+
+mz_free:
+   rte_memzone_free(mz);
+error:
+   plt_err("failed to initialize npa device rc=%d", rc);
+   return rc;
+}
+
+static int
+npa_fini(struct rte_pci_device *pci_dev)
+{
+   char name[CNXK_NPA_DEV_NAME_LEN];
+   const struct rte_memzone *mz;
+   int rc;
+
+   mz = rte_memzone_lookup(npa_dev_to_name(pci_dev, name));
+   if (mz == NULL)
+   return -EINVAL;
+
+   rc = roc_npa_dev_fini(mz->addr);
+   if (rc) {
+   if (rc != -EAGAIN)
+   plt_err("Failed to remove npa dev, rc=%d", rc);
+   return rc;
+   }
+   rte_memzone_free(mz);
+
+   return 0;
+}
+
+static int
+npa_remove(struct rte_pci_device *pci_dev)
+{
+   if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+  
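The `max_pools` handling in this patch clamps the request to [128, 1M] and stores it as a log2-based code. The sketch below mirrors that arithmetic in standalone C. The numeric values of the `NPA_AURA_SZ_*` codes are assumptions inferred from `npa_aura_size_to_u32()`: code 1 for 128 and code 14 for 1M. For that reason, the constants are renamed `TOY_*`, and `toy_log2_ceil()` stands in for `rte_log2_u32()`.

```c
#include <stdint.h>

/* Assumed encoding inferred from npa_aura_size_to_u32() in the patch:
 * code 0 -> 128, code c in 1..14 -> 1 << (c + 6). */
#define TOY_AURA_SZ_0	0u
#define TOY_AURA_SZ_MAX	15u

static uint32_t
toy_aura_size_to_u32(uint8_t code)
{
	if (code == TOY_AURA_SZ_0)
		return 128;
	if (code >= TOY_AURA_SZ_MAX)
		return 1u << 20;
	return 1u << (code + 6);
}

/* Round up to a power of two and take log2, as rte_log2_u32() does. */
static uint32_t
toy_log2_ceil(uint32_t v)
{
	uint32_t r = 0;

	while ((1u << r) < v)
		r++;
	return r;
}

/* Clamp a requested max_pools value to [128, 1M] and encode it. */
static uint8_t
toy_encode_max_pools(uint32_t requested)
{
	if (requested < 128)
		requested = 128;
	if (requested > (1u << 20))
		requested = 1u << 20;
	return (uint8_t)(toy_log2_ceil(requested) - 6);
}
```

Because the log2 rounds up, a request that is not a power of two, such as `max_pools=300`, would configure 512 pools.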

[dpdk-dev] [PATCH v2 03/11] mempool/cnxk: add generic ops

2021-04-03 Thread Ashwin Sekhar T K
Add generic CNXk mempool ops which will enqueue/dequeue
from the pool one element at a time.

Signed-off-by: Pavan Nikhilesh 
Signed-off-by: Ashwin Sekhar T K 
---
 drivers/mempool/cnxk/cnxk_mempool.h |  26 
 drivers/mempool/cnxk/cnxk_mempool_ops.c | 171 
 drivers/mempool/cnxk/meson.build|   3 +-
 3 files changed, 199 insertions(+), 1 deletion(-)
 create mode 100644 drivers/mempool/cnxk/cnxk_mempool.h
 create mode 100644 drivers/mempool/cnxk/cnxk_mempool_ops.c

diff --git a/drivers/mempool/cnxk/cnxk_mempool.h b/drivers/mempool/cnxk/cnxk_mempool.h
new file mode 100644
index 00..099b7f6998
--- /dev/null
+++ b/drivers/mempool/cnxk/cnxk_mempool.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+#ifndef _CNXK_MEMPOOL_H_
+#define _CNXK_MEMPOOL_H_
+
+#include 
+
+unsigned int cnxk_mempool_get_count(const struct rte_mempool *mp);
+ssize_t cnxk_mempool_calc_mem_size(const struct rte_mempool *mp,
+  uint32_t obj_num, uint32_t pg_shift,
+  size_t *min_chunk_size, size_t *align);
+int cnxk_mempool_populate(struct rte_mempool *mp, unsigned int max_objs,
+ void *vaddr, rte_iova_t iova, size_t len,
+ rte_mempool_populate_obj_cb_t *obj_cb,
+ void *obj_cb_arg);
+int cnxk_mempool_alloc(struct rte_mempool *mp);
+void cnxk_mempool_free(struct rte_mempool *mp);
+
+int __rte_hot cnxk_mempool_enq(struct rte_mempool *mp, void *const *obj_table,
+  unsigned int n);
+int __rte_hot cnxk_mempool_deq(struct rte_mempool *mp, void **obj_table,
+  unsigned int n);
+
+#endif
diff --git a/drivers/mempool/cnxk/cnxk_mempool_ops.c b/drivers/mempool/cnxk/cnxk_mempool_ops.c
new file mode 100644
index 00..2ce1816c04
--- /dev/null
+++ b/drivers/mempool/cnxk/cnxk_mempool_ops.c
@@ -0,0 +1,171 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+#include 
+
+#include "roc_api.h"
+#include "cnxk_mempool.h"
+
+int __rte_hot
+cnxk_mempool_enq(struct rte_mempool *mp, void *const *obj_table, unsigned int n)
+{
+   unsigned int index;
+
+   /* Ensure mbuf init changes are written before the free pointers
+* are enqueued to the stack.
+*/
+   rte_io_wmb();
+   for (index = 0; index < n; index++)
+   roc_npa_aura_op_free(mp->pool_id, 0,
+(uint64_t)obj_table[index]);
+
+   return 0;
+}
+
+int __rte_hot
+cnxk_mempool_deq(struct rte_mempool *mp, void **obj_table, unsigned int n)
+{
+   unsigned int index;
+   uint64_t obj;
+
+   for (index = 0; index < n; index++, obj_table++) {
+   int retry = 4;
+
+   /* Retry few times before failing */
+   do {
+   obj = roc_npa_aura_op_alloc(mp->pool_id, 0);
+   } while (retry-- && (obj == 0));
+
+   if (obj == 0) {
+   cnxk_mempool_enq(mp, obj_table - index, index);
+   return -ENOENT;
+   }
+   *obj_table = (void *)obj;
+   }
+
+   return 0;
+}
+
+unsigned int
+cnxk_mempool_get_count(const struct rte_mempool *mp)
+{
+   return (unsigned int)roc_npa_aura_op_available(mp->pool_id);
+}
+
+ssize_t
+cnxk_mempool_calc_mem_size(const struct rte_mempool *mp, uint32_t obj_num,
+  uint32_t pg_shift, size_t *min_chunk_size,
+  size_t *align)
+{
+   size_t total_elt_sz;
+
+   /* Need space for one more obj on each chunk to fulfill
+* alignment requirements.
+*/
+   total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+   return rte_mempool_op_calc_mem_size_helper(
+   mp, obj_num, pg_shift, total_elt_sz, min_chunk_size, align);
+}
+
+int
+cnxk_mempool_alloc(struct rte_mempool *mp)
+{
+   uint64_t aura_handle = 0;
+   struct npa_aura_s aura;
+   struct npa_pool_s pool;
+   uint32_t block_count;
+   size_t block_size;
+   int rc = -ERANGE;
+
+   block_size = mp->elt_size + mp->header_size + mp->trailer_size;
+   block_count = mp->size;
+   if (mp->header_size % ROC_ALIGN != 0) {
+   plt_err("Header size should be multiple of %dB", ROC_ALIGN);
+   goto error;
+   }
+
+   if (block_size % ROC_ALIGN != 0) {
+   plt_err("Block size should be multiple of %dB", ROC_ALIGN);
+   goto error;
+   }
+
+   memset(&aura, 0, sizeof(struct npa_aura_s));
+   memset(&pool, 0, sizeof(struct npa_pool_s));
+   pool.nat_align = 1;
+   pool.buf_offset = mp->header_size / ROC_ALIGN;
+
+   /* Use driver specific mp->pool_config to override aura config */
+   if (mp->pool_config != NULL)
+   memcpy(&aura, mp->pool_config, sizeof(struct npa_aura_s));
+
+ 
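`cnxk_mempool_deq()` is all-or-nothing. Each element gets a few allocation retries. On failure, the elements already taken are freed back before the function returns `-ENOENT`. The sketch below applies the same pattern to a toy LIFO pool. The pool is hypothetical, and a zero handle means "empty", as with `roc_npa_aura_op_alloc()` in the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Toy LIFO standing in for an NPA aura (hypothetical). */
struct toy_pool {
	uint64_t objs[16];
	unsigned int count;
};

static uint64_t
toy_alloc(struct toy_pool *p)
{
	return p->count ? p->objs[--p->count] : 0;	/* 0 == empty */
}

static void
toy_free(struct toy_pool *p, uint64_t obj)
{
	p->objs[p->count++] = obj;
}

/* Dequeue n objects or none: retry each allocation a few times, and on
 * failure give back everything taken so far. */
static int
toy_deq(struct toy_pool *p, uint64_t *out, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		int retry = 4;
		uint64_t obj;

		do {
			obj = toy_alloc(p);
		} while (retry-- && obj == 0);

		if (obj == 0) {
			while (i > 0)
				toy_free(p, out[--i]);
			return -ENOENT;
		}
		out[i] = obj;
	}
	return 0;
}
```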

[dpdk-dev] [PATCH v2 01/11] mempool/cnxk: add build infra and doc

2021-04-03 Thread Ashwin Sekhar T K
Add the meson based build infrastructure for Marvell
CNXK mempool driver along with stub implementations
for mempool device probe.

Also add Marvell CNXK mempool base documentation.

Signed-off-by: Pavan Nikhilesh 
Signed-off-by: Jerin Jacob 
Signed-off-by: Nithin Dabilpuram 
Signed-off-by: Ashwin Sekhar T K 
---
 MAINTAINERS |  6 +++
 doc/guides/mempool/cnxk.rst | 55 
 doc/guides/mempool/index.rst|  1 +
 doc/guides/platform/cnxk.rst|  3 ++
 drivers/mempool/cnxk/cnxk_mempool.c | 78 +
 drivers/mempool/cnxk/meson.build| 13 +
 drivers/mempool/cnxk/version.map|  3 ++
 drivers/mempool/meson.build |  3 +-
 8 files changed, 161 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/mempool/cnxk.rst
 create mode 100644 drivers/mempool/cnxk/cnxk_mempool.c
 create mode 100644 drivers/mempool/cnxk/meson.build
 create mode 100644 drivers/mempool/cnxk/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index c837516d14..bae8b93030 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -501,6 +501,12 @@ M: Artem V. Andreev 
 M: Andrew Rybchenko 
 F: drivers/mempool/bucket/
 
+Marvell cnxk
+M: Ashwin Sekhar T K 
+M: Pavan Nikhilesh 
+F: drivers/mempool/cnxk/
+F: doc/guides/mempool/cnxk.rst
+
 Marvell OCTEON TX2
 M: Jerin Jacob 
 M: Nithin Dabilpuram 
diff --git a/doc/guides/mempool/cnxk.rst b/doc/guides/mempool/cnxk.rst
new file mode 100644
index 00..e72a77c361
--- /dev/null
+++ b/doc/guides/mempool/cnxk.rst
@@ -0,0 +1,55 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+Copyright(C) 2021 Marvell.
+
+CNXK NPA Mempool Driver
+
+
+The CNXK NPA PMD (**librte_mempool_cnxk**) provides mempool driver support for
+the integrated mempool device found in **Marvell OCTEON CN9K/CN10K** SoC family.
+
+More information about CNXK SoC can be found at `Marvell Official Website
+`_.
+
+Features
+
+
+CNXK NPA PMD supports:
+
+- Up to 128 NPA LFs
+- 1M Pools per LF
+- HW mempool manager
+- Ethdev Rx buffer allocation in HW to save CPU cycles in the Rx path.
+- Ethdev Tx buffer recycling in HW to save CPU cycles in the Tx path.
+
+Prerequisites and Compilation procedure
+---------------------------------------
+
+   See :doc:`../platform/cnxk` for setup information.
+
+Pre-Installation Configuration
+------------------------------
+
+
+Debugging Options
+~~~~~~~~~~~~~~~~~
+
+.. _table_cnxk_mempool_debug_options:
+
+.. table:: CNXK mempool debug options
+
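+   +---+------------+-----------------------------------+
+   | # | Component  | EAL log command                   |
+   +===+============+===================================+
+   | 1 | NPA        | --log-level='pmd\.mempool.cnxk,8' |
+   +---+------------+-----------------------------------+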
+
+Standalone mempool device
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+   The ``usertools/dpdk-devbind.py`` script enumerates all the mempool
+   devices available in the system. To spare the end user from having to bind
+   the mempool device before using an ethdev and/or eventdev device, the
+   respective driver configures an NPA LF and attaches it to the first probed
+   ethdev or eventdev device. If the end user needs to run mempool as a
+   standalone device (without ethdev or eventdev), the end user needs to bind
+   a mempool device using ``usertools/dpdk-devbind.py``.
diff --git a/doc/guides/mempool/index.rst b/doc/guides/mempool/index.rst
index a0e55467e6..ce53bc1ac7 100644
--- a/doc/guides/mempool/index.rst
+++ b/doc/guides/mempool/index.rst
@@ -11,6 +11,7 @@ application through the mempool API.
 :maxdepth: 2
 :numbered:
 
+cnxk
 octeontx
 octeontx2
 ring
diff --git a/doc/guides/platform/cnxk.rst b/doc/guides/platform/cnxk.rst
index 3b072877a1..9bbba65f2e 100644
--- a/doc/guides/platform/cnxk.rst
+++ b/doc/guides/platform/cnxk.rst
@@ -141,6 +141,9 @@ HW Offload Drivers
 
 This section lists dataplane H/W block(s) available in CNXK SoC.
 
+#. **Mempool Driver**
+   See :doc:`../mempool/cnxk` for NPA mempool driver information.
+
 Procedure to Setup Platform
 ---
 
diff --git a/drivers/mempool/cnxk/cnxk_mempool.c b/drivers/mempool/cnxk/cnxk_mempool.c
new file mode 100644
index 00..947078c052
--- /dev/null
+++ b/drivers/mempool/cnxk/cnxk_mempool.c
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "roc_api.h"
+
+static int
+npa_remove(struct rte_pci_device *pci_dev)
+{
+   RTE_SET_USED(pci_dev);
+
+   return 0;
+}
+
+static int
+npa_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
+{
+   RTE_SET_USED(pci_drv);
+   RTE_SET_USED(p

[dpdk-dev] [PATCH v2 04/11] mempool/cnxk: register lf init/fini callbacks

2021-04-03 Thread Ashwin Sekhar T K
Register the CNXk mempool lf init/fini callbacks which
will set the appropriate mempool ops to be used according
to the platform.

Signed-off-by: Ashwin Sekhar T K 
---
 drivers/mempool/cnxk/cnxk_mempool_ops.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/mempool/cnxk/cnxk_mempool_ops.c b/drivers/mempool/cnxk/cnxk_mempool_ops.c
index 2ce1816c04..18c307288c 100644
--- a/drivers/mempool/cnxk/cnxk_mempool_ops.c
+++ b/drivers/mempool/cnxk/cnxk_mempool_ops.c
@@ -2,6 +2,7 @@
  * Copyright(C) 2021 Marvell.
  */
 
+#include 
 #include 
 
 #include "roc_api.h"
@@ -169,3 +170,23 @@ cnxk_mempool_populate(struct rte_mempool *mp, unsigned int max_objs,
mp, RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ, max_objs, vaddr, iova,
len, obj_cb, obj_cb_arg);
 }
+
+static int
+cnxk_mempool_lf_init(void)
+{
+   if (roc_model_is_cn10k() || roc_model_is_cn9k())
+   rte_mbuf_set_platform_mempool_ops("cnxk_mempool_ops");
+
+   return 0;
+}
+
+static void
+cnxk_mempool_lf_fini(void)
+{
+}
+
+RTE_INIT(cnxk_mempool_ops_init)
+{
+   roc_npa_lf_init_cb_register(cnxk_mempool_lf_init);
+   roc_npa_lf_fini_cb_register(cnxk_mempool_lf_fini);
+}
-- 
2.31.0



[dpdk-dev] [PATCH v2 05/11] mempool/cnxk: add cn9k mempool ops

2021-04-03 Thread Ashwin Sekhar T K
Add Marvell CN9k mempool ops and implement CN9k mempool
alloc, which ensures that each element always occupies an
odd number of cache lines so that elements are evenly
distributed among the L1D cache sets.
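The padding rule can be checked in isolation. This is a simplified model that assumes ROC_ALIGN is 128 bytes (the CN9k L1D line size) and that VA<9:7> selects one of 8 sets, as the code comment in the patch states. `cn9k_block_size()`, `l1d_set()` and `sets_covered()` are hypothetical helpers, and the separate header/trailer padding is collapsed into a single rounding step.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_SZ  128u /* assumed ROC_ALIGN == CN9k L1D line size */
#define NUM_SETS 8u   /* VA<9:7> selects one of 8 sets */

static size_t align_ceil(size_t v, size_t a) { return (v + a - 1) / a * a; }

/* Round the element block up to whole lines, then add one more line if
 * the line count is even, so the stride is always an odd number of lines. */
static size_t cn9k_block_size(size_t raw)
{
	size_t sz = align_ceil(raw, LINE_SZ);

	if ((sz / LINE_SZ) % 2 == 0)
		sz += LINE_SZ;
	return sz;
}

/* L1D set selected by VA<9:7>. */
static unsigned int l1d_set(uintptr_t va) { return (va >> 7) & (NUM_SETS - 1); }

/* Number of distinct sets hit by the first NUM_SETS elements of a pool
 * that starts at a line-aligned base and uses the given stride. */
static unsigned int sets_covered(uintptr_t base, size_t stride)
{
	unsigned int mask = 0, n = 0, i;

	for (i = 0; i < NUM_SETS; i++)
		mask |= 1u << l1d_set(base + i * stride);
	for (i = 0; i < NUM_SETS; i++)
		n += (mask >> i) & 1u;
	return n;
}
```

With a 16-line (2048-byte) stride every element maps to the same set; the 17-line stride produced by the extra padding spreads the first eight elements across all eight sets.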

Signed-off-by: Pavan Nikhilesh 
Signed-off-by: Ashwin Sekhar T K 
---
 drivers/mempool/cnxk/cn9k_mempool_ops.c | 54 +
 drivers/mempool/cnxk/cnxk_mempool_ops.c |  4 +-
 drivers/mempool/cnxk/meson.build|  3 +-
 3 files changed, 59 insertions(+), 2 deletions(-)
 create mode 100644 drivers/mempool/cnxk/cn9k_mempool_ops.c

diff --git a/drivers/mempool/cnxk/cn9k_mempool_ops.c b/drivers/mempool/cnxk/cn9k_mempool_ops.c
new file mode 100644
index 00..f5ac163af9
--- /dev/null
+++ b/drivers/mempool/cnxk/cn9k_mempool_ops.c
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+#include 
+
+#include "roc_api.h"
+#include "cnxk_mempool.h"
+
+static int
+cn9k_mempool_alloc(struct rte_mempool *mp)
+{
+   size_t block_size, padding;
+
+   block_size = mp->elt_size + mp->header_size + mp->trailer_size;
+   /* Align header size to ROC_ALIGN */
+   if (mp->header_size % ROC_ALIGN != 0) {
+   padding = RTE_ALIGN_CEIL(mp->header_size, ROC_ALIGN) -
+ mp->header_size;
+   mp->header_size += padding;
+   block_size += padding;
+   }
+
+   /* Align block size to ROC_ALIGN */
+   if (block_size % ROC_ALIGN != 0) {
+   padding = RTE_ALIGN_CEIL(block_size, ROC_ALIGN) - block_size;
+   mp->trailer_size += padding;
+   block_size += padding;
+   }
+
+   /*
+* Marvell CN9k has 8 sets, 41 ways L1D cache, VA<9:7> bits dictate the
+* set selection. Add additional padding to ensure that the element size
+* always occupies odd number of cachelines to ensure even distribution
+* of elements among L1D cache sets.
+*/
+   padding = ((block_size / ROC_ALIGN) % 2) ? 0 : ROC_ALIGN;
+   mp->trailer_size += padding;
+
+   return cnxk_mempool_alloc(mp);
+}
+
+static struct rte_mempool_ops cn9k_mempool_ops = {
+   .name = "cn9k_mempool_ops",
+   .alloc = cn9k_mempool_alloc,
+   .free = cnxk_mempool_free,
+   .enqueue = cnxk_mempool_enq,
+   .dequeue = cnxk_mempool_deq,
+   .get_count = cnxk_mempool_get_count,
+   .calc_mem_size = cnxk_mempool_calc_mem_size,
+   .populate = cnxk_mempool_populate,
+};
+
+MEMPOOL_REGISTER_OPS(cn9k_mempool_ops);
diff --git a/drivers/mempool/cnxk/cnxk_mempool_ops.c b/drivers/mempool/cnxk/cnxk_mempool_ops.c
index 18c307288c..45c45e9943 100644
--- a/drivers/mempool/cnxk/cnxk_mempool_ops.c
+++ b/drivers/mempool/cnxk/cnxk_mempool_ops.c
@@ -174,7 +174,9 @@ cnxk_mempool_populate(struct rte_mempool *mp, unsigned int max_objs,
 static int
 cnxk_mempool_lf_init(void)
 {
-   if (roc_model_is_cn10k() || roc_model_is_cn9k())
+   if (roc_model_is_cn9k())
+   rte_mbuf_set_platform_mempool_ops("cn9k_mempool_ops");
+   else if (roc_model_is_cn10k())
rte_mbuf_set_platform_mempool_ops("cnxk_mempool_ops");
 
return 0;
diff --git a/drivers/mempool/cnxk/meson.build b/drivers/mempool/cnxk/meson.build
index 52244e728b..ff31893ff4 100644
--- a/drivers/mempool/cnxk/meson.build
+++ b/drivers/mempool/cnxk/meson.build
@@ -9,6 +9,7 @@ if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
 endif
 
 sources = files('cnxk_mempool.c',
-   'cnxk_mempool_ops.c')
+   'cnxk_mempool_ops.c',
+   'cn9k_mempool_ops.c')
 
 deps += ['eal', 'mbuf', 'kvargs', 'bus_pci', 'common_cnxk', 'mempool']
-- 
2.31.0



[dpdk-dev] [PATCH v2 06/11] mempool/cnxk: add cn9k optimized mempool enqueue/dequeue

2021-04-03 Thread Ashwin Sekhar T K
Add Marvell CN9k mempool enqueue/dequeue. Marvell CN9k
supports burst dequeue, which allows dequeuing up to 32
pointers using pipelined casp instructions.
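The all-or-nothing control flow of cn9k_mempool_deq() can be sketched with a software pool. This is a hypothetical model: `struct toy_aura` replaces the NPA aura, `toy_burst_alloc()` stands in for the 32-pointer burst path and `toy_default_alloc()` for the slower fallback; none of these are driver functions.

```c
#include <errno.h>
#include <stddef.h>

#define BURST_MAX 32u /* CN9k burst alloc limit, per the patch text */

/* Hypothetical software stand-in for an NPA aura: a LIFO of pointers. */
struct toy_aura {
	void *stack[256];
	unsigned int top;
};

static void toy_free(struct toy_aura *a, void *const *objs, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		a->stack[a->top++] = objs[i];
}

/* Models the burst path: returns at most BURST_MAX pointers. */
static unsigned int toy_burst_alloc(struct toy_aura *a, void **objs,
				    unsigned int n)
{
	unsigned int got = 0;

	if (n > BURST_MAX)
		n = BURST_MAX;
	while (got < n && a->top > 0)
		objs[got++] = a->stack[--a->top];
	return got;
}

/* Models the default path: all-or-nothing. */
static int toy_default_alloc(struct toy_aura *a, void **objs, unsigned int n)
{
	if (a->top < n)
		return -ENOENT;
	for (unsigned int i = 0; i < n; i++)
		objs[i] = a->stack[--a->top];
	return 0;
}

/* Burst first, fall back for the remainder, and give back everything
 * already taken if the fallback fails (no partial dequeue). */
static int toy_deq(struct toy_aura *a, void **objs, unsigned int n)
{
	unsigned int count = toy_burst_alloc(a, objs, n);

	if (count != n && toy_default_alloc(a, &objs[count], n - count) != 0) {
		toy_free(a, objs, count);
		return -ENOENT;
	}
	return 0;
}
```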

Signed-off-by: Pavan Nikhilesh 
Signed-off-by: Ashwin Sekhar T K 
---
 doc/guides/mempool/cnxk.rst |  4 +++
 drivers/mempool/cnxk/cn9k_mempool_ops.c | 39 +++--
 2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/doc/guides/mempool/cnxk.rst b/doc/guides/mempool/cnxk.rst
index 907c19c841..f51532b101 100644
--- a/doc/guides/mempool/cnxk.rst
+++ b/doc/guides/mempool/cnxk.rst
@@ -21,6 +21,10 @@ CNXK NPA PMD supports:
 - Ethdev Rx buffer allocation in HW to save CPU cycles in the Rx path.
 - Ethdev Tx buffer recycling in HW to save CPU cycles in the Tx path.
 
+CN9k NPA supports:
+
+- Burst alloc of up to 32 pointers.
+
 Prerequisites and Compilation procedure
 ---
 
diff --git a/drivers/mempool/cnxk/cn9k_mempool_ops.c b/drivers/mempool/cnxk/cn9k_mempool_ops.c
index f5ac163af9..c0cdba640b 100644
--- a/drivers/mempool/cnxk/cn9k_mempool_ops.c
+++ b/drivers/mempool/cnxk/cn9k_mempool_ops.c
@@ -7,6 +7,41 @@
 #include "roc_api.h"
 #include "cnxk_mempool.h"
 
+static int __rte_hot
+cn9k_mempool_enq(struct rte_mempool *mp, void *const *obj_table, unsigned int n)
+{
+   /* Ensure mbuf init changes are written before the free pointers
+* are enqueued to the stack.
+*/
+   rte_io_wmb();
+   roc_npa_aura_op_bulk_free(mp->pool_id, (const uint64_t *)obj_table, n,
+ 0);
+
+   return 0;
+}
+
+static inline int __rte_hot
+cn9k_mempool_deq(struct rte_mempool *mp, void **obj_table, unsigned int n)
+{
+   unsigned int count;
+
+   count = roc_npa_aura_op_bulk_alloc(mp->pool_id, (uint64_t *)obj_table,
+  n, 0, 1);
+
+   if (unlikely(count != n)) {
+   /* If bulk alloc failed to allocate all pointers, try
+* allocating remaining pointers with the default alloc
+* with retry scheme.
+*/
+   if (cnxk_mempool_deq(mp, &obj_table[count], n - count)) {
+   cn9k_mempool_enq(mp, obj_table, count);
+   return -ENOENT;
+   }
+   }
+
+   return 0;
+}
+
 static int
 cn9k_mempool_alloc(struct rte_mempool *mp)
 {
@@ -44,8 +79,8 @@ static struct rte_mempool_ops cn9k_mempool_ops = {
.name = "cn9k_mempool_ops",
.alloc = cn9k_mempool_alloc,
.free = cnxk_mempool_free,
-   .enqueue = cnxk_mempool_enq,
-   .dequeue = cnxk_mempool_deq,
+   .enqueue = cn9k_mempool_enq,
+   .dequeue = cn9k_mempool_deq,
.get_count = cnxk_mempool_get_count,
.calc_mem_size = cnxk_mempool_calc_mem_size,
.populate = cnxk_mempool_populate,
-- 
2.31.0



[dpdk-dev] [PATCH v2 07/11] mempool/cnxk: add cn10k mempool ops

2021-04-03 Thread Ashwin Sekhar T K
Add Marvell CN10k mempool ops and implement CN10k mempool alloc.

CN10k has a 64-byte L1D cache line size. The CN10k mempool
alloc therefore does not pad the element size to an odd multiple
of the L1D cache line size: NPA requires element sizes to be
multiples of 128 bytes, which are always even multiples of 64 bytes.
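The parity argument can be verified numerically. This sketch assumes the 128-byte NPA granularity and 64-byte CN10k line size stated in the patch text; `cn10k_block_size()` and `cn10k_lines()` are hypothetical helpers that collapse the header and block rounding of cn10k_mempool_alloc() into one step, which is enough to show the parity.

```c
#include <stddef.h>

#define ROC_ALIGN_ASSUMED 128u /* NPA element granularity, per the patch */
#define CN10K_LINE        64u  /* CN10k L1D line size, per the patch */

static size_t align_ceil(size_t v, size_t a) { return (v + a - 1) / a * a; }

/* Simplified block size after 128-byte rounding. */
static size_t cn10k_block_size(size_t raw)
{
	return align_ceil(raw, ROC_ALIGN_ASSUMED);
}

/* Number of 64-byte L1D lines one block spans. */
static size_t cn10k_lines(size_t block)
{
	return block / CN10K_LINE;
}
```

Any multiple of 128 bytes spans an even number of 64-byte lines, and the smallest odd-line stride above one line (192 bytes) is not a valid NPA element size, so the CN9k trick has no equivalent here.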

Signed-off-by: Ashwin Sekhar T K 
---
 doc/guides/mempool/cnxk.rst  |  4 ++
 drivers/mempool/cnxk/cn10k_mempool_ops.c | 52 
 drivers/mempool/cnxk/cnxk_mempool_ops.c  |  2 +-
 drivers/mempool/cnxk/meson.build |  3 +-
 4 files changed, 59 insertions(+), 2 deletions(-)
 create mode 100644 drivers/mempool/cnxk/cn10k_mempool_ops.c

diff --git a/doc/guides/mempool/cnxk.rst b/doc/guides/mempool/cnxk.rst
index f51532b101..783368e690 100644
--- a/doc/guides/mempool/cnxk.rst
+++ b/doc/guides/mempool/cnxk.rst
@@ -80,3 +80,7 @@ Standalone mempool device
device. In case, if end user need to run mempool as a standalone device
(without ethdev or eventdev), end user needs to bind a mempool device using
``usertools/dpdk-devbind.py``
+
+   Example command to run ``mempool_autotest`` test with standalone CN10K NPA device::
+
+ echo "mempool_autotest" | /app/test/dpdk-test -c 0xf0 --mbuf-pool-ops-name="cn10k_mempool_ops"
diff --git a/drivers/mempool/cnxk/cn10k_mempool_ops.c b/drivers/mempool/cnxk/cn10k_mempool_ops.c
new file mode 100644
index 00..9b63789006
--- /dev/null
+++ b/drivers/mempool/cnxk/cn10k_mempool_ops.c
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+#include 
+
+#include "roc_api.h"
+#include "cnxk_mempool.h"
+
+static int
+cn10k_mempool_alloc(struct rte_mempool *mp)
+{
+   uint32_t block_size;
+   size_t padding;
+
+   block_size = mp->elt_size + mp->header_size + mp->trailer_size;
+   /* Align header size to ROC_ALIGN */
+   if (mp->header_size % ROC_ALIGN != 0) {
+   padding = RTE_ALIGN_CEIL(mp->header_size, ROC_ALIGN) -
+ mp->header_size;
+   mp->header_size += padding;
+   block_size += padding;
+   }
+
+   /* Align block size to ROC_ALIGN */
+   if (block_size % ROC_ALIGN != 0) {
+   padding = RTE_ALIGN_CEIL(block_size, ROC_ALIGN) - block_size;
+   mp->trailer_size += padding;
+   block_size += padding;
+   }
+
+   return cnxk_mempool_alloc(mp);
+}
+
+static void
+cn10k_mempool_free(struct rte_mempool *mp)
+{
+   cnxk_mempool_free(mp);
+}
+
+static struct rte_mempool_ops cn10k_mempool_ops = {
+   .name = "cn10k_mempool_ops",
+   .alloc = cn10k_mempool_alloc,
+   .free = cn10k_mempool_free,
+   .enqueue = cnxk_mempool_enq,
+   .dequeue = cnxk_mempool_deq,
+   .get_count = cnxk_mempool_get_count,
+   .calc_mem_size = cnxk_mempool_calc_mem_size,
+   .populate = cnxk_mempool_populate,
+};
+
+MEMPOOL_REGISTER_OPS(cn10k_mempool_ops);
diff --git a/drivers/mempool/cnxk/cnxk_mempool_ops.c b/drivers/mempool/cnxk/cnxk_mempool_ops.c
index 45c45e9943..0ec131a475 100644
--- a/drivers/mempool/cnxk/cnxk_mempool_ops.c
+++ b/drivers/mempool/cnxk/cnxk_mempool_ops.c
@@ -177,7 +177,7 @@ cnxk_mempool_lf_init(void)
if (roc_model_is_cn9k())
rte_mbuf_set_platform_mempool_ops("cn9k_mempool_ops");
else if (roc_model_is_cn10k())
-   rte_mbuf_set_platform_mempool_ops("cnxk_mempool_ops");
+   rte_mbuf_set_platform_mempool_ops("cn10k_mempool_ops");
 
return 0;
 }
diff --git a/drivers/mempool/cnxk/meson.build b/drivers/mempool/cnxk/meson.build
index ff31893ff4..3282b5e5a6 100644
--- a/drivers/mempool/cnxk/meson.build
+++ b/drivers/mempool/cnxk/meson.build
@@ -10,6 +10,7 @@ endif
 
 sources = files('cnxk_mempool.c',
'cnxk_mempool_ops.c',
-   'cn9k_mempool_ops.c')
+   'cn9k_mempool_ops.c',
+   'cn10k_mempool_ops.c')
 
 deps += ['eal', 'mbuf', 'kvargs', 'bus_pci', 'common_cnxk', 'mempool']
-- 
2.31.0



[dpdk-dev] [PATCH v2 08/11] mempool/cnxk: add batch op init

2021-04-03 Thread Ashwin Sekhar T K
Marvell CN10k mempool supports batch enqueue/dequeue which can
dequeue up to 512 pointers and enqueue up to 15 pointers using
a single instruction.

These batch operations require DMA memory to enqueue/dequeue
pointers. This patch adds the initialization of this DMA memory.
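The data layout introduced here (an aura-indexed table of per-pool structures, each holding one cache slot per lcore) can be sketched as below. This is a hypothetical model: plain calloc()/free() replace rte_zmalloc()/rte_free(), so the memory is neither DMA-capable nor shared between processes, and `TOY_MAX_LCORE` is a small stand-in for RTE_MAX_LCORE.

```c
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_LCORE 4u  /* stands in for RTE_MAX_LCORE */
#define TOY_BATCH_SZ  512u /* CN10k batch dequeue limit, per the patch */

enum toy_batch_status { TOY_NOT_ISSUED = 0, TOY_ISSUED = 1, TOY_DONE };

/* One cache slot per lcore, so the hot path needs no locking. */
struct toy_batch_mem {
	unsigned int sz;
	enum toy_batch_status status;
	uint64_t objs[TOY_BATCH_SZ];
};

struct toy_batch_data {
	struct toy_batch_mem mem[TOY_MAX_LCORE];
};

/* Table indexed by aura number, sized once at LF init. */
static struct toy_batch_data **toy_table;
static unsigned int toy_maxpools;

static int toy_lf_init(unsigned int maxpools)
{
	toy_table = calloc(maxpools, sizeof(*toy_table));
	if (toy_table == NULL)
		return -1;
	toy_maxpools = maxpools;
	return 0;
}

static int toy_pool_init(unsigned int aura)
{
	if (aura >= toy_maxpools || toy_table[aura] != NULL)
		return -1;
	/* calloc zeroes sz and leaves every status at TOY_NOT_ISSUED (0). */
	toy_table[aura] = calloc(1, sizeof(struct toy_batch_data));
	return toy_table[aura] != NULL ? 0 : -1;
}

static void toy_pool_fini(unsigned int aura)
{
	free(toy_table[aura]);
	toy_table[aura] = NULL;
}

static void toy_lf_fini(void)
{
	free(toy_table);
	toy_table = NULL;
	toy_maxpools = 0;
}
```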

Signed-off-by: Ashwin Sekhar T K 
---
 doc/guides/mempool/cnxk.rst  |   5 +
 drivers/mempool/cnxk/cn10k_mempool_ops.c | 122 ++-
 drivers/mempool/cnxk/cnxk_mempool.h  |   3 +
 drivers/mempool/cnxk/cnxk_mempool_ops.c  |  13 ++-
 4 files changed, 138 insertions(+), 5 deletions(-)

diff --git a/doc/guides/mempool/cnxk.rst b/doc/guides/mempool/cnxk.rst
index 783368e690..286ee29003 100644
--- a/doc/guides/mempool/cnxk.rst
+++ b/doc/guides/mempool/cnxk.rst
@@ -25,6 +25,11 @@ CN9k NPA supports:
 
 - Burst alloc of up to 32 pointers.
 
+CN10k NPA supports:
+
+- Batch dequeue of up to 512 pointers with a single instruction.
+- Batch enqueue of up to 15 pointers with a single instruction.
+
 Prerequisites and Compilation procedure
 ---
 
diff --git a/drivers/mempool/cnxk/cn10k_mempool_ops.c b/drivers/mempool/cnxk/cn10k_mempool_ops.c
index 9b63789006..d34041528a 100644
--- a/drivers/mempool/cnxk/cn10k_mempool_ops.c
+++ b/drivers/mempool/cnxk/cn10k_mempool_ops.c
@@ -7,11 +7,117 @@
 #include "roc_api.h"
 #include "cnxk_mempool.h"
 
+#define BATCH_ALLOC_SZ ROC_CN10K_NPA_BATCH_ALLOC_MAX_PTRS
+
+enum batch_op_status {
+   BATCH_ALLOC_OP_NOT_ISSUED = 0,
+   BATCH_ALLOC_OP_ISSUED = 1,
+   BATCH_ALLOC_OP_DONE
+};
+
+struct batch_op_mem {
+   unsigned int sz;
+   enum batch_op_status status;
+   uint64_t objs[BATCH_ALLOC_SZ] __rte_aligned(ROC_ALIGN);
+};
+
+struct batch_op_data {
+   uint64_t lmt_addr;
+   struct batch_op_mem mem[RTE_MAX_LCORE] __rte_aligned(ROC_ALIGN);
+};
+
+static struct batch_op_data **batch_op_data;
+
+#define BATCH_OP_DATA_GET(pool_id)                                             \
+   batch_op_data[roc_npa_aura_handle_to_aura(pool_id)]
+
+#define BATCH_OP_DATA_SET(pool_id, op_data)                                    \
+   do {   \
+   uint64_t aura = roc_npa_aura_handle_to_aura(pool_id);  \
+   batch_op_data[aura] = op_data; \
+   } while (0)
+
+int
+cn10k_mempool_lf_init(void)
+{
+   unsigned int maxpools, sz;
+
+   maxpools = roc_idev_npa_maxpools_get();
+   sz = maxpools * sizeof(struct batch_op_data *);
+
+   batch_op_data = rte_zmalloc(NULL, sz, ROC_ALIGN);
+   if (!batch_op_data)
+   return -1;
+
+   return 0;
+}
+
+void
+cn10k_mempool_lf_fini(void)
+{
+   if (!batch_op_data)
+   return;
+
+   rte_free(batch_op_data);
+   batch_op_data = NULL;
+}
+
+static int
+batch_op_init(struct rte_mempool *mp)
+{
+   struct batch_op_data *op_data;
+   int i;
+
+   RTE_ASSERT(BATCH_OP_DATA_GET(mp->pool_id) == NULL);
+   op_data = rte_zmalloc(NULL, sizeof(struct batch_op_data), ROC_ALIGN);
+   if (op_data == NULL)
+   return -1;
+
+   for (i = 0; i < RTE_MAX_LCORE; i++) {
+   op_data->mem[i].sz = 0;
+   op_data->mem[i].status = BATCH_ALLOC_OP_NOT_ISSUED;
+   }
+
+   op_data->lmt_addr = roc_idev_lmt_base_addr_get();
+   BATCH_OP_DATA_SET(mp->pool_id, op_data);
+
+   return 0;
+}
+
+static void
+batch_op_fini(struct rte_mempool *mp)
+{
+   struct batch_op_data *op_data;
+   int i;
+
+   op_data = BATCH_OP_DATA_GET(mp->pool_id);
+
+   rte_wmb();
+   for (i = 0; i < RTE_MAX_LCORE; i++) {
+   struct batch_op_mem *mem = &op_data->mem[i];
+
+   if (mem->status == BATCH_ALLOC_OP_ISSUED) {
+   mem->sz = roc_npa_aura_batch_alloc_extract(
+   mem->objs, mem->objs, BATCH_ALLOC_SZ);
+   mem->status = BATCH_ALLOC_OP_DONE;
+   }
+   if (mem->status == BATCH_ALLOC_OP_DONE) {
+   roc_npa_aura_op_bulk_free(mp->pool_id, mem->objs,
+ mem->sz, 1);
+   mem->status = BATCH_ALLOC_OP_NOT_ISSUED;
+   }
+   }
+
+   rte_free(op_data);
+   BATCH_OP_DATA_SET(mp->pool_id, NULL);
+}
+
 static int
 cn10k_mempool_alloc(struct rte_mempool *mp)
 {
uint32_t block_size;
size_t padding;
+   int rc;
 
block_size = mp->elt_size + mp->header_size + mp->trailer_size;
/* Align header size to ROC_ALIGN */
@@ -29,12 +135,26 @@ cn10k_mempool_alloc(struct rte_mempool *mp)
block_size += padding;
}
 
-   return cnxk_mempool_alloc(mp);
+   rc = cnxk_mempool_alloc(mp);
+   if (rc)
+   return rc;
+
+   rc = batch_op_init(mp);
+   if (rc) {
+   plt_err("Failed to init batch a

[dpdk-dev] [PATCH v2 09/11] mempool/cnxk: add cn10k batch enqueue op

2021-04-03 Thread Ashwin Sekhar T K
Add the implementation for Marvell CN10k mempool batch enqueue op.
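The shape of cn10k_mempool_enq() can be sketched with counters in place of hardware. A single pointer takes the single-free path, while larger requests go through batch free. The chunking into at most 15 pointers per instruction is an assumption about what roc_npa_aura_op_batch_free() does internally, based on the 15-pointer limit in the cover text; `toy_single_free()` and `toy_batch_free()` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BATCH_FREE_MAX 15u /* pointers per batch-free instruction, per the patch text */

/* Hypothetical counters standing in for the two hardware free paths. */
struct free_stats {
	unsigned int single_ops;
	unsigned int batch_ops;
	unsigned int freed;
};

static void toy_single_free(struct free_stats *st, uint64_t ptr)
{
	(void)ptr;
	st->single_ops++;
	st->freed++;
}

static void toy_batch_free(struct free_stats *st, const uint64_t *ptrs,
			   unsigned int n)
{
	(void)ptrs;
	st->batch_ops++;
	st->freed += n;
}

/* One pointer: single free. Otherwise: chunks of at most BATCH_FREE_MAX. */
static void toy_enq(struct free_stats *st, const uint64_t *ptrs, unsigned int n)
{
	if (n == 1) {
		toy_single_free(st, ptrs[0]);
		return;
	}
	while (n > 0) {
		unsigned int chunk = n < BATCH_FREE_MAX ? n : BATCH_FREE_MAX;

		toy_batch_free(st, ptrs, chunk);
		ptrs += chunk;
		n -= chunk;
	}
}
```

Under this model a 32-pointer enqueue costs three batch instructions (15 + 15 + 2) instead of 32 single frees.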

Signed-off-by: Ashwin Sekhar T K 
---
 drivers/mempool/cnxk/cn10k_mempool_ops.c | 28 +++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/mempool/cnxk/cn10k_mempool_ops.c b/drivers/mempool/cnxk/cn10k_mempool_ops.c
index d34041528a..2e3ec414da 100644
--- a/drivers/mempool/cnxk/cn10k_mempool_ops.c
+++ b/drivers/mempool/cnxk/cn10k_mempool_ops.c
@@ -112,6 +112,32 @@ batch_op_fini(struct rte_mempool *mp)
BATCH_OP_DATA_SET(mp->pool_id, NULL);
 }
 
+static int __rte_hot
+cn10k_mempool_enq(struct rte_mempool *mp, void *const *obj_table,
+ unsigned int n)
+{
+   const uint64_t *ptr = (const uint64_t *)obj_table;
+   uint64_t lmt_addr = 0, lmt_id = 0;
+   struct batch_op_data *op_data;
+
+   /* Ensure mbuf init changes are written before the free pointers are
+* enqueued to the stack.
+*/
+   rte_io_wmb();
+
+   if (n == 1) {
+   roc_npa_aura_op_free(mp->pool_id, 1, ptr[0]);
+   return 0;
+   }
+
+   op_data = BATCH_OP_DATA_GET(mp->pool_id);
+   lmt_addr = op_data->lmt_addr;
+   ROC_LMT_BASE_ID_GET(lmt_addr, lmt_id);
+   roc_npa_aura_op_batch_free(mp->pool_id, ptr, n, 1, lmt_addr, lmt_id);
+
+   return 0;
+}
+
 static int
 cn10k_mempool_alloc(struct rte_mempool *mp)
 {
@@ -162,7 +188,7 @@ static struct rte_mempool_ops cn10k_mempool_ops = {
.name = "cn10k_mempool_ops",
.alloc = cn10k_mempool_alloc,
.free = cn10k_mempool_free,
-   .enqueue = cnxk_mempool_enq,
+   .enqueue = cn10k_mempool_enq,
.dequeue = cnxk_mempool_deq,
.get_count = cnxk_mempool_get_count,
.calc_mem_size = cnxk_mempool_calc_mem_size,
-- 
2.31.0



[dpdk-dev] [PATCH v2 10/11] mempool/cnxk: add cn10k get count op

2021-04-03 Thread Ashwin Sekhar T K
Add the implementation for Marvell CN10k get count op.
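Pointers parked in per-lcore batch caches are no longer counted by the hardware aura, so get_count has to add them back. A minimal sketch of that accounting: the hypothetical `landed` field stands in for roc_npa_aura_batch_alloc_count(), which the patch uses for slots whose batch alloc is still in flight, and `hw_count` stands in for cnxk_mempool_get_count().

```c
#include <stddef.h>

#define TOY_MAX_LCORE 4u /* stands in for RTE_MAX_LCORE */

enum toy_status { TOY_NOT_ISSUED = 0, TOY_ISSUED = 1, TOY_DONE };

struct toy_slot {
	enum toy_status status;
	unsigned int sz;     /* cached pointers, valid when TOY_DONE */
	unsigned int landed; /* hypothetical: pointers written by an
			      * in-flight batch alloc (TOY_ISSUED) */
};

/* Free objects = hardware count + everything held in lcore caches. */
static unsigned int toy_get_count(const struct toy_slot *slots,
				  unsigned int hw_count)
{
	unsigned int count = hw_count;

	for (unsigned int i = 0; i < TOY_MAX_LCORE; i++) {
		if (slots[i].status == TOY_ISSUED)
			count += slots[i].landed;
		else if (slots[i].status == TOY_DONE)
			count += slots[i].sz;
	}
	return count;
}
```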

Signed-off-by: Ashwin Sekhar T K 
---
 drivers/mempool/cnxk/cn10k_mempool_ops.c | 28 +++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/mempool/cnxk/cn10k_mempool_ops.c b/drivers/mempool/cnxk/cn10k_mempool_ops.c
index 2e3ec414da..16b2f6697f 100644
--- a/drivers/mempool/cnxk/cn10k_mempool_ops.c
+++ b/drivers/mempool/cnxk/cn10k_mempool_ops.c
@@ -138,6 +138,32 @@ cn10k_mempool_enq(struct rte_mempool *mp, void *const *obj_table,
return 0;
 }
 
+static unsigned int
+cn10k_mempool_get_count(const struct rte_mempool *mp)
+{
+   struct batch_op_data *op_data;
+   unsigned int count = 0;
+   int i;
+
+   op_data = BATCH_OP_DATA_GET(mp->pool_id);
+
+   rte_wmb();
+   for (i = 0; i < RTE_MAX_LCORE; i++) {
+   struct batch_op_mem *mem = &op_data->mem[i];
+
+   if (mem->status == BATCH_ALLOC_OP_ISSUED)
+   count += roc_npa_aura_batch_alloc_count(mem->objs,
+   BATCH_ALLOC_SZ);
+
+   if (mem->status == BATCH_ALLOC_OP_DONE)
+   count += mem->sz;
+   }
+
+   count += cnxk_mempool_get_count(mp);
+
+   return count;
+}
+
 static int
 cn10k_mempool_alloc(struct rte_mempool *mp)
 {
@@ -190,7 +216,7 @@ static struct rte_mempool_ops cn10k_mempool_ops = {
.free = cn10k_mempool_free,
.enqueue = cn10k_mempool_enq,
.dequeue = cnxk_mempool_deq,
-   .get_count = cnxk_mempool_get_count,
+   .get_count = cn10k_mempool_get_count,
.calc_mem_size = cnxk_mempool_calc_mem_size,
.populate = cnxk_mempool_populate,
 };
-- 
2.31.0



[dpdk-dev] [PATCH v2 11/11] mempool/cnxk: add cn10k batch dequeue op

2021-04-03 Thread Ashwin Sekhar T K
Add the implementation for Marvell CN10k mempool batch dequeue op.
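The dequeue strategy can be sketched as a simplified, synchronous model: drain the per-lcore cache from the top, refill it when empty, stop after a bounded number of short refills, and hand back everything on failure. Unlike the patch, the refill here completes immediately; the real code issues the next batch alloc as soon as the cache empties so that it overlaps with other work. `struct toy_pool`, `struct toy_cache` and `toy_batch_alloc()` are hypothetical, and the batch size is shrunk to 8 for readability.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define TOY_BATCH 8u /* small stand-in for the 512-pointer batch */

/* Hypothetical backing pool: a LIFO of pointer values. */
struct toy_pool {
	uint64_t stack[64];
	unsigned int top;
};

/* Per-lcore cache filled by one batch alloc. */
struct toy_cache {
	uint64_t objs[TOY_BATCH];
	unsigned int sz;
};

static unsigned int toy_batch_alloc(struct toy_pool *p, uint64_t *objs)
{
	unsigned int got = 0;

	while (got < TOY_BATCH && p->top > 0)
		objs[got++] = p->stack[--p->top];
	return got;
}

static void toy_pool_free(struct toy_pool *p, const uint64_t *objs,
			  unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		p->stack[p->top++] = objs[i];
}

static int toy_deq(struct toy_pool *p, struct toy_cache *c, uint64_t *out,
		   unsigned int n)
{
	unsigned int count = 0, retry = 4;

	while (count < n) {
		unsigned int cur;

		if (c->sz == 0) {
			if (retry == 0)
				break;
			c->sz = toy_batch_alloc(p, c->objs);
			/* A short refill uses up one retry. */
			retry -= (c->sz != TOY_BATCH);
			if (c->sz == 0)
				continue;
		}
		/* Take from the top of the cache. */
		cur = n - count < c->sz ? n - count : c->sz;
		memcpy(&out[count], &c->objs[c->sz - cur], cur * sizeof(uint64_t));
		c->sz -= cur;
		count += cur;
	}
	if (count != n) {
		/* No partial dequeue: return what was taken. */
		toy_pool_free(p, out, count);
		return -ENOENT;
	}
	return 0;
}
```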

Signed-off-by: Ashwin Sekhar T K 
---
 drivers/mempool/cnxk/cn10k_mempool_ops.c | 72 +++-
 1 file changed, 71 insertions(+), 1 deletion(-)

diff --git a/drivers/mempool/cnxk/cn10k_mempool_ops.c b/drivers/mempool/cnxk/cn10k_mempool_ops.c
index 16b2f6697f..05f36ff263 100644
--- a/drivers/mempool/cnxk/cn10k_mempool_ops.c
+++ b/drivers/mempool/cnxk/cn10k_mempool_ops.c
@@ -164,6 +164,76 @@ cn10k_mempool_get_count(const struct rte_mempool *mp)
return count;
 }
 
+static int __rte_hot
+cn10k_mempool_deq(struct rte_mempool *mp, void **obj_table, unsigned int n)
+{
+   struct batch_op_data *op_data;
+   struct batch_op_mem *mem;
+   unsigned int count = 0;
+   int tid, rc, retry;
+   bool loop = true;
+
+   op_data = BATCH_OP_DATA_GET(mp->pool_id);
+   tid = rte_lcore_id();
+   mem = &op_data->mem[tid];
+
+   /* Issue batch alloc */
+   if (mem->status == BATCH_ALLOC_OP_NOT_ISSUED) {
+   rc = roc_npa_aura_batch_alloc_issue(mp->pool_id, mem->objs,
+   BATCH_ALLOC_SZ, 0, 1);
+   /* If issue fails, try falling back to default alloc */
+   if (unlikely(rc))
+   return cnxk_mempool_deq(mp, obj_table, n);
+   mem->status = BATCH_ALLOC_OP_ISSUED;
+   }
+
+   retry = 4;
+   while (loop) {
+   unsigned int cur_sz;
+
+   if (mem->status == BATCH_ALLOC_OP_ISSUED) {
+   mem->sz = roc_npa_aura_batch_alloc_extract(
+   mem->objs, mem->objs, BATCH_ALLOC_SZ);
+
+   /* If partial alloc reduce the retry count */
+   retry -= (mem->sz != BATCH_ALLOC_SZ);
+   /* Break the loop if retry count exhausted */
+   loop = !!retry;
+   mem->status = BATCH_ALLOC_OP_DONE;
+   }
+
+   cur_sz = n - count;
+   if (cur_sz > mem->sz)
+   cur_sz = mem->sz;
+
+   /* Dequeue the pointers */
+   memcpy(&obj_table[count], &mem->objs[mem->sz - cur_sz],
+  cur_sz * sizeof(uintptr_t));
+   mem->sz -= cur_sz;
+   count += cur_sz;
+
+   /* Break loop if the required pointers have been dequeued */
+   loop &= (count != n);
+
+   /* Issue next batch alloc if pointers are exhausted */
+   if (mem->sz == 0) {
+   rc = roc_npa_aura_batch_alloc_issue(
+   mp->pool_id, mem->objs, BATCH_ALLOC_SZ, 0, 1);
+   /* Break loop if issue failed and set status */
+   loop &= !rc;
+   mem->status = !rc;
+   }
+   }
+
+   if (unlikely(count != n)) {
+   /* No partial alloc allowed. Free up allocated pointers */
+   cn10k_mempool_enq(mp, obj_table, count);
+   return -ENOENT;
+   }
+
+   return 0;
+}
+
 static int
 cn10k_mempool_alloc(struct rte_mempool *mp)
 {
@@ -215,7 +285,7 @@ static struct rte_mempool_ops cn10k_mempool_ops = {
.alloc = cn10k_mempool_alloc,
.free = cn10k_mempool_free,
.enqueue = cn10k_mempool_enq,
-   .dequeue = cnxk_mempool_deq,
+   .dequeue = cn10k_mempool_deq,
.get_count = cn10k_mempool_get_count,
.calc_mem_size = cnxk_mempool_calc_mem_size,
.populate = cnxk_mempool_populate,
-- 
2.31.0



Re: [dpdk-dev] [PATCH v2 09/11] mempool/cnxk: add cn10k batch enqueue op

2021-04-03 Thread Jerin Jacob
On Sat, Apr 3, 2021 at 7:49 PM Ashwin Sekhar T K  wrote:
>
> Add the implementation for Marvell CN10k mempool batch enqueue op.
>
> Signed-off-by: Ashwin Sekhar T K 
> ---
>  drivers/mempool/cnxk/cn10k_mempool_ops.c | 28 +++-
>  1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mempool/cnxk/cn10k_mempool_ops.c b/drivers/mempool/cnxk/cn10k_mempool_ops.c
> index d34041528a..2e3ec414da 100644
> --- a/drivers/mempool/cnxk/cn10k_mempool_ops.c
> +++ b/drivers/mempool/cnxk/cn10k_mempool_ops.c
> @@ -112,6 +112,32 @@ batch_op_fini(struct rte_mempool *mp)
> BATCH_OP_DATA_SET(mp->pool_id, NULL);
>  }
>
> +static int __rte_hot
> +cn10k_mempool_enq(struct rte_mempool *mp, void *const *obj_table,
> + unsigned int n)
> +{
> +   const uint64_t *ptr = (const uint64_t *)obj_table;
> +   uint64_t lmt_addr = 0, lmt_id = 0;

Please check whether the initialization to zero is required.

> +   struct batch_op_data *op_data;
> +
> +   /* Ensure mbuf init changes are written before the free pointers are
> +* enqueued to the stack.
> +*/
> +   rte_io_wmb();
> +
> +   if (n == 1) {
> +   roc_npa_aura_op_free(mp->pool_id, 1, ptr[0]);
> +   return 0;
> +   }
> +
> +   op_data = BATCH_OP_DATA_GET(mp->pool_id);
> +   lmt_addr = op_data->lmt_addr;
> +   ROC_LMT_BASE_ID_GET(lmt_addr, lmt_id);
> +   roc_npa_aura_op_batch_free(mp->pool_id, ptr, n, 1, lmt_addr, lmt_id);
> +
> +   return 0;
> +}
> +
>  static int
>  cn10k_mempool_alloc(struct rte_mempool *mp)
>  {
> @@ -162,7 +188,7 @@ static struct rte_mempool_ops cn10k_mempool_ops = {
> .name = "cn10k_mempool_ops",
> .alloc = cn10k_mempool_alloc,
> .free = cn10k_mempool_free,
> -   .enqueue = cnxk_mempool_enq,
> +   .enqueue = cn10k_mempool_enq,
> .dequeue = cnxk_mempool_deq,
> .get_count = cnxk_mempool_get_count,
> .calc_mem_size = cnxk_mempool_calc_mem_size,
> --
> 2.31.0
>


Re: [dpdk-dev] [PATCH v2 08/11] mempool/cnxk: add batch op init

2021-04-03 Thread Jerin Jacob
On Sat, Apr 3, 2021 at 7:49 PM Ashwin Sekhar T K  wrote:
>
> Marvell CN10k mempool supports batch enqueue/dequeue which can
> dequeue up to 512 pointers and enqueue up to 15 pointers using
> a single instruction.
>
> These batch operations require a DMA memory to enqueue/dequeue
> pointers. This patch adds the initialization of this DMA memory.
>
> Signed-off-by: Ashwin Sekhar T K 
> ---
>  doc/guides/mempool/cnxk.rst  |   5 +
>  drivers/mempool/cnxk/cn10k_mempool_ops.c | 122 ++-
>  drivers/mempool/cnxk/cnxk_mempool.h  |   3 +
>  drivers/mempool/cnxk/cnxk_mempool_ops.c  |  13 ++-
>  4 files changed, 138 insertions(+), 5 deletions(-)
>
> +
> +static struct batch_op_data **batch_op_data;

Please remove the global variable, as it will break multi-process support.

> +
> +#define BATCH_OP_DATA_GET(pool_id)                                             \
> +   batch_op_data[roc_npa_aura_handle_to_aura(pool_id)]
> +
> +#define BATCH_OP_DATA_SET(pool_id, op_data)                                    \
> +   do {                                                                   \
> +   uint64_t aura = roc_npa_aura_handle_to_aura(pool_id);          \
> +   batch_op_data[aura] = op_data;                                 \
> +   } while (0)
> +

Please check whether this can be made a static inline function if there is NO performance cost.
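The macro pair could be written as static inline functions, which adds type checking on `pool_id` and `op_data`. An optimizing compiler normally inlines such helpers, so the generated code should match the macros, but that is exactly what the review asks to confirm. This is a hypothetical sketch: `toy_handle_to_aura()` stands in for roc_npa_aura_handle_to_aura() and assumes the aura number sits in the low 16 bits of the handle, and `struct batch_op_data` is trimmed to one field.

```c
#include <stddef.h>
#include <stdint.h>

/* Trimmed; the patch's struct also carries per-lcore state. */
struct batch_op_data {
	uint64_t lmt_addr;
};

/* Hypothetical aura mask, standing in for the real handle decoding. */
#define TOY_AURA_MASK 0xFFFFu

static struct batch_op_data **batch_op_data;

static inline uint64_t toy_handle_to_aura(uint64_t handle)
{
	return handle & TOY_AURA_MASK;
}

static inline struct batch_op_data *batch_op_data_get(uint64_t pool_id)
{
	return batch_op_data[toy_handle_to_aura(pool_id)];
}

static inline void batch_op_data_set(uint64_t pool_id,
				     struct batch_op_data *op_data)
{
	batch_op_data[toy_handle_to_aura(pool_id)] = op_data;
}
```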


[dpdk-dev] [PATCH v7 0/5] eal/windows: do not expose POSIX symbols

2021-04-03 Thread Dmitry Kozlyuk
On Windows, EAL contains two sets of functions and macros for POSIX
compatibility:  and a networking shim (socket headers).
The latter conflicts with system headers and should not exist.
Exposing the former from EAL can break consumers' own POSIX compatibility
layers and is against standards in general. Hide these symbols from
external consumers, while keeping them available for DPDK code.

v7:
* Rearrange patches, improve wording, fix typo.
* rte_os_internal.h -> rte_os_shim.h for possible later exposure
* Remove unnecessary blank lines.

Dmitry Kozlyuk (5):
  eal: add sleep API
  eal/windows: hide asprintf() shim
  eal: make OS shims internal
  net: work around s_addr macro on Windows
  net: provide IP-related API on any OS

 drivers/bus/pci/private.h|  4 +-
 drivers/bus/vdev/vdev_private.h  |  2 +
 drivers/common/mlx5/mlx5_common.h|  1 +
 drivers/net/i40e/i40e_ethdev.c   |  1 +
 drivers/net/i40e/i40e_fdir.c |  1 +
 drivers/net/mlx5/mlx5.h  |  1 -
 drivers/net/mlx5/mlx5_flow.c |  4 +-
 drivers/net/mlx5/mlx5_flow.h |  3 +-
 drivers/net/mlx5/mlx5_mac.c  |  1 -
 examples/cmdline/commands.c  |  5 --
 examples/cmdline/parse_obj_list.c|  2 -
 lib/librte_cmdline/cmdline.c |  5 --
 lib/librte_cmdline/cmdline_os_windows.c  |  2 -
 lib/librte_cmdline/cmdline_parse.c   |  2 -
 lib/librte_cmdline/cmdline_parse_etheraddr.c |  6 --
 lib/librte_cmdline/cmdline_parse_ipaddr.c|  6 --
 lib/librte_cmdline/cmdline_parse_ipaddr.h|  2 +-
 lib/librte_cmdline/cmdline_private.h |  1 +
 lib/librte_cmdline/cmdline_socket.c  |  4 -
 lib/librte_eal/common/eal_common_config.c|  1 -
 lib/librte_eal/common/eal_common_errno.c |  4 +
 lib/librte_eal/common/eal_common_options.c   |  2 +-
 lib/librte_eal/common/eal_common_timer.c |  5 +-
 lib/librte_eal/common/eal_internal_cfg.h |  1 +
 lib/librte_eal/common/eal_private.h  | 11 +++
 lib/librte_eal/freebsd/include/rte_os_shim.h | 14 +++
 lib/librte_eal/include/rte_thread.h  | 11 +++
 lib/librte_eal/linux/include/rte_os_shim.h   | 14 +++
 lib/librte_eal/rte_eal_exports.def   |  2 +
 lib/librte_eal/unix/rte_thread.c | 10 ++-
 lib/librte_eal/version.map   |  1 +
 lib/librte_eal/windows/eal.c | 30 +++
 lib/librte_eal/windows/eal_hugepages.c   |  1 -
 lib/librte_eal/windows/eal_lcore.c   |  1 -
 lib/librte_eal/windows/eal_memalloc.c|  1 -
 lib/librte_eal/windows/eal_thread.c  |  9 +-
 lib/librte_eal/windows/include/arpa/inet.h   | 30 ---
 lib/librte_eal/windows/include/netinet/in.h  | 38 
 lib/librte_eal/windows/include/netinet/ip.h  | 10 ---
 lib/librte_eal/windows/include/rte_os.h  | 92 +---
 lib/librte_eal/windows/include/rte_os_shim.h | 36 
 lib/librte_eal/windows/include/sys/socket.h  | 24 -
 lib/librte_ethdev/ethdev_private.h   |  2 +
 lib/librte_ethdev/rte_ethdev.c   | 12 +--
 lib/librte_ethdev/rte_ethdev_core.h  |  1 -
 lib/librte_kvargs/rte_kvargs.c   |  1 +
 lib/librte_net/rte_ether.h   | 26 --
 lib/librte_net/rte_ip.h  |  7 ++
 lib/librte_net/rte_net.c |  1 +
 49 files changed, 196 insertions(+), 255 deletions(-)
 create mode 100644 lib/librte_eal/freebsd/include/rte_os_shim.h
 create mode 100644 lib/librte_eal/linux/include/rte_os_shim.h
 delete mode 100644 lib/librte_eal/windows/include/arpa/inet.h
 delete mode 100644 lib/librte_eal/windows/include/netinet/in.h
 delete mode 100644 lib/librte_eal/windows/include/netinet/ip.h
 create mode 100644 lib/librte_eal/windows/include/rte_os_shim.h
 delete mode 100644 lib/librte_eal/windows/include/sys/socket.h

-- 
2.29.3



[dpdk-dev] [PATCH v7 1/5] eal: add sleep API

2021-04-03 Thread Dmitry Kozlyuk
POSIX sleep(3) is missing from Windows.
Add generic rte_thread_sleep() to suspend current OS thread.
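The per-OS split behind such an API can be sketched as a single wrapper. This is a hypothetical `portable_sleep()`, not rte_thread_sleep() itself: Win32 Sleep() takes milliseconds, POSIX sleep() takes seconds and may return early when a signal arrives. Unlike the Unix implementation in the patch, the POSIX branch here loops on the unslept remainder so that control is never returned earlier than requested, which is the documented contract.

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

#define MS_PER_S 1000u

/* Suspend the calling OS thread for at least `sec` seconds. */
static void portable_sleep(unsigned int sec)
{
#ifdef _WIN32
	Sleep(MS_PER_S * sec); /* milliseconds */
#else
	/* sleep() returns the unslept seconds if interrupted by a signal. */
	while (sec > 0)
		sec = sleep(sec);
#endif
}
```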

Signed-off-by: Dmitry Kozlyuk 
Acked-by: Khoa To 
Acked-by: Ray Kinsella 
---
 lib/librte_eal/common/eal_common_timer.c |  5 +++--
 lib/librte_eal/include/rte_thread.h  | 11 +++
 lib/librte_eal/rte_eal_exports.def   |  2 ++
 lib/librte_eal/unix/rte_thread.c | 10 +-
 lib/librte_eal/version.map   |  1 +
 lib/librte_eal/windows/eal_thread.c  |  9 -
 6 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_timer.c b/lib/librte_eal/common/eal_common_timer.c
index 71e0bd035a..0e89a4f7df 100644
--- a/lib/librte_eal/common/eal_common_timer.c
+++ b/lib/librte_eal/common/eal_common_timer.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "eal_private.h"
 #include "eal_memcfg.h"
@@ -47,9 +48,9 @@ estimate_tsc_freq(void)
 #define CYC_PER_10MHZ 1E7
RTE_LOG(WARNING, EAL, "WARNING: TSC frequency estimated roughly"
" - clock timings may be less accurate.\n");
-   /* assume that the sleep(1) will sleep for 1 second */
+   /* assume that the rte_thread_sleep(1) will sleep for 1 second */
uint64_t start = rte_rdtsc();
-   sleep(1);
+   rte_thread_sleep(1);
/* Round up to 10Mhz. 1E7 ~ 10Mhz */
return RTE_ALIGN_MUL_NEAR(rte_rdtsc() - start, CYC_PER_10MHZ);
 }
diff --git a/lib/librte_eal/include/rte_thread.h b/lib/librte_eal/include/rte_thread.h
index 8be8ed8f36..450d72a2fc 100644
--- a/lib/librte_eal/include/rte_thread.h
+++ b/lib/librte_eal/include/rte_thread.h
@@ -119,6 +119,17 @@ int rte_thread_value_set(rte_thread_key key, const void *value);
 __rte_experimental
 void *rte_thread_value_get(rte_thread_key key);
 
+/**
+ * Suspend current OS thread for the specified time, yielding CPU to scheduler.
+ *
+ * @param sec
+ *  Number of seconds to sleep. The system may return control later,
+ *  but not earlier. Zero value always yields the CPU, but control may be
+ *  returned immediately.
+ */
+__rte_experimental
+void rte_thread_sleep(unsigned int sec);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/rte_eal_exports.def b/lib/librte_eal/rte_eal_exports.def
index c320077547..300c86eb3f 100644
--- a/lib/librte_eal/rte_eal_exports.def
+++ b/lib/librte_eal/rte_eal_exports.def
@@ -334,3 +334,5 @@ EXPORTS
rte_mem_map
rte_mem_page_size
rte_mem_unmap
+
+   rte_thread_sleep
diff --git a/lib/librte_eal/unix/rte_thread.c b/lib/librte_eal/unix/rte_thread.c
index c72d619ec1..a7130b5870 100644
--- a/lib/librte_eal/unix/rte_thread.c
+++ b/lib/librte_eal/unix/rte_thread.c
@@ -3,10 +3,12 @@
  */
 
 #include 
-#include 
 #include 
 #include 
 
+#include 
+#include 
+
 #include 
 #include 
 #include 
@@ -90,3 +92,9 @@ rte_thread_value_get(rte_thread_key key)
}
return pthread_getspecific(key->thread_index);
 }
+
+void
+rte_thread_sleep(unsigned int sec)
+{
+   sleep(sec);
+}
diff --git a/lib/librte_eal/version.map b/lib/librte_eal/version.map
index e23745ae6e..cd35b67a9a 100644
--- a/lib/librte_eal/version.map
+++ b/lib/librte_eal/version.map
@@ -413,6 +413,7 @@ EXPERIMENTAL {
# added in 21.05
rte_thread_key_create;
rte_thread_key_delete;
+   rte_thread_sleep;
rte_thread_value_get;
rte_thread_value_set;
rte_version_minor;
diff --git a/lib/librte_eal/windows/eal_thread.c b/lib/librte_eal/windows/eal_thread.c
index 9c3f6d69fd..c84e67009c 100644
--- a/lib/librte_eal/windows/eal_thread.c
+++ b/lib/librte_eal/windows/eal_thread.c
@@ -11,9 +11,10 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include "eal_private.h"
+#include "eal_thread.h"
 #include "eal_windows.h"
 
 /*
@@ -154,3 +155,9 @@ rte_thread_setname(__rte_unused pthread_t id, __rte_unused const char *name)
/* This is a stub, not the expected result */
return 0;
 }
+
+void
+rte_thread_sleep(unsigned int sec)
+{
+   Sleep(MS_PER_S * sec);
+}
-- 
2.29.3



[dpdk-dev] [PATCH v7 2/5] eal/windows: hide asprintf() shim

2021-04-03 Thread Dmitry Kozlyuk
Make asprintf(3) implementation for Windows private to EAL, so that it's
hidden from external consumers. It is not exposed to internal consumers
either, because they don't need asprintf() and also because callers from
other modules would have no reliable way to free allocated memory.
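The shim relies on the standard two-pass vsnprintf() technique: measure the formatted length with a NULL buffer, allocate, then format for real. The sketch below is an illustration under a hypothetical name (`sketch_asprintf`). Unlike the patch, it also resets `*buffer` to NULL on failure. The caller must release the result with the free() of the same C runtime that performed the malloc(), which is plausibly why callers in other modules, possibly linked against a different runtime on Windows, have no reliable way to free it.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Two-pass asprintf replacement; caller frees *buffer with free(). */
static int sketch_asprintf(char **buffer, const char *format, ...)
{
	va_list arg;
	int size, ret;

	/* Pass 1: compute the length without writing anything. */
	va_start(arg, format);
	size = vsnprintf(NULL, 0, format, arg);
	va_end(arg);
	if (size < 0)
		return -1;

	*buffer = malloc((size_t)size + 1); /* +1 for the terminator */
	if (*buffer == NULL)
		return -1;

	/* Pass 2: format into the exactly sized buffer. */
	va_start(arg, format);
	ret = vsnprintf(*buffer, (size_t)size + 1, format, arg);
	va_end(arg);
	if (ret != size) {
		free(*buffer);
		*buffer = NULL;
		return -1;
	}
	return ret;
}
```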

Signed-off-by: Dmitry Kozlyuk 
Acked-by: Khoa To 
---
 lib/librte_eal/common/eal_private.h | 11 +
 lib/librte_eal/windows/eal.c| 30 +
 lib/librte_eal/windows/include/rte_os.h | 30 -
 3 files changed, 41 insertions(+), 30 deletions(-)

diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index b8a0d20021..31eda4d2da 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -745,4 +745,15 @@ void __rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset);
  */
 void __rte_thread_uninit(void);
 
+/**
+ * asprintf(3) replacement for Windows.
+ */
+#ifdef RTE_EXEC_ENV_WINDOWS
+__rte_format_printf(2, 3)
+int eal_asprintf(char **buffer, const char *format, ...);
+
+#define asprintf(buffer, format, ...) \
+   eal_asprintf(buffer, format, ##__VA_ARGS__)
+#endif
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/windows/eal.c b/lib/librte_eal/windows/eal.c
index 2fc3d6141c..162671f9ce 100644
--- a/lib/librte_eal/windows/eal.c
+++ b/lib/librte_eal/windows/eal.c
@@ -2,6 +2,8 @@
  * Copyright(c) 2019 Intel Corporation
  */
 
+#include 
+
 #include 
 #include 
 #include 
@@ -411,6 +413,34 @@ rte_eal_init(int argc, char **argv)
return fctret;
 }
 
+/* Don't use MinGW asprintf() to have identical code with all toolchains. */
+int
+eal_asprintf(char **buffer, const char *format, ...)
+{
+   int size, ret;
+   va_list arg;
+
+   va_start(arg, format);
+   size = vsnprintf(NULL, 0, format, arg);
+   va_end(arg);
+   if (size < 0)
+   return -1;
+   size++;
+
+   *buffer = malloc(size);
+   if (*buffer == NULL)
+   return -1;
+
+   va_start(arg, format);
+   ret = vsnprintf(*buffer, size, format, arg);
+   va_end(arg);
+   if (ret != size - 1) {
+   free(*buffer);
+   return -1;
+   }
+   return ret;
+}
+
 int
 rte_vfio_container_dma_map(__rte_unused int container_fd,
__rte_unused uint64_t vaddr,
diff --git a/lib/librte_eal/windows/include/rte_os.h b/lib/librte_eal/windows/include/rte_os.h
index f0512f20a6..1afe49f35e 100644
--- a/lib/librte_eal/windows/include/rte_os.h
+++ b/lib/librte_eal/windows/include/rte_os.h
@@ -10,7 +10,6 @@
  * which is not supported natively or named differently in Windows.
  */
 
-#include 
 #include 
 #include 
 #include 
@@ -71,34 +70,6 @@ extern "C" {
 typedef long long ssize_t;
 
 #ifndef RTE_TOOLCHAIN_GCC
-
-static inline int
-asprintf(char **buffer, const char *format, ...)
-{
-   int size, ret;
-   va_list arg;
-
-   va_start(arg, format);
-   size = vsnprintf(NULL, 0, format, arg);
-   va_end(arg);
-   if (size < 0)
-   return -1;
-   size++;
-
-   *buffer = (char *)malloc(size);
-   if (*buffer == NULL)
-   return -1;
-
-   va_start(arg, format);
-   ret = vsnprintf(*buffer, size, format, arg);
-   va_end(arg);
-   if (ret != size - 1) {
-   free(*buffer);
-   return -1;
-   }
-   return ret;
-}
-
 static inline const char *
 eal_strerror(int code)
 {
@@ -111,7 +82,6 @@ eal_strerror(int code)
 #ifndef strerror
 #define strerror eal_strerror
 #endif
-
 #endif /* RTE_TOOLCHAIN_GCC */
 
 #ifdef __cplusplus
-- 
2.29.3



[dpdk-dev] [PATCH v7 3/5] eal: make OS shims internal

2021-04-03 Thread Dmitry Kozlyuk
DPDK code often relies on functions and macros that are not standard C,
but are found on all platforms, even if by slightly different names.
Windows  provided macros or inline definitions for such symbols.
However, when placed in a public header, these symbols were unnecessarily
exposed, breaking consumer POSIX compatibility code.

Move all shims to , a header to be used instead of
 by internal code. Include it in libraries and PMDs that
previously imported shims from .
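The shim approach can be illustrated with a small sketch: internal code is written against POSIX names, and a private header maps them to the Microsoft CRT spellings on Windows. The `write`/`_write` and `open`/`_open` mappings are the ones cmdline previously defined locally (removed in this patch); the layout and the `demo_write_str` helper are illustrative assumptions, not the contents of the real shim header.

```c
/* Illustrative private OS shim: only internal code sees these macros,
 * so applications keep their own POSIX compatibility layer intact. */
#ifdef _WIN32
#include <io.h>     /* _write(), _open() in the Microsoft CRT */
#include <fcntl.h>
#define write _write
#define open _open
#else
#include <fcntl.h>  /* open() */
#include <unistd.h> /* write() */
#endif

#include <string.h>

/* Internal helper written against POSIX names only. */
static int demo_write_str(int fd, const char *str)
{
	size_t len = strlen(str);

	/* _write() takes an unsigned int count, write() a size_t. */
	return (int)write(fd, str, (unsigned int)len);
}
```

Because the macros live in a header that only internal code includes, a consumer that defines its own `write` shim no longer collides with DPDK's.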

Signed-off-by: Dmitry Kozlyuk 
---
 drivers/bus/pci/private.h|  4 +-
 drivers/bus/vdev/vdev_private.h  |  2 +
 drivers/common/mlx5/mlx5_common.h|  1 +
 drivers/net/i40e/i40e_ethdev.c   |  1 +
 lib/librte_cmdline/cmdline.c |  4 --
 lib/librte_cmdline/cmdline_os_windows.c  |  2 -
 lib/librte_cmdline/cmdline_private.h |  1 +
 lib/librte_cmdline/cmdline_socket.c  |  4 --
 lib/librte_eal/common/eal_common_config.c|  1 -
 lib/librte_eal/common/eal_common_errno.c |  4 ++
 lib/librte_eal/common/eal_common_options.c   |  2 +-
 lib/librte_eal/common/eal_internal_cfg.h |  1 +
 lib/librte_eal/freebsd/include/rte_os_shim.h | 14 +
 lib/librte_eal/linux/include/rte_os_shim.h   | 14 +
 lib/librte_eal/windows/eal_hugepages.c   |  1 -
 lib/librte_eal/windows/eal_lcore.c   |  1 -
 lib/librte_eal/windows/eal_memalloc.c|  1 -
 lib/librte_eal/windows/include/rte_os.h  | 62 ++--
 lib/librte_eal/windows/include/rte_os_shim.h | 28 +
 lib/librte_ethdev/ethdev_private.h   |  2 +
 lib/librte_kvargs/rte_kvargs.c   |  1 +
 21 files changed, 77 insertions(+), 74 deletions(-)
 create mode 100644 lib/librte_eal/freebsd/include/rte_os_shim.h
 create mode 100644 lib/librte_eal/linux/include/rte_os_shim.h
 create mode 100644 lib/librte_eal/windows/include/rte_os_shim.h

diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index f566943f5e..4cd9d14ec7 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -7,8 +7,10 @@
 
 #include 
 #include 
-#include 
+
 #include 
+#include 
+#include 
 
 extern struct rte_pci_bus rte_pci_bus;
 
diff --git a/drivers/bus/vdev/vdev_private.h b/drivers/bus/vdev/vdev_private.h
index ba6dc48ff3..e683f5f133 100644
--- a/drivers/bus/vdev/vdev_private.h
+++ b/drivers/bus/vdev/vdev_private.h
@@ -5,6 +5,8 @@
 #ifndef _VDEV_PRIVATE_H_
 #define _VDEV_PRIVATE_H_
 
+#include 
+
 #ifdef __cplusplus
 extern "C" {
 #endif
diff --git a/drivers/common/mlx5/mlx5_common.h b/drivers/common/mlx5/mlx5_common.h
index 8eda6749b4..211e330178 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "mlx5_prm.h"
 #include "mlx5_devx_cmds.h"
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index fcf150e127..cf9e996ca5 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "i40e_logs.h"
 #include "base/i40e_prototype.h"
diff --git a/lib/librte_cmdline/cmdline.c b/lib/librte_cmdline/cmdline.c
index 79ea5f98c8..49770869bb 100644
--- a/lib/librte_cmdline/cmdline.c
+++ b/lib/librte_cmdline/cmdline.c
@@ -18,10 +18,6 @@
 
 #include "cmdline_private.h"
 
-#ifdef RTE_EXEC_ENV_WINDOWS
-#define write _write
-#endif
-
 static void
 cmdline_valid_buffer(struct rdline *rdl, const char *buf,
 __rte_unused unsigned int size)
diff --git a/lib/librte_cmdline/cmdline_os_windows.c b/lib/librte_cmdline/cmdline_os_windows.c
index e9585c9eea..73ed9ba290 100644
--- a/lib/librte_cmdline/cmdline_os_windows.c
+++ b/lib/librte_cmdline/cmdline_os_windows.c
@@ -4,8 +4,6 @@
 
 #include 
 
-#include 
-
 #include "cmdline_private.h"
 
 /* Missing from some MinGW-w64 distributions. */
diff --git a/lib/librte_cmdline/cmdline_private.h b/lib/librte_cmdline/cmdline_private.h
index a8a6ee9e69..a87c45275c 100644
--- a/lib/librte_cmdline/cmdline_private.h
+++ b/lib/librte_cmdline/cmdline_private.h
@@ -8,6 +8,7 @@
 #include 
 
 #include 
+#include 
 #ifdef RTE_EXEC_ENV_WINDOWS
 #include 
 #endif
diff --git a/lib/librte_cmdline/cmdline_socket.c b/lib/librte_cmdline/cmdline_socket.c
index 0fe1497008..998e8ade25 100644
--- a/lib/librte_cmdline/cmdline_socket.c
+++ b/lib/librte_cmdline/cmdline_socket.c
@@ -16,10 +16,6 @@
 #include "cmdline_private.h"
 #include "cmdline_socket.h"
 
-#ifdef RTE_EXEC_ENV_WINDOWS
-#define open _open
-#endif
-
 struct cmdline *
 cmdline_file_new(cmdline_parse_ctx_t *ctx, const char *prompt, const char *path)
 {
diff --git a/lib/librte_eal/common/eal_common_config.c b/lib/librte_eal/common/eal_common_config.c
index 56d09dda7f..1c4c4dd585 100644
--- a/lib/librte_eal/common/eal_common_config.c
+++ b/lib/librte_eal/common/eal_common_config.c
@@ -3,7 +3,6 @@
  */
 #include 
 
-#include 
 #include 
 
 #include "eal_private.h"
diff 

[dpdk-dev] [PATCH v7 4/5] net: work around s_addr macro on Windows

2021-04-03 Thread Dmitry Kozlyuk
Windows Sockets headers contain `#define s_addr S_un.S_addr`, which
conflicts with the definition of the `s_addr` field of `struct rte_ether_hdr`.
Previously, `s_addr` was undefined in , which broke access to the
`s_addr` field of `struct in_addr`, so some DPDK and Windows headers
could not be included in the same file.

Renaming of `struct rte_ether_hdr` is planned:
https://mails.dpdk.org/archives/dev/2021-March/201444.html

Temporarily disable `s_addr` macro around `struct rte_ether_hdr`
definition to avoid conflict. Place source MAC address in both `s_addr`
and `S_un.S_addr` fields, so that access works either directly or
through the macro as defined in Windows headers.

Signed-off-by: Dmitry Kozlyuk 
Acked-by: Ranjit Menon 
---
 lib/librte_net/rte_ether.h | 26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h
index 060b63fc9b..a303c24a8c 100644
--- a/lib/librte_net/rte_ether.h
+++ b/lib/librte_net/rte_ether.h
@@ -23,10 +23,6 @@ extern "C" {
 #include 
 #include 
 
-#ifdef RTE_EXEC_ENV_WINDOWS /* Workaround conflict with rte_ether_hdr. */
-#undef s_addr /* Defined in winsock2.h included in windows.h. */
-#endif
-
 #define RTE_ETHER_ADDR_LEN  6 /**< Length of Ethernet address. */
 #define RTE_ETHER_TYPE_LEN  2 /**< Length of Ethernet type field. */
 #define RTE_ETHER_CRC_LEN   4 /**< Length of Ethernet CRC. */
@@ -257,16 +253,34 @@ __rte_experimental
 int
 rte_ether_unformat_addr(const char *str, struct rte_ether_addr *eth_addr);
 
+/* Windows Sockets headers contain `#define s_addr S_un.S_addr`.
+ * Temporarily disable this macro to avoid conflict at definition.
+ * Place source MAC address in both `s_addr` and `S_un.S_addr` fields,
+ * so that access works either directly or through the macro.
+ */
+#pragma push_macro("s_addr")
+#ifdef s_addr
+#undef s_addr
+#endif
+
 /**
  * Ethernet header: Contains the destination address, source address
  * and frame type.
  */
 struct rte_ether_hdr {
struct rte_ether_addr d_addr; /**< Destination address. */
-   struct rte_ether_addr s_addr; /**< Source address. */
-   uint16_t ether_type;  /**< Frame type. */
+   RTE_STD_C11
+   union {
+   struct rte_ether_addr s_addr; /**< Source address. */
+   struct {
+   struct rte_ether_addr S_addr;
+   } S_un; /**< Do not use directly; use s_addr instead.*/
+   };
+   uint16_t ether_type; /**< Frame type. */
 } __rte_aligned(2);
 
+#pragma pop_macro("s_addr")
+
 /**
  * Ethernet VLAN Header.
  * Contains the 16-bit VLAN Tag Control Identifier and the Ethernet type
-- 
2.29.3



[dpdk-dev] [PATCH v7 5/5] net: provide IP-related API on any OS

2021-04-03 Thread Dmitry Kozlyuk
Users of  relied on it to provide IP-related defines,
like IPPROTO_* constants, but still had to include POSIX headers
for inet_pton() and other standard IP-related facilities.

Extend  so that it is a single header to gain access
to IP-related facilities on any OS. Use it to replace POSIX includes
in components enabled on Windows. Move missing constants from Windows
networking shim to OS shim header and include it where needed.

Remove Windows networking shim that is no longer needed.
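What a single "IP facilities on any OS" header hides from its users can be sketched as one `#ifdef` around the platform includes for inet_pton(). This is an illustrative assumption, not the contents of rte_ip.h; on Windows, inet_pton() comes from ws2tcpip.h and the Ws2_32 library. `demo_parse_ipv4` is a hypothetical helper.

```c
#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>   /* inet_pton() on Windows (link with Ws2_32) */
#else
#include <arpa/inet.h>  /* inet_pton() */
#include <netinet/in.h> /* struct in_addr, IPPROTO_* */
#include <sys/socket.h> /* AF_INET */
#endif

#include <stdint.h>
#include <string.h>

/* Parse dotted-quad IPv4 text into four bytes in network order.
 * Returns 0 on success, -1 if the text is not a valid address. */
static int demo_parse_ipv4(const char *text, uint8_t out[4])
{
	struct in_addr addr;

	if (inet_pton(AF_INET, text, &addr) != 1)
		return -1;
	memcpy(out, &addr, 4); /* in_addr stores the address in network order */
	return 0;
}
```

Callers then include one header and use the standard names, instead of repeating the platform `#ifdef` in every file, as the examples/cmdline changes in this patch do.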

Signed-off-by: Dmitry Kozlyuk 
---
 drivers/net/i40e/i40e_fdir.c |  1 +
 drivers/net/mlx5/mlx5.h  |  1 -
 drivers/net/mlx5/mlx5_flow.c |  4 +--
 drivers/net/mlx5/mlx5_flow.h |  3 +-
 drivers/net/mlx5/mlx5_mac.c  |  1 -
 examples/cmdline/commands.c  |  5 ---
 examples/cmdline/parse_obj_list.c|  2 --
 lib/librte_cmdline/cmdline.c |  1 -
 lib/librte_cmdline/cmdline_parse.c   |  2 --
 lib/librte_cmdline/cmdline_parse_etheraddr.c |  6 
 lib/librte_cmdline/cmdline_parse_ipaddr.c|  6 
 lib/librte_cmdline/cmdline_parse_ipaddr.h|  2 +-
 lib/librte_eal/windows/include/arpa/inet.h   | 30 
 lib/librte_eal/windows/include/netinet/in.h  | 38 
 lib/librte_eal/windows/include/netinet/ip.h  | 10 --
 lib/librte_eal/windows/include/rte_os_shim.h |  8 +
 lib/librte_eal/windows/include/sys/socket.h  | 24 -
 lib/librte_ethdev/rte_ethdev.c   | 12 +++
 lib/librte_ethdev/rte_ethdev_core.h  |  1 -
 lib/librte_net/rte_ip.h  |  7 
 lib/librte_net/rte_net.c |  1 +
 21 files changed, 24 insertions(+), 141 deletions(-)
 delete mode 100644 lib/librte_eal/windows/include/arpa/inet.h
 delete mode 100644 lib/librte_eal/windows/include/netinet/in.h
 delete mode 100644 lib/librte_eal/windows/include/netinet/ip.h
 delete mode 100644 lib/librte_eal/windows/include/sys/socket.h

diff --git a/drivers/net/i40e/i40e_fdir.c b/drivers/net/i40e/i40e_fdir.c
index c572d003cb..e7361bf520 100644
--- a/drivers/net/i40e/i40e_fdir.c
+++ b/drivers/net/i40e/i40e_fdir.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "i40e_logs.h"
 #include "base/i40e_type.h"
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 6faba4fbb1..392e89d3f5 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -10,7 +10,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #include 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index c347f8130e..0f1a9c5ed9 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -3,12 +3,11 @@
  * Copyright 2016 Mellanox Technologies, Ltd
  */
 
-#include 
-#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -8241,4 +8240,3 @@ mlx5_release_tunnel_hub(__rte_unused struct mlx5_dev_ctx_shared *sh,
 {
 }
 #endif /* HAVE_IBV_FLOW_DV_SUPPORT */
-
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 8324e188e1..d03eb0a7a7 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -5,11 +5,10 @@
 #ifndef RTE_PMD_MLX5_FLOW_H_
 #define RTE_PMD_MLX5_FLOW_H_
 
-#include 
-#include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index a7946f7756..19981d26d8 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -8,7 +8,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
diff --git a/examples/cmdline/commands.c b/examples/cmdline/commands.c
index f43eacfbad..9ce8ef389f 100644
--- a/examples/cmdline/commands.c
+++ b/examples/cmdline/commands.c
@@ -8,12 +8,7 @@
 #include 
 #include 
 #include 
-#include 
 #include 
-#include 
-#ifdef RTE_EXEC_ENV_FREEBSD
-#include 
-#endif
 
 #include 
 #include 
diff --git a/examples/cmdline/parse_obj_list.c b/examples/cmdline/parse_obj_list.c
index b04adbea58..959bcd1452 100644
--- a/examples/cmdline/parse_obj_list.c
+++ b/examples/cmdline/parse_obj_list.c
@@ -6,11 +6,9 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
diff --git a/lib/librte_cmdline/cmdline.c b/lib/librte_cmdline/cmdline.c
index 49770869bb..a176d15130 100644
--- a/lib/librte_cmdline/cmdline.c
+++ b/lib/librte_cmdline/cmdline.c
@@ -12,7 +12,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 
diff --git a/lib/librte_cmdline/cmdline_parse.c b/lib/librte_cmdline/cmdline_parse.c
index fe366841cd..f5cc934782 100644
--- a/lib/librte_cmdline/cmdline_parse.c
+++ b/lib/librte_cmdline/cmdline_parse.c
@@ -11,8 +11,6 @@
 #include 
 #include 
 
-#include 
-
 #include 
 
 #include "cmdline_private.h"
diff --git a/lib/librte_cmdline/cmdline_parse_etheraddr.c b/lib/librte_cmdline/cmdline_parse_etheraddr.c
index 5cb10de321..

Re: [dpdk-dev] [PATCH v2 2/2] qos: rearrange enqueue procedure

2021-04-03 Thread Ananyev, Konstantin


Hi guys,

> > > In many usage scenarios, input mbufs for rte_sched_port_enqueue() are not
> > > yet in the CPU cache(s). That causes significant stalls due to memory
> > > latency. The current implementation tries to mitigate this using SW
> > > pipelining and SW prefetch techniques, but stalls are still present.
> > > Rework rte_sched_port_enqueue() to actually fetch the metadata of all
> > > mbufs as the first stage of the function.
> > > That helps to minimise load stalls in later stages of enqueue() and
> > > improves overall enqueue performance.
> > > With examples/qos_sched I observed:
> > > on ICX box: up to 30% cycles reduction
> > > on CSX and BDX: 15-20% cycles reduction
> > > I also ran tests with mbufs already in the cache (one core doing RX, QoS
> > > and TX).
> > > In that scenario, no performance drop was observed on any of the IA boxes
> > > mentioned above.
> > >
> > > Signed-off-by: Konstantin Ananyev 
> > > ---
> > > v2: fix clang and checkpatch complaints
> > > ---
> > >  lib/librte_sched/rte_sched.c | 219 +--
> > >  1 file changed, 31 insertions(+), 188 deletions(-)
> > >
> > > diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
> > > index 7c5688068..41ef147e0 100644
> > > --- a/lib/librte_sched/rte_sched.c
> > > +++ b/lib/librte_sched/rte_sched.c
> > > @@ -1861,24 +1861,23 @@ debug_check_queue_slab(struct rte_sched_subport *subport, uint32_t bmp_pos,
> > >  #endif /* RTE_SCHED_DEBUG */
> > >
> > >  static inline struct rte_sched_subport *
> > > -rte_sched_port_subport(struct rte_sched_port *port,
> > > - struct rte_mbuf *pkt)
> > > +sched_port_subport(const struct rte_sched_port *port, struct
> > > +rte_mbuf_sched sch)
> > >  {
> > > - uint32_t queue_id = rte_mbuf_sched_queue_get(pkt);
> > > + uint32_t queue_id = sch.queue_id;
> > >   uint32_t subport_id = queue_id >> (port->n_pipes_per_subport_log2 + 4);
> > >
> > >   return port->subports[subport_id];
> > >  }
> > >
> > >  static inline uint32_t
> > > -rte_sched_port_enqueue_qptrs_prefetch0(struct rte_sched_subport *subport,
> > > - struct rte_mbuf *pkt, uint32_t subport_qmask)
> > > +sched_port_enqueue_qptrs_prefetch0(const struct rte_sched_subport *subport,
> > > + struct rte_mbuf_sched sch, uint32_t subport_qmask)
> > >  {
> > >   struct rte_sched_queue *q;
> > >  #ifdef RTE_SCHED_COLLECT_STATS
> > >   struct rte_sched_queue_extra *qe;
> > >  #endif
> > > - uint32_t qindex = rte_mbuf_sched_queue_get(pkt);
> > > + uint32_t qindex = sch.queue_id;
> > >   uint32_t subport_queue_id = subport_qmask & qindex;
> > >
> > >   q = subport->queue + subport_queue_id;
> > > @@ -1971,197 +1970,41 @@ int
> > >  rte_sched_port_enqueue(struct rte_sched_port *port, struct rte_mbuf **pkts,
> > >  uint32_t n_pkts)
> > >  {
> > > - struct rte_mbuf *pkt00, *pkt01, *pkt10, *pkt11, *pkt20, *pkt21,
> > > - *pkt30, *pkt31, *pkt_last;
> > > - struct rte_mbuf **q00_base, **q01_base, **q10_base, **q11_base,
> > > - **q20_base, **q21_base, **q30_base, **q31_base, **q_last_base;
> > > - struct rte_sched_subport *subport00, *subport01, *subport10, *subport11,
> > > - *subport20, *subport21, *subport30, *subport31, *subport_last;
> > > - uint32_t q00, q01, q10, q11, q20, q21, q30, q31, q_last;
> > > - uint32_t r00, r01, r10, r11, r20, r21, r30, r31, r_last;
> > > - uint32_t subport_qmask;
> > >   uint32_t result, i;
> > > + struct rte_mbuf_sched sch[n_pkts];
> > > + struct rte_sched_subport *subports[n_pkts];
> > > + struct rte_mbuf **q_base[n_pkts];
> > > + uint32_t q[n_pkts];
> > > +
> > > + const uint32_t subport_qmask =
> > > + (1 << (port->n_pipes_per_subport_log2 + 4)) - 1;
> > >
> > >   result = 0;
> > > - subport_qmask = (1 << (port->n_pipes_per_subport_log2 + 4)) - 1;
> > >
> > > - /*
> > > -  * Less then 6 input packets available, which is not enough to
> > > -  * feed the pipeline
> > > -  */
> > > - if (unlikely(n_pkts < 6)) {
> > > - struct rte_sched_subport *subports[5];
> > > - struct rte_mbuf **q_base[5];
> > > - uint32_t q[5];
> > > -
> > > - /* Prefetch the mbuf structure of each packet */
> > > - for (i = 0; i < n_pkts; i++)
> > > - rte_prefetch0(pkts[i]);
> > > -
> > > - /* Prefetch the subport structure for each packet */
> > > - for (i = 0; i < n_pkts; i++)
> > > - subports[i] = rte_sched_port_subport(port, pkts[i]);
> > > -
> > > - /* Prefetch the queue structure for each queue */
> > > - for (i = 0; i < n_pkts; i++)
> > > - q[i] = rte_sched_port_enqueue_qptrs_prefetch0(subports[i],
> > > - pkts[i], subport_qmask);
> > > -
> > > - /* Prefetch the write pointer location of each queue */
> > > - for (i = 0; i < n_pkts; i++) {
> > > - q_base[i] = rte_sched_subport_pipe_qbase(subports[
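Reading the description, the patch replaces the software-pipelined enqueue loop with a first pass that copies every packet's scheduler metadata into a local array, so later stages no longer touch the mbufs. The sketch below shows that shape with hypothetical simplified types (`demo_pkt`, `demo_sched_meta`) instead of rte_mbuf. The index arithmetic is taken from the diff: the subport id is `queue_id >> (n_pipes_per_subport_log2 + 4)` and the queue within the subport is `queue_id & ((1 << (n_pipes_per_subport_log2 + 4)) - 1)`.

```c
#include <stdint.h>

/* Hypothetical stand-ins for struct rte_mbuf_sched and rte_mbuf. */
struct demo_sched_meta {
	uint32_t queue_id;
	uint8_t traffic_class;
	uint8_t color;
};

struct demo_pkt {
	uint8_t other_fields[64];     /* stand-in for the rest of the mbuf */
	struct demo_sched_meta sched;
};

#define DEMO_MAX_BURST 64

/* Stage 1 copies metadata out of every packet; stage 2 only uses the
 * local copies to compute subport and queue indices. Returns the number
 * of packets handled (at most DEMO_MAX_BURST). */
static uint32_t demo_classify_burst(struct demo_pkt *pkts[], uint32_t n_pkts,
		uint32_t n_pipes_per_subport_log2,
		uint32_t subport_id[], uint32_t subport_queue[])
{
	struct demo_sched_meta meta[DEMO_MAX_BURST];
	const uint32_t shift = n_pipes_per_subport_log2 + 4;
	const uint32_t qmask = (UINT32_C(1) << shift) - 1;
	uint32_t i;

	if (n_pkts > DEMO_MAX_BURST)
		n_pkts = DEMO_MAX_BURST;

	/* Stage 1: one tight pass that touches each packet exactly once. */
	for (i = 0; i < n_pkts; i++)
		meta[i] = pkts[i]->sched;

	/* Stage 2: pure arithmetic on the local array. */
	for (i = 0; i < n_pkts; i++) {
		subport_id[i] = meta[i].queue_id >> shift;
		subport_queue[i] = meta[i].queue_id & qmask;
	}
	return n_pkts;
}
```

The fixed DEMO_MAX_BURST bound is a design choice of the sketch: the patch itself sizes its arrays with variable-length arrays (`sch[n_pkts]` and friends), which keeps the code short but makes stack usage depend on the caller's burst size.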

Re: [dpdk-dev] [PATCH] build: fix symlink of drivers for Windows

2021-04-03 Thread Dmitry Kozlyuk
2021-04-01 13:27 (UTC+0100), Nick Connolly:
[...]
> +def copy_pmd_files(pattern, to_dir):
> +    for file in glob.glob(os.path.join(pmd_dir, pattern)):
> +        to = os.path.join(to_dir, os.path.basename(file))
> +        shutil.copy2(file, to)
> +        print(to + ' -> ' + file)
> +
> +copy_pmd_files('*rte_*.dll', bin_dir)
> +copy_pmd_files('*rte_*.pdb', bin_dir)

PDB (debuginfo) files can be quite large; do we want to install them?

[...]
> diff --git a/config/meson.build b/config/meson.build
> index 66a2edcc4..c51669b7d 100644
> --- a/config/meson.build
> +++ b/config/meson.build
> @@ -57,11 +57,8 @@ eal_pmd_path = join_paths(get_option('prefix'), driver_install_path)
>  # driver .so files often depend upon the bus drivers for their connect bus,
>  # e.g. ixgbe depends on librte_bus_pci. This means that the bus drivers need
>  # to be in the library path, so symlink the drivers from the main lib directory.
> -if not is_windows
> - meson.add_install_script('../buildtools/symlink-drivers-solibs.sh',
> - get_option('libdir'),
> - pmd_subdir_opt)
> -endif
> +meson.add_install_script(py3, '../buildtools/symlink-drivers-solibs.py',
> + get_option('libdir'), pmd_subdir_opt, get_option('bindir'))
>  
>  # set the machine type and cflags for it
>  if meson.is_cross_build()

As you may have seen, the build fails because the result of find_program()
cannot be used in meson.add_install_script() before Meson 0.55. Since your
script has a Unix-specific part anyway, and Meson 0.56 is recommended for
Windows, maybe Unix systems should continue using the shell variant and
the Python script can be Windows-only.


Re: [dpdk-dev] [PATCH v3] meson: remove unnecessary explicit link to libpcap

2021-04-03 Thread Dmitry Kozlyuk
2021-03-26 09:22 (UTC+0100), Gabriel Ganne:
> libpcap is already found and registered as a dependency by meson, and
> the dependency is already correctly used in librte_port. This line is
> just unnecessary.
> 
> It also has the side effect of messing with the meson link line: dpdk
> link will be declared twice: manually and then through pkg-config. If
> you configure meson to prefer static linking over dynamic, this will
> cause the build to fail on librte_port, since the pcap deps are not yet
> seen by the linker.
> 
> Signed-off-by: Gabriel Ganne 
> ---
>  config/meson.build | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/config/meson.build b/config/meson.build
> index 66a2edcc47f5..95777cf33169 100644
> --- a/config/meson.build
> +++ b/config/meson.build
> @@ -183,7 +183,6 @@ if not pcap_dep.found()
>  endif
>  if pcap_dep.found() and cc.has_header('pcap.h', dependencies: pcap_dep)
>   dpdk_conf.set('RTE_PORT_PCAP', 1)
> - dpdk_extra_ldflags += '-lpcap'
>  endif
>  
>  # for clang 32-bit compiles we need libatomic for 64-bit atomic ops

This patch also simplifies future changes to discover libpcap on Windows.

Acked-by: Dmitry Kozlyuk