Re: [dpdk-dev] [PATCH v2 0/4] delete HW rings when releasing queues for some drivers

2021-09-20 Thread David Marchand
On Sat, Sep 18, 2021 at 10:34 AM Yunjian Wang  wrote:
>
> This series deletes HW rings when releasing queues for the
> igb, ixgbe, i40e, ice & em drivers.
>
> ---
> v2:
>* Update commit log
>
> Yunjian Wang (4):
>   net/e1000: delete HW rings when releasing queues
>   net/ice: delete HW rings when releasing queues
>   net/i40e: delete HW rings when releasing queues
>   net/ixgbe: delete HW rings when releasing queues
>
>  drivers/net/e1000/em_rxtx.c| 8 ++--
>  drivers/net/e1000/igb_rxtx.c   | 9 +++--
>  drivers/net/i40e/i40e_fdir.c   | 3 ---
>  drivers/net/i40e/i40e_rxtx.c   | 8 ++--
>  drivers/net/i40e/i40e_rxtx.h   | 2 ++
>  drivers/net/ice/ice_rxtx.c | 6 --
>  drivers/net/ice/ice_rxtx.h | 2 ++
>  drivers/net/ixgbe/ixgbe_rxtx.c | 6 --
>  drivers/net/ixgbe/ixgbe_rxtx.h | 2 ++
>  9 files changed, 33 insertions(+), 13 deletions(-)
>

- In net/ice (at least), the fdir rxq/txq memzones can be aligned on
the same scheme.
Looking at the remaining drivers (net/cnxk, net/cxgbe and
net/octeontx2), we could apply the same principle of keeping a
reference to mz in internal driver structures.
Afterwards, I see no need to keep rte_eth_dma_zone_free() (it's
internal, so we can remove it, and it's easy to re-add if a need arises).
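
To illustrate, a minimal sketch of that scheme (struct and function names
here are made up for illustration; the actual igb/ixgbe/i40e/ice/em code
differs):

    #include <rte_malloc.h>
    #include <rte_memzone.h>

    /* Hypothetical internal queue structure keeping a reference to the
     * memzone backing the HW ring. */
    struct drv_rx_queue {
            volatile void *hw_ring;           /* HW descriptor ring */
            const struct rte_memzone *mz;     /* backing memzone */
            /* ... other queue fields ... */
    };

    static void
    drv_rx_queue_release(struct drv_rx_queue *rxq)
    {
            if (rxq == NULL)
                    return;
            /* Free the HW ring memzone together with the queue, instead
             * of relying on rte_eth_dma_zone_free() later. */
            rte_memzone_free(rxq->mz);
            rte_free(rxq);
    }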

Wdyt?


- Is this worth backporting to stable branches?


-- 
David Marchand



Re: [dpdk-dev] [PATCH v4] build: add meson options of max_memseg_lists and atomic_mbuf_ref_counts

2021-09-20 Thread kefu chai
hello Bruce,

Do you have any further concerns? Is there anything I can do to move
this forward?

cheers,

On Thu, Sep 9, 2021 at 12:51 AM Kefu Chai  wrote:
>
> RTE_MAX_MEMSEG_LISTS = 128 is not enough for high-memory machines; in our
> case, we need to increase it to 8192, so add an option so the user can
> override it. Atomic mbuf refcounts (RTE_MBUF_REFCNT_ATOMIC = 1) are not
> necessary for applications like Seastar, where it's safe to assume that
> the mbuf refcnt is only updated by a single core.
>
> ---
>
> v4:
>
> Fix the coding style issue by reducing the line length to under 75 chars.
> This change should silence warnings like:
>
> WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a 
> maximum 75 chars per line)
> #81:
> RTE_MAX_MEMSEG_LISTS = 128 is not enough for high-memory machines, in our 
> case,
>
> total: 0 errors, 1 warnings, 35 lines checked
>
> Signed-off-by: Kefu Chai 
> ---
>  config/meson.build  | 5 -
>  config/rte_config.h | 2 --
>  meson_options.txt   | 4 
>  3 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/config/meson.build b/config/meson.build
> index 3b5966ec2f..d95dccdbcc 100644
> --- a/config/meson.build
> +++ b/config/meson.build
> @@ -301,7 +301,10 @@ if dpdk_conf.get('RTE_ARCH_64')
>  else # for 32-bit we need smaller reserved memory areas
>  dpdk_conf.set('RTE_MAX_MEM_MB', 2048)
>  endif
> -
> +dpdk_conf.set('RTE_MAX_MEMSEG_LISTS', get_option('max_memseg_lists'))
> +if get_option('atomic_mbuf_ref_counts')
> +  dpdk_conf.set('RTE_MBUF_REFCNT_ATOMIC', true)
> +endif
>
>  compile_time_cpuflags = []
>  subdir(arch_subdir)
> diff --git a/config/rte_config.h b/config/rte_config.h
> index 590903c07d..0a659f5e1a 100644
> --- a/config/rte_config.h
> +++ b/config/rte_config.h
> @@ -29,7 +29,6 @@
>
>  /* EAL defines */
>  #define RTE_MAX_HEAPS 32
> -#define RTE_MAX_MEMSEG_LISTS 128
>  #define RTE_MAX_MEMSEG_PER_LIST 8192
>  #define RTE_MAX_MEM_MB_PER_LIST 32768
>  #define RTE_MAX_MEMSEG_PER_TYPE 32768
> @@ -50,7 +49,6 @@
>
>  /* mbuf defines */
>  #define RTE_MBUF_DEFAULT_MEMPOOL_OPS "ring_mp_mc"
> -#define RTE_MBUF_REFCNT_ATOMIC 1
>  #define RTE_PKTMBUF_HEADROOM 128
>
>  /* ether defines */
> diff --git a/meson_options.txt b/meson_options.txt
> index 0e92734c49..6aeae211cd 100644
> --- a/meson_options.txt
> +++ b/meson_options.txt
> @@ -38,6 +38,10 @@ option('max_lcores', type: 'integer', value: 128, 
> description:
> 'maximum number of cores/threads supported by EAL')
>  option('max_numa_nodes', type: 'integer', value: 32, description:
> 'maximum number of NUMA nodes supported by EAL')
> +option('max_memseg_lists', type: 'integer', value: 128, description:
> +   'maximum number of dynamic arrays holding memsegs')
> +option('atomic_mbuf_ref_counts', type: 'boolean', value: true, description:
> +   'atomically access the mbuf refcnt')
>  option('platform', type: 'string', value: 'native', description:
> 'Platform to build, either "native", "generic" or a SoC. Please refer 
> to the Linux build guide for more information.')
>  option('enable_trace_fp', type: 'boolean', value: false, description:
> --
> 2.33.0
>


-- 
Regards
Kefu Chai


Re: [dpdk-dev] [PATCH v2 0/2] mlx5: support global device syntax

2021-09-20 Thread Thomas Monjalon
18/01/2021 16:26, Xueming Li:
> Xueming Li (2):
>   common/mlx5: support device global syntax
>   net/mlx5: support new global device syntax
> 
>  drivers/common/mlx5/mlx5_common_pci.c | 6 +-
>  drivers/net/mlx5/mlx5.c   | 6 +-
>  2 files changed, 10 insertions(+), 2 deletions(-)

Please could you rebase this series?




Re: [dpdk-dev] [PATCH v4] build: add meson options of max_memseg_lists and atomic_mbuf_ref_counts

2021-09-20 Thread Bruce Richardson
On Mon, Sep 20, 2021 at 03:51:06PM +0800, kefu chai wrote:
> hello Bruce,
> 
> do you have any further concerns? is there anything i can do to move
> this forward?
> 
> cheers,
>

+Anatoly, for his input for the memory segments change.

I would still prefer not to have these as config options, but perhaps one
or both need to be. Of the two, the atomic refcount seems the more
reasonable to add. For the max memseg lists, what is the impact if we were
to increase this value globally?
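
For reference, a simplified sketch of what the atomic_mbuf_ref_counts
toggle is about (illustrative only, not the actual rte_mbuf code): with the
option enabled each refcnt update is an atomic RMW, while the non-atomic
variant is a plain update that is only safe when a single core touches the
mbuf refcnt, as the commit log argues for Seastar-style applications.

    /* Illustrative sketch, not the actual DPDK mbuf implementation. */
    #include <stdint.h>

    struct mbuf_stub {
            uint16_t refcnt;
    };

    static inline uint16_t
    refcnt_update(struct mbuf_stub *m, int16_t value)
    {
    #ifdef RTE_MBUF_REFCNT_ATOMIC
            /* Safe when several cores may update the same mbuf. */
            return __atomic_add_fetch(&m->refcnt, (uint16_t)value,
                                      __ATOMIC_ACQ_REL);
    #else
            /* Cheaper, but only valid when a single core updates the
             * refcnt (run-to-completion apps such as Seastar). */
            m->refcnt = (uint16_t)(m->refcnt + value);
            return m->refcnt;
    #endif
    }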

/Bruce
 
> On Thu, Sep 9, 2021 at 12:51 AM Kefu Chai  wrote:
> >
> > RTE_MAX_MEMSEG_LISTS = 128 is not enough for high-memory machines, in our
> > case, we need to increase it to 8192. so add an option so user can
> > override it. RTE_MBUF_REFCNT_ATOMIC = 0 is not necessary for applications
> > like Seastar, where it's safe to assume that the mbuf refcnt is only
> > updated by a single core only.
> >
> > ---
> >
> > v4:
> >
> > fix the coding style issue by reduce the line length to under 75.
> > this change should silence the warning like:
> >
> > WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer 
> > a maximum 75 chars per line)
> > #81:
> > RTE_MAX_MEMSEG_LISTS = 128 is not enough for high-memory machines, in our 
> > case,
> >
> > total: 0 errors, 1 warnings, 35 lines checked
> >
> > Signed-off-by: Kefu Chai 
> > ---
> >  config/meson.build  | 5 -
> >  config/rte_config.h | 2 --
> >  meson_options.txt   | 4 
> >  3 files changed, 8 insertions(+), 3 deletions(-)
> >




Re: [dpdk-dev] [PATCH 2/3] test/latencystats: fix incorrect loop boundary

2021-09-20 Thread Pattan, Reshma



> -Original Message-
> From: David Marchand 
> Caught running ASAN.
> 
> Signed-off-by: David Marchand 

Acked-by:  Reshma Pattan 


Re: [dpdk-dev] [PATCH v2 1/2] common/cnxk: update roc models

2021-09-20 Thread Jerin Jacob
On Fri, Sep 17, 2021 at 3:06 PM Ashwin Sekhar T K  wrote:
>
> Make the following updates to the roc models:
>  - Use consistent upper/lower case in macros defining different
>    ROC models.
>  - Add API to detect cn96 Cx stepping.
>  - Mark all current cn10k models as A0 stepping.
>
> Signed-off-by: Ashwin Sekhar T K 


Series Acked-by: Jerin Jacob 
Series applied to dpdk-next-net-mrvl/for-next-net. Thanks.


> ---
>  drivers/common/cnxk/roc_model.c | 51 +++
>  drivers/common/cnxk/roc_model.h | 53 +
>  2 files changed, 67 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/common/cnxk/roc_model.c b/drivers/common/cnxk/roc_model.c
> index bc255b53cc..e5aeabe2e2 100644
> --- a/drivers/common/cnxk/roc_model.c
> +++ b/drivers/common/cnxk/roc_model.c
> @@ -13,14 +13,14 @@ struct roc_model *roc_model;
>
>  #define SOC_PART_CN10K 0xD49
>
> -#define PART_106XX  0xB9
> -#define PART_105XX  0xBA
> -#define PART_105XXN 0xBC
> -#define PART_98XX   0xB1
> -#define PART_96XX   0xB2
> -#define PART_95XX   0xB3
> -#define PART_95XXN  0xB4
> -#define PART_95XXMM 0xB5
> +#define PART_106xx  0xB9
> +#define PART_105xx  0xBA
> +#define PART_105xxN 0xBC
> +#define PART_98xx   0xB1
> +#define PART_96xx   0xB2
> +#define PART_95xx   0xB3
> +#define PART_95xxN  0xB4
> +#define PART_95xxMM 0xB5
>  #define PART_95O    0xB6
>
>  #define MODEL_IMPL_BITS  8
> @@ -44,20 +44,21 @@ static const struct model_db {
> uint64_t flag;
> char name[ROC_MODEL_STR_LEN_MAX];
>  } model_db[] = {
> -   {VENDOR_ARM, PART_106XX, 0, 0, ROC_MODEL_CN106XX, "cn10ka"},
> -   {VENDOR_ARM, PART_105XX, 0, 0, ROC_MODEL_CNF105XX, "cnf10ka"},
> -   {VENDOR_ARM, PART_105XXN, 0, 0, ROC_MODEL_CNF105XXN, "cnf10kb"},
> -   {VENDOR_CAVIUM, PART_98XX, 0, 0, ROC_MODEL_CN98xx_A0, "cn98xx_a0"},
> -   {VENDOR_CAVIUM, PART_96XX, 0, 0, ROC_MODEL_CN96xx_A0, "cn96xx_a0"},
> -   {VENDOR_CAVIUM, PART_96XX, 0, 1, ROC_MODEL_CN96xx_B0, "cn96xx_b0"},
> -   {VENDOR_CAVIUM, PART_96XX, 2, 0, ROC_MODEL_CN96xx_C0, "cn96xx_c0"},
> -   {VENDOR_CAVIUM, PART_95XX, 0, 0, ROC_MODEL_CNF95xx_A0, "cnf95xx_a0"},
> -   {VENDOR_CAVIUM, PART_95XX, 1, 0, ROC_MODEL_CNF95xx_B0, "cnf95xx_b0"},
> -   {VENDOR_CAVIUM, PART_95XXN, 0, 0, ROC_MODEL_CNF95XXN_A0, 
> "cnf95xxn_a0"},
> -   {VENDOR_CAVIUM, PART_95O, 0, 0, ROC_MODEL_CNF95XXO_A0, "cnf95O_a0"},
> -   {VENDOR_CAVIUM, PART_95XXMM, 0, 0, ROC_MODEL_CNF95XXMM_A0,
> -"cnf95xxmm_a0"}
> -};
> +   {VENDOR_ARM, PART_106xx, 0, 0, ROC_MODEL_CN106xx_A0, "cn10ka_a0"},
> +   {VENDOR_ARM, PART_105xx, 0, 0, ROC_MODEL_CNF105xx_A0, "cnf10ka_a0"},
> +   {VENDOR_ARM, PART_105xxN, 0, 0, ROC_MODEL_CNF105xxN_A0, "cnf10kb_a0"},
> +   {VENDOR_CAVIUM, PART_98xx, 0, 0, ROC_MODEL_CN98xx_A0, "cn98xx_a0"},
> +   {VENDOR_CAVIUM, PART_96xx, 0, 0, ROC_MODEL_CN96xx_A0, "cn96xx_a0"},
> +   {VENDOR_CAVIUM, PART_96xx, 0, 1, ROC_MODEL_CN96xx_B0, "cn96xx_b0"},
> +   {VENDOR_CAVIUM, PART_96xx, 2, 0, ROC_MODEL_CN96xx_C0, "cn96xx_c0"},
> +   {VENDOR_CAVIUM, PART_96xx, 2, 1, ROC_MODEL_CN96xx_C0, "cn96xx_c1"},
> +   {VENDOR_CAVIUM, PART_95xx, 0, 0, ROC_MODEL_CNF95xx_A0, "cnf95xx_a0"},
> +   {VENDOR_CAVIUM, PART_95xx, 1, 0, ROC_MODEL_CNF95xx_B0, "cnf95xx_b0"},
> +   {VENDOR_CAVIUM, PART_95xxN, 0, 0, ROC_MODEL_CNF95xxN_A0, 
> "cnf95xxn_a0"},
> +   {VENDOR_CAVIUM, PART_95xxN, 0, 1, ROC_MODEL_CNF95xxN_A0, 
> "cnf95xxn_a1"},
> +   {VENDOR_CAVIUM, PART_95O, 0, 0, ROC_MODEL_CNF95xxO_A0, "cnf95O_a0"},
> +   {VENDOR_CAVIUM, PART_95xxMM, 0, 0, ROC_MODEL_CNF95xxMM_A0,
> +"cnf95xxmm_a0"}};
>
>  static uint32_t
>  cn10k_part_get(void)
> @@ -85,11 +86,11 @@ cn10k_part_get(void)
> }
> ptr++;
> if (strcmp("cn10ka", ptr) == 0) {
> -   soc = PART_106XX;
> +   soc = PART_106xx;
> } else if (strcmp("cnf10ka", ptr) == 0) {
> -   soc = PART_105XX;
> +   soc = PART_105xx;
> } else if (strcmp("cnf10kb", ptr) == 0) {
> -   soc = PART_105XXN;
> +   soc = PART_105xxN;
> } else {
> plt_err("Unidentified 'CPU compatible': <%s>", ptr);
> goto fclose;
> diff --git a/drivers/common/cnxk/roc_model.h b/drivers/common/cnxk/roc_model.h
> index c1d11b77c6..a54f435b46 100644
> --- a/drivers/common/cnxk/roc_model.h
> +++ b/drivers/common/cnxk/roc_model.h
> @@ -15,13 +15,14 @@ struct roc_model {
>  #define ROC_MODEL_CN96xx_C0    BIT_ULL(2)
>  #define ROC_MODEL_CNF95xx_A0   BIT_ULL(4)
>  #define ROC_MODEL_CNF95xx_B0   BIT_ULL(6)
> -#define ROC_MODEL_CNF95XXMM_A0 BIT_ULL(8)
> -#define ROC_MODEL_CNF95XXN_A0  BIT_ULL(12)
> -#define ROC_MODEL_CNF95XXO_A0  BIT_ULL(13)
> +#define ROC_MODEL_CNF95xxMM_A0 BIT_ULL(8)
> +#define ROC_MODEL_CNF95xxN_A0  BIT_ULL(12)
> +#define ROC_MODEL_CNF95xxO_A0  BIT_ULL(13)
> +#define ROC_MODEL_CNF95xxN_A1  BIT_ULL(14)
>  #define ROC_MODEL_CN98xx_A0    BIT_ULL(16)
> 

Re: [dpdk-dev] [PATCH v2 0/8] Add TM Support for CN9K and CN10K

2021-09-20 Thread nithind1988

Acked-by: Nithin Dabilpuram 

On 9/18/21 8:01 PM, skotesh...@marvell.com wrote:

From: Satha Rao 

Initial implementation of traffic management for CN9K and CN10K
platforms.

Nithin Dabilpuram (1):
   common/cnxk: increase sched weight and shaper burst limit

Satha Rao (7):
   common/cnxk: use different macros for sdp and lbk max frames
   common/cnxk: flush smq
   common/cnxk: handle packet mode shaper limits
   common/cnxk: handler to get rte tm error type
   common/cnxk: set of handlers to get tm hierarchy internals
   net/cnxk: tm capabilities and queue rate limit handlers
   net/cnxk: tm shaper and node operations

v2:

- Added cover letter
- fixed meson warnings
- updated release notes

  doc/guides/rel_notes/release_21_11.rst |   1 +
  drivers/common/cnxk/cnxk_utils.c   |  68 
  drivers/common/cnxk/cnxk_utils.h   |  11 +
  drivers/common/cnxk/hw/nix.h   |  23 +-
  drivers/common/cnxk/meson.build|   5 +
  drivers/common/cnxk/roc_model.h|   6 +
  drivers/common/cnxk/roc_nix.c  |   5 +-
  drivers/common/cnxk/roc_nix.h  |  34 +-
  drivers/common/cnxk/roc_nix_priv.h |  13 +-
  drivers/common/cnxk/roc_nix_tm.c   |  24 +-
  drivers/common/cnxk/roc_nix_tm_ops.c   | 147 +--
  drivers/common/cnxk/roc_nix_tm_utils.c | 130 ++-
  drivers/common/cnxk/roc_utils.c|   6 +
  drivers/common/cnxk/version.map|  10 +
  drivers/net/cnxk/cnxk_ethdev.c |   2 +
  drivers/net/cnxk/cnxk_ethdev.h |   3 +
  drivers/net/cnxk/cnxk_tm.c | 675 +
  drivers/net/cnxk/cnxk_tm.h |  23 ++
  drivers/net/cnxk/meson.build   |   1 +
  19 files changed, 1121 insertions(+), 66 deletions(-)
  create mode 100644 drivers/common/cnxk/cnxk_utils.c
  create mode 100644 drivers/common/cnxk/cnxk_utils.h
  create mode 100644 drivers/net/cnxk/cnxk_tm.c
  create mode 100644 drivers/net/cnxk/cnxk_tm.h



Re: [dpdk-dev] [EXT] Re: [PATCH v3] devtools: add acronyms in dictionary for commit checks

2021-09-20 Thread Ferruh Yigit
On 9/17/2021 5:15 PM, Thomas Monjalon wrote:
> 17/09/2021 16:53, Ashwin Sekhar Thalakalath Kottilveetil:
>> From: Thomas Monjalon 
>>> 17/09/2021 15:54, Ashwin Sekhar Thalakalath Kottilveetil:
 From: Thomas Monjalon 
> 17/09/2021 12:58, Ashwin Sekhar T K:
>> Update word list with Marvell specific acronyms.
> [...]
>>> Please add details in the commit log so we understand they are Marvell
>>> acronyms.
>> Commit log already mentions these are Marvell specific acronyms. I did not
>> add explanation for each of them as this would make the message too long.
> 
> Oh yes, I missed it, sorry.
> 
>>> One more question: why is useful to add? Some people forget uppercases?
>>
>> Upper case is desired but not really mandatory. This was a suggestion put
>> forth to me in one of the reviews.
>> https://patches.dpdk.org/project/dpdk/patch/20210830135231.2610152-1-asek...@marvell.com/
>>
>> I can abandon this change if you feel it is not appropriate to put many
>> device specific acronyms in the top level word list.
> 
> No strong opinion, but I think the patch is OK.
> David, Ferruh, opinions?
> 

Yes, this is suggested to be sure acronyms are uppercase in the patch title.

But if an issue can be described in generic terms, I am in favor of using them
instead of device-specific acronyms, to make commit logs less cryptic.
For example, 'NIC' should be used instead of 'NIX'.

Similarly, we can try to use the long versions of CQ/SQ/RQ, although we may
need the short forms from time to time because of the limited title length.

The rest seem to be device-specific abbreviations we cannot avoid using, so
they are OK to me.



Re: [dpdk-dev] [PATCH v3] devtools: add acronyms in dictionary for commit checks

2021-09-20 Thread Ferruh Yigit
On 9/17/2021 11:58 AM, Ashwin Sekhar T K wrote:
> Update word list with Marvell specific acronyms.
> 
> CPT  -> Cryptographic Accelerator Unit
> CQ   -> Completion Queue
> LBK  -> Loopback Interface Unit
> LMT  -> Large Atomic Store Unit
> MCAM -> Match Content Addressable Memory
> NIX  -> Network Interface Controller Unit
> NPA  -> Network Pool Allocator
> NPC  -> Network Parser and CAM Unit
> ROC  -> Rest Of Chip

Out of curiosity, what is "rest of chip"?

> RQ   -> Receive Queue
> RVU  -> Resource Virtualization Unit
> SQ   -> Send Queue
> SSO  -> Schedule Synchronize Order Unit
> TIM  -> Timer Unit
> 
> Signed-off-by: Ashwin Sekhar T K 



Re: [dpdk-dev] [PATCH v3] devtools: add acronyms in dictionary for commit checks

2021-09-20 Thread Jerin Jacob
On Mon, Sep 20, 2021 at 2:37 PM Ferruh Yigit  wrote:
>
> On 9/17/2021 11:58 AM, Ashwin Sekhar T K wrote:
> > Update word list with Marvell specific acronyms.
> >
> > CPT  -> Cryptographic Accelerator Unit
> > CQ   -> Completion Queue
> > LBK  -> Loopback Interface Unit
> > LMT  -> Large Atomic Store Unit
> > MCAM -> Match Content Addressable Memory
> > NIX  -> Network Interface Controller Unit
> > NPA  -> Network Pool Allocator
> > NPC  -> Network Parser and CAM Unit
> > ROC  -> Rest Of Chip
>
> Out of curiosity, what is "rest of chip"?

All the HW accelerators excluding the CPU cores.

>
> > RQ   -> Receive Queue
> > RVU  -> Resource Virtualization Unit
> > SQ   -> Send Queue
> > SSO  -> Schedule Synchronize Order Unit
> > TIM  -> Timer Unit
> >
> > Signed-off-by: Ashwin Sekhar T K 
>


Re: [dpdk-dev] [PATCH v5 1/2] eventdev: add rx queue conf get api

2021-09-20 Thread Kundapura, Ganapati
Hi Jerin,

> -Original Message-
> From: Jerin Jacob 
> Sent: 20 September 2021 12:00
> To: Kundapura, Ganapati 
> Cc: Jayatheerthan, Jay ; dpdk-dev
> ; Yigit, Ferruh 
> Subject: Re: [PATCH v5 1/2] eventdev: add rx queue conf get api
> 
> On Thu, Sep 16, 2021 at 6:21 PM Ganapati Kundapura
>  wrote:
> >
> > Added rte_event_eth_rx_adapter_queue_conf_get() API to get rx queue
> > information - event queue identifier, flags for handling received
> > packets, scheduler type, event priority, polling frequency of the
> > receive queue and flow identifier in
> > rte_event_eth_rx_adapter_queue_conf structure
> >
> > Signed-off-by: Ganapati Kundapura 
> >
> > ---
> > v5:
> > * Filled queue_conf after memzone lookup
> > * PMD callback if not NULL, invoked to override queue_conf struct
> > * Added memzone lookup for stats_get(), stats_reset(), service_id_get()
> >   api's called by secondary applications.
> >
> > v4:
> > * squashed 1/3 and 3/3
> > * reused rte_event_eth_rx_adapter_queue_conf structure in place of
> >   rte_event_eth_rx_adapter_queue_info
> > * renamed to rte_event_eth_rx_adapter_queue_info_get() to
> >   rte_event_eth_rx_adapter_queue_conf_get to align with
> >   rte_event_eth_rx_adapter_queue_conf structure
> >
> > v3:
> > * Split single patch into implementation, test and documentation update
> >   patches separately
> >
> > v2:
> > * Fixed build issue due to missing entry in version.map
> >
> > v1:
> > * Initial patch with implementation, test and doc together
> > ---
> > ---
> >  .../prog_guide/event_ethernet_rx_adapter.rst   |  8 ++
> >  lib/eventdev/eventdev_pmd.h| 28 +++
> >  lib/eventdev/rte_event_eth_rx_adapter.c| 91
> +-
> >  lib/eventdev/rte_event_eth_rx_adapter.h| 27 +++
> >  lib/eventdev/version.map   |  1 +
> >  5 files changed, 154 insertions(+), 1 deletion(-)
> >
> > diff --git a/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
> > b/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
> > index 0780b6f..ce23d8a 100644
> > --- a/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
> > +++ b/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
> > @@ -146,6 +146,14 @@ if the callback is supported, and the counts
> > maintained by the service function,  if one exists. The service
> > function also maintains a count of cycles for which  it was not able to
> enqueue to the event device.
> >
> > +Getting Adapter queue config
> > +
> > +
> > +The  ``rte_event_eth_rx_adapter_queue_conf_get()`` function reports
> > +flags for handling received packets, event queue identifier,
> > +scheduler type, event priority, polling frequency of the receive
> > +queue and flow identifier in struct
> ``rte_event_eth_rx_adapter_queue_conf``.
> > +
> >  Interrupt Based Rx Queues
> >  ~~
> >
> > diff --git a/lib/eventdev/eventdev_pmd.h
> b/lib/eventdev/eventdev_pmd.h
> > index 63b3bc4..e69644b 100644
> > --- a/lib/eventdev/eventdev_pmd.h
> > +++ b/lib/eventdev/eventdev_pmd.h
> > @@ -562,6 +562,32 @@ typedef int
> (*eventdev_eth_rx_adapter_queue_del_t)
> > int32_t rx_queue_id);
> >
> >  /**
> > + * Retrieve Rx adapter queue config information for the specified
> > + * rx queue ID.
> > + *
> > + * @param dev
> > + *  Event device pointer
> > + *
> > + * @param eth_dev
> > + *  Ethernet device pointer
> > + *
> > + * @param rx_queue_id
> > + *  Ethernet device receive queue index.
> > + *
> > + * @param[out] queue_conf
> > + *  Pointer to rte_event_eth_rx_adapter_queue_conf structure
> > + *
> > + * @return
> > + *  - 0: Success
> > + *  - <0: Error code on failure.
> > + */
> > +typedef int (*eventdev_eth_rx_adapter_queue_conf_get_t)
> > +   (const struct rte_eventdev *dev,
> > +   const struct rte_eth_dev *eth_dev,
> > +   uint16_t rx_queue_id,
> > +   struct rte_event_eth_rx_adapter_queue_conf
> > +*queue_conf);
> > +
> > +/**
> >   * Start ethernet Rx adapter. This callback is invoked if
> >   * the caps returned from eventdev_eth_rx_adapter_caps_get(..,
> eth_port_id)
> >   * has RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT set and Rx
> queues
> > @@ -1081,6 +1107,8 @@ struct rte_eventdev_ops {
> > /**< Add Rx queues to ethernet Rx adapter */
> > eventdev_eth_rx_adapter_queue_del_t eth_rx_adapter_queue_del;
> > /**< Delete Rx queues from ethernet Rx adapter */
> > +   eventdev_eth_rx_adapter_queue_conf_get_t
> eth_rx_adapter_queue_conf_get;
> > +   /**< Get Rx adapter queue info */
> > eventdev_eth_rx_adapter_start_t eth_rx_adapter_start;
> > /**< Start ethernet Rx adapter */
> > eventdev_eth_rx_adapter_stop_t eth_rx_adapter_stop; diff --git
> > a/lib/eventdev/rte_event_eth_rx_adapter.c
> > b/lib/eventdev/rte_event_eth_rx_adapter.c
> > index f2dc695..6cc4210 100644
> > --- a/lib/eve

Re: [dpdk-dev] [EXT] Re: [PATCH v3] devtools: add acronyms in dictionary for commit checks

2021-09-20 Thread Ashwin Sekhar Thalakalath Kottilveetil
> > Oh yes, I missed it, sorry.
> >
> >>> One more question: why is useful to add? Some people forget
> uppercases?
> >>
> >> Upper case is desired but not really mandatory. This was a suggestion
> >> put forth to me In one of the reviews.
> >> https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__patches.dpdk.org
> >> _project_dpdk_patch_20210830135231.2610152-2D1-2Dasekhar-
> 40marvell.co
> >> m_&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=pYk-QOhvnkU-
> _75y0NKSn535ZotEGI
> >> _E69Py3Ppondk&m=tCLT4AyWr6-
> VYmqkdbD879kj0uDFhCqF6jjOWfe8Dn4&s=EWthslG
> >> Cy_OWH4bqcOEKKkweFTe4yHZ-2O5yqiKp39w&e=
> >>
> >> I can abandon this change if you feel it is not appropriate to put
> >> many device specific acronyms in the top level word list.
> >
> > No strong opinion, but I think the patch is OK.
> > David, Ferruh, opinions?
> >
> 
> Yes this is suggested to be sure acronyms are uppercase in the patch title.
> 
> But if an issue can be described in generic concepts, I am for using them to
> instead of using device specific acronyms, to make commit logs less cryptic.
Agree that certain commit logs could be re-worded and put in more
generic terms.

> Like 'NIC' should be used instead of 'NIX'.
But NIX and NIC are not exactly interchangeable. NIX refers to a co-processor.
NIX in conjunction with other co-processors (NPA, LMT etc.) delivers the
functionality of a NIC.

> 
> Similarly we can try to use long version of CQ/SQ/RQ, although we may need
> to use them because of limited title length time to time.
> 
> Rest seems device specific abbreviations we may not escape to use them, so
> they are OK to me.



Re: [dpdk-dev] [EXT] Re: [PATCH v3] devtools: add acronyms in dictionary for commit checks

2021-09-20 Thread Ashwin Sekhar Thalakalath Kottilveetil
> On 9/17/2021 11:58 AM, Ashwin Sekhar T K wrote:
> > Update word list with Marvell specific acronyms.
> >
> > CPT  -> Cryptographic Accelerator Unit
> > CQ   -> Completion Queue
> > LBK  -> Loopback Interface Unit
> > LMT  -> Large Atomic Store Unit
> > MCAM -> Match Content Addressable Memory NIX  -> Network Interface
> > Controller Unit NPA  -> Network Pool Allocator NPC  -> Network Parser
> > and CAM Unit ROC  -> Rest Of Chip
> 
> Out of curiosity, what is "rest of chip"?
Anything which is non-CPU. Co-processors like NPA, NIX, etc. are collectively
referred to as ROC.
> 
> > RQ   -> Receive Queue
> > RVU  -> Resource Virtualization Unit
> > SQ   -> Send Queue
> > SSO  -> Schedule Synchronize Order Unit TIM  -> Timer Unit
> >
> > Signed-off-by: Ashwin Sekhar T K 



Re: [dpdk-dev] [PATCH v4] net/mlx5: fix mutex unlock in txpp cleanup

2021-09-20 Thread Slava Ovsiienko
Hi, Chengfeng

Good catch, thank you.
Could we polish the commit message a bit?

"The lock sh->txpp.mutex was not correctly released on all pathes
of cleanup function return, potentially causing the deadlock."

With best regards,
Slava

> -Original Message-
> From: dev  On Behalf Of Chengfeng Ye
> Sent: Friday, September 3, 2021 11:44
> To: david.march...@redhat.com
> Cc: dev@dpdk.org; Chengfeng Ye ;
> sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH v4] net/mlx5: fix mutex unlock in txpp cleanup
> 
> The lock sh->txpp.mutex was not correctly released if the function returned
> in these two branches, which may led to deadlock if the function was
> acquired again.
> 
> Fixes: d133f4cdb7 ("net/mlx5: create clock queue for packet pacing")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Chengfeng Ye 
> ---
>  drivers/net/mlx5/mlx5_txpp.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
> index 4f6da9f2d1..0ece788a84 100644
> --- a/drivers/net/mlx5/mlx5_txpp.c
> +++ b/drivers/net/mlx5/mlx5_txpp.c
> @@ -961,8 +961,12 @@ mlx5_txpp_stop(struct rte_eth_dev *dev)
>   MLX5_ASSERT(!ret);
>   RTE_SET_USED(ret);
>   MLX5_ASSERT(sh->txpp.refcnt);
> - if (!sh->txpp.refcnt || --sh->txpp.refcnt)
> + if (!sh->txpp.refcnt || --sh->txpp.refcnt) {
> + ret = pthread_mutex_unlock(&sh->txpp.mutex);
> + MLX5_ASSERT(!ret);
> + RTE_SET_USED(ret);
>   return;
> + }
>   /* No references any more, do actual destroy. */
>   mlx5_txpp_destroy(sh);
>   ret = pthread_mutex_unlock(&sh->txpp.mutex);
> --
> 2.17.1



Re: [dpdk-dev] [PATCH v2] eal: add additional info if lcore exceeds max cores

2021-09-20 Thread David Hunt

Hi David,

On 16/9/2021 1:34 PM, David Marchand wrote:

On Wed, Sep 15, 2021 at 2:11 PM David Hunt  wrote:

If the user requests to use an lcore above 128 using -l or -c,
the eal will exit with "EAL: invalid core list syntax" and
very little other useful information.

This patch adds some extra information suggesting to use --lcores
so that physical cores above RTE_MAX_LCORE (default 128) can be
used. This is achieved by using the --lcores option by mapping
the logical cores in the application onto to physical cores.

There is no change in functionality, just additional messages
suggesting how the --lcores option might be used for the supplied
list of lcores. For example, if "-l 12-14,130,132" is used, we
see the following additional output on the command line:

EAL: Error = One of the 5 cores provided exceeds RTE_MAX_LCORE (128)
EAL: Please use --lcores instead, e.g. --lcores 0@12,1@13,2@14,3@130,4@132

Signed-off-by: David Hunt 

---
changes in v2
* Rather than increasing the default max lcores (as in v1),
  it was agreed to do this instead (switch to --lcores).
* As the other patches in the v1 of the set are no longer related
  to this change, I'll submit as a separate patch set.

The -c option can use the same kind of warning.



Agreed, I'll include that in the next version.





---
  lib/eal/common/eal_common_options.c | 31 +
  1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/lib/eal/common/eal_common_options.c 
b/lib/eal/common/eal_common_options.c
index ff5861b5f3..5c7a5a45a5 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -836,6 +836,8 @@ eal_parse_service_corelist(const char *corelist)
 return 0;
  }

+#define MAX_LCORES_STRING 512
+
  static int
  eal_parse_corelist(const char *corelist, int *cores)
  {
@@ -843,6 +845,9 @@ eal_parse_corelist(const char *corelist, int *cores)
 char *end = NULL;
 int min, max;
 int idx;
+   bool overflow = false;
+   char lcores[MAX_LCORES_STRING] = "";

This code is not performance sensitive.
In the worst case, like for RTE_MAX_LCORES lcores, it gives this:
0@0,1@1,2@2,3@3,4@4,5@5,6@6,7@7,8@8,9@9,10@10,11@11,12@12,13@13,14@14,15@15,16@16,17@17,18@18,19@19,20@20,21@21,22@22,23@23,24@24,25@25,26@26,27@27,28@28,29@29,30@30,31@31,32@32,33@33,34@34,35@35,36@36,37@37,38@38,39@39,40@40,41@41,42@42,43@43,44@44,45@45,46@46,47@47,48@48,49@49,50@50,51@51,52@52,53@53,54@54,55@55,56@56,57@57,58@58,59@59,60@60,61@61,62@62,63@63,64@64,65@65,66@66,67@67,68@68,69@69,70@70,71@71,72@72,73@73,74@74,75@75,76@76,77@77,78@78,79@79,80@80,81@81,82@82,83@83,84@84,85@85,86@86,87@87,88@88,89@89,90@90,91@91,92@92,93@93,94@94,95@95,96@96,97@97,98@98,99@99,100@100,101@101,102@102,103@103,104@104,105@105,106@106,107@107,108@108,109@109,110@110,111@111,112@112,113@113,114@114,115@115,116@116,117@117,118@118,119@119,120@120,121@121,122@122,123@123,124@124,125@125,126@126,127@127,

Which is 800+ bytes long; let's switch to dynamic allocations.



Good point. I'll allocate a dozen bytes or so for each physical core
detected; that should be enough.
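
For what it's worth, one possible shape for that (a sketch only; names and
the per-core size estimate are illustrative, not the final patch):

    /* Illustrative sketch of the suggested dynamic allocation: reserve
     * roughly enough room for one "lcore@cpu," mapping per core. */
    #include <stdlib.h>

    #define LCORE_MAP_ENTRY_LEN 16  /* e.g. "1023@1023," plus slack */

    static char *
    alloc_lcores_string(unsigned int nb_cores)
    {
            size_t sz = (size_t)nb_cores * LCORE_MAP_ENTRY_LEN + 1;
            char *buf = malloc(sz);

            if (buf != NULL)
                    buf[0] = '\0';
            /* Caller appends "%d@%d," entries with snprintf() and
             * free()s the buffer when done. */
            return buf;
    }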






+   int len = 0;

 for (idx = 0; idx < RTE_MAX_LCORE; idx++)
 cores[idx] = -1;
@@ -862,8 +867,10 @@ eal_parse_corelist(const char *corelist, int *cores)
 idx = strtol(corelist, &end, 10);
 if (errno || end == NULL)
 return -1;
-   if (idx < 0 || idx >= RTE_MAX_LCORE)
+   if (idx < 0)
 return -1;
+   if (idx >= RTE_MAX_LCORE)
+   overflow = true;

The code before was intermixing parsing and validation of values.
This intermix was not that great.
Let's separate those concerns.


I see what you mean (in your comments below). Agreed this would be a 
good idea.






 while (isblank(*end))
 end++;
 if (*end == '-') {
@@ -873,10 +880,19 @@ eal_parse_corelist(const char *corelist, int *cores)
 if (min == RTE_MAX_LCORE)
 min = idx;
 for (idx = min; idx <= max; idx++) {
-   if (cores[idx] == -1) {
-   cores[idx] = count;
-   count++;
+   if (idx < RTE_MAX_LCORE) {
+   if (cores[idx] == -1)
+   cores[idx] = count;
 }
+   count++;
+   if (count == 1)
+   len = len + snprintf(&lcores[len],
+   MAX_LCORES_STRING - len,
+   "%d@%d", count-1, idx);
+   else
+  

Re: [dpdk-dev] [PATCH v8 2/2] app/testpmd: fix testpmd doesn't show RSS hash offload

2021-09-20 Thread Ferruh Yigit
On 9/18/2021 3:18 AM, Li, Xiaoyun wrote:
> Hi
> 
>> -Original Message-
>> From: Yigit, Ferruh 
>> Sent: Friday, September 17, 2021 18:20
>> To: Li, Xiaoyun ; Wang, Jie1X ;
>> dev@dpdk.org
>> Cc: andrew.rybche...@oktetlabs.ru; tho...@monjalon.net;
>> jer...@marvell.com; Ananyev, Konstantin 
>> Subject: Re: [PATCH v8 2/2] app/testpmd: fix testpmd doesn't show RSS hash
>> offload
>>
>> On 9/9/2021 4:31 AM, Li, Xiaoyun wrote:
>>> Hi
>>>
 -Original Message-
 From: Yigit, Ferruh 
 Sent: Thursday, September 9, 2021 00:51
 To: Wang, Jie1X ; dev@dpdk.org; Li, Xiaoyun
 
 Cc: andrew.rybche...@oktetlabs.ru; tho...@monjalon.net
 Subject: Re: [PATCH v8 2/2] app/testpmd: fix testpmd doesn't show RSS
 hash offload

 On 8/27/2021 9:17 AM, Jie Wang wrote:
> The driver may change offloads info into dev->data->dev_conf in
> dev_configure which may cause port->dev_conf and port->rx_conf
> contain outdated values.
>
> This patch updates the offloads info if it changes to fix this issue.
>
> Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings")
>
> Signed-off-by: Jie Wang 
> ---
>  app/test-pmd/testpmd.c | 34 ++
>  app/test-pmd/testpmd.h |  2 ++
>  app/test-pmd/util.c| 15 +++
>  3 files changed, 51 insertions(+)
>
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> 6cbe9ba3c8..bd67291160 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -2461,6 +2461,9 @@ start_port(portid_t pid)
>   }
>
>   if (port->need_reconfig > 0) {
> + struct rte_eth_conf dev_conf_info;
> + int k;
> +
>   port->need_reconfig = 0;
>
>   if (flow_isolate_all) {
> @@ -2498,6 +2501,37 @@ start_port(portid_t pid)
>   port->need_reconfig = 1;
>   return -1;
>   }
> + /* get rte_eth_conf info */
> + if (0 !=
> + eth_dev_conf_info_get_print_err(pi,
> + &dev_conf_info)) {
> + fprintf(stderr,
> + "port %d can not get device
 configuration info\n",
> + pi);
> + return -1;
> + }
> + /* Apply Rx offloads configuration */
> + if (dev_conf_info.rxmode.offloads !=
> + port->dev_conf.rxmode.offloads) {
> + port->dev_conf.rxmode.offloads =
> + dev_conf_info.rxmode.offloads;
> + for (k = 0;
> +  k < port->dev_info.max_rx_queues;
> +  k++)
> + port->rx_conf[k].offloads =
> +
dev_conf_info.rxmode.offloads;
> + }
> + /* Apply Tx offloads configuration */
> + if (dev_conf_info.txmode.offloads !=
> + port->dev_conf.txmode.offloads) {
> + port->dev_conf.txmode.offloads =
> + dev_conf_info.txmode.offloads;
> + for (k = 0;
> +  k < port->dev_info.max_tx_queues;
> +  k++)
> + port->tx_conf[k].offloads =
> +
dev_conf_info.txmode.offloads;
> + }
>   }

 Above implementation gets the configuration from device and applies
 it to the testpmd configuration.

 Instead, what about a long level target to get rid of testpmd
 specific copy of the configuration and rely and the config provided
 by devices. @Xiaoyun, what do you think, does this make sense?
>>>
>>> You mean remove port->dev_conf and rx/tx_conf completely in the future? Or
>> keep it in initial stage?
>>>
>>> Now, port->dev_conf will take global tx/rx_mode, fdir_conf and change some
>> based on dev_info capabilities. And then use dev_configure to apply them for
>> device.
>>> After this, actually, dev->data->dev_conf contains all device configuration.
>>>
>>> So It seems it's OK to remove port->dev_conf completely. Just testpmd needs
>> to be refactored a lot and regression test in case of issues.
>>> But from long term view, it's good to keep one source and avoid copy.
>>>
>>
>> Yes, this is the intention I have for long term. I expect that testpmd still 
>> will keep
>> some configuration in application level but we can prevent some duplication.
>>
>> And the main point is, by cleaning up testpmd we can recognize blocke

Re: [dpdk-dev] [PATCH v8 2/2] app/testpmd: fix testpmd doesn't show RSS hash offload

2021-09-20 Thread Ferruh Yigit
On 8/27/2021 9:17 AM, Jie Wang wrote:
> The driver may change the offloads info in dev->data->dev_conf
> during dev_configure, which may cause port->dev_conf and port->rx_conf
> to contain outdated values.
> 
> This patch updates the offloads info if it changes to fix this issue.
> 
> Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings")
> 
> Signed-off-by: Jie Wang 

<...>

> + /* Apply Rx offloads configuration */
> + if (dev_conf_info.rxmode.offloads !=
> + port->dev_conf.rxmode.offloads) {
> + port->dev_conf.rxmode.offloads =
> + dev_conf_info.rxmode.offloads;
> + for (k = 0;
> +  k < port->dev_info.max_rx_queues;
> +  k++)
> + port->rx_conf[k].offloads =
> + dev_conf_info.rxmode.offloads;

If queue-specific offloads are used, won't this overwrite them with the port offloads?

Should we get the queue config from the device and update the queue offloads with it?
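
A rough sketch of that alternative (assuming the driver implements the
rxq_info_get callback; variable names are illustrative, not actual testpmd
code):

    #include <rte_ethdev.h>

    /* Sketch: refresh per-queue Rx offloads from the device instead of
     * overwriting them all with the port-level value. */
    static void
    refresh_rxq_offloads(uint16_t port_id, uint16_t nb_rxq,
                         struct rte_eth_rxconf *rx_conf)
    {
            uint16_t q;

            for (q = 0; q < nb_rxq; q++) {
                    struct rte_eth_rxq_info qinfo;

                    /* Not all drivers support this query; keep the
                     * existing value if it is unsupported. */
                    if (rte_eth_rx_queue_info_get(port_id, q, &qinfo) != 0)
                            continue;
                    rx_conf[q].offloads = qinfo.conf.offloads;
            }
    }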


Re: [dpdk-dev] [PATCH v3] Enable AddressSanitizer feature on DPDK

2021-09-20 Thread David Marchand
On Sat, Sep 18, 2021 at 9:51 AM  wrote:
>
> From: Zhihong Peng 

- The title is too vague.
I am not sure what the best title is, but my current idea is:
mem: instrument allocator with ASan


- This is a nice feature that must be announced in the release notes.


- How should we spell it?
Asan ?
ASAN ?
ASan ?

Please update devtools/words-case.txt and fix inconsistencies in this patch.

>
> AddressSanitizer (ASan) is a standard memory error detection
> tool from Google. It helps to detect use-after-free and
> {heap,stack,global}-buffer overflow bugs in C/C++ programs,
> prints detailed error information when an error happens, and
> greatly improves debugging efficiency.
>
> By referring to its implementation algorithm
> (https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm),
> enable heap-buffer-overflow and use-after-free detection in DPDK.
> DPDK ASan support currently works on Linux x86_64 only.

If you don't intend to update other arches, at least explain in the
commit log what should be done, so that maintainers of other arches know
what to do to add support.


>
> Here is an example of heap-buffer-overflow bug:
> ..
> char *p = rte_zmalloc(NULL, 7, 0);
> p[7] = 'a';
> ..
>
> Here is an example of use-after-free bug:
> ..
> char *p = rte_zmalloc(NULL, 7, 0);
> rte_free(p);
> *p = 'a';
> ..
>
> If you want to use this feature,
> you need to add below compilation options when compiling code:
> -Dbuildtype=debug -Db_lundef=false -Db_sanitize=address

ASan is triggered by -Db_sanitize=address; it is the only *needed* option, AFAIU.


> "-Dbuildtype=debug": Display code information when coredump occurs
> in the program.

In ASan context, there is no coredump.
ASan displays a backtrace which is easier to read when debug symbols
are available.
You can suggest building with debug, but this is *not needed*.


> "-Db_lundef=false": It is enabled by default, and needs to be
> disabled when using asan.

This is an issue with meson and clang.
Tweaking b_lundef is needed with clang; gcc looks fine.
But still, on RHEL with gcc, I need to install libasan.

Maybe we can add libasan as a requirement at the project level; did you try it?


>
> Signed-off-by: Xueqin Lin 
> Signed-off-by: Zhihong Peng 
> ---
>  doc/guides/prog_guide/asan.rst  | 130 ++
>  doc/guides/prog_guide/index.rst |   1 +
>  lib/eal/common/malloc_elem.c|  26 -
>  lib/eal/common/malloc_elem.h| 184 +++-
>  lib/eal/common/malloc_heap.c|  12 +++
>  lib/eal/common/rte_malloc.c |   9 +-
>  lib/pipeline/rte_swx_pipeline.c |   4 +-

This change in the pipeline library has no explanation, and looks out of
place with respect to the current change.



>  7 files changed, 359 insertions(+), 7 deletions(-)
>  create mode 100644 doc/guides/prog_guide/asan.rst
>
> diff --git a/doc/guides/prog_guide/asan.rst b/doc/guides/prog_guide/asan.rst
> new file mode 100644
> index 00..a0589d9b8a
> --- /dev/null
> +++ b/doc/guides/prog_guide/asan.rst
> @@ -0,0 +1,130 @@
> +.. Copyright (c) <2021>, Intel Corporation
> +   All rights reserved.
> +
> +Memory error detect standard tool - AddressSanitizer(Asan)
> +==
> +
> +AddressSanitizer (ASan) is a google memory error detect
> +standard tool. It could help to detect use-after-free and
> +{heap,stack,global}-buffer overflow bugs in C/C++ programs,
> +print detailed error information when error happens, large
> +improve debug efficiency.
> +
> +By referring to its implementation algorithm
> +(https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm),
> +enabled heap-buffer-overflow and use-after-free functions on dpdk.
> +DPDK ASAN function currently only supports on Linux x86_64.
> +
> +AddressSanitizer is a part of LLVM(3.1+)and GCC(4.8+).
> +
> +Example heap-buffer-overflow error
> +--
> +
> +Following error was reported when Asan was enabled::
> +
> +Applied 9 bytes of memory, but accessed the 10th byte of memory,
> +so heap-buffer-overflow appeared.
> +
> +Below code results in this error::
> +
> +char *p = rte_zmalloc(NULL, 9, 0);
> +if (!p) {
> +printf("rte_zmalloc error.");
> +return -1;
> +}
> +p[9] = 'a';
> +
> +The error log::
> +
> +==49433==ERROR: AddressSanitizer: heap-buffer-overflow on address 
> 0x7f773fafa249 at pc 0x5556b13bdae4 bp 0x7ffeb4965e40 sp 0x7ffeb4965e30 WRITE 
> of size 1 at 0x7f773fafa249 thread T0
> +#0 0x5556b13bdae3 in asan_heap_buffer_overflow 
> ../app/test/test_asan_heap_buffer_overflow.c:25
> +#1 0x5556b043e9d4 in cmd_autotest_parsed ../app/test/commands.c:71
> +#2 0x5556b1cdd4b0 in cmdline_parse ../lib/cmdline/cmdline_parse.c:290
> +#3 0x5556b1cd8987 in cmdline_valid_buffer ../lib/cmdline/cmdline.c:26
> +#4 0x5556b1ce477a in rdline_char_in ../lib/cmdline/cmdline_rdline.c:421
> +#5 0x5556b1cd923e in cmdline_in ..

Re: [dpdk-dev] [PATCH v5 01/16] raw/ioat: only build if dmadev not present

2021-09-20 Thread Bruce Richardson
On Fri, Sep 17, 2021 at 03:24:22PM +, Kevin Laatz wrote:
> From: Bruce Richardson 
> 
> Only build the rawdev IDXD/IOAT drivers if the dmadev drivers are not
> present.
> 
> A not is also added to the documentation to inform users of this change.

typo: "note"

It would also be worthwhile mentioning in the commit log that the order of
dependencies is changed so that dmadev comes before rawdev.

> 
> Signed-off-by: Bruce Richardson 
> Signed-off-by: Kevin Laatz 
> Reviewed-by: Conor Walsh 
> 
> ---
> v4:
>   - Fix build issue
>   - Add note in raw documentation to outline this change
> ---
>  doc/guides/rawdevs/ioat.rst  |  7 +++
>  drivers/meson.build  |  2 +-
>  drivers/raw/ioat/meson.build | 23 ---
>  3 files changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/doc/guides/rawdevs/ioat.rst b/doc/guides/rawdevs/ioat.rst
> index a28e909935..4fc327f1a4 100644
> --- a/doc/guides/rawdevs/ioat.rst
> +++ b/doc/guides/rawdevs/ioat.rst
> @@ -34,6 +34,13 @@ Compilation
>  For builds using ``meson`` and ``ninja``, the driver will be built when the 
> target platform is x86-based.
>  No additional compilation steps are necessary.
>  
> +.. note::
> +Since the addition of the DMAdev library, the ``ioat`` and ``idxd`` 
> parts of this driver
> +will only be built if their ``DMAdev`` counterparts are not built. 
> The following can be used
> +to disable the ``DMAdev`` drivers, if the raw drivers are to be used 
> instead::
> +

I suggest splitting lines on punctuation where possible. Put a line break
after the "." at the end of the first sentence. Similarly, if breaking
lines, try to do so after commas.

> +$ meson -Ddisable_drivers=dma/* 
> +
>  Device Setup
>  -
>  
> diff --git a/drivers/meson.build b/drivers/meson.build
> index b7d680868a..27ff10a9fc 100644
> --- a/drivers/meson.build
> +++ b/drivers/meson.build
> @@ -10,6 +10,7 @@ subdirs = [
>  'common/qat', # depends on bus.
>  'common/sfc_efx', # depends on bus.
>  'mempool',# depends on common and bus.
> +'dma',# depends on common and bus.
>  'net',# depends on common, bus, mempool
>  'raw',# depends on common, bus and net.
>  'crypto', # depends on common, bus and mempool (net in 
> future).
> @@ -18,7 +19,6 @@ subdirs = [
>  'vdpa',   # depends on common, bus and mempool.
>  'event',  # depends on common, bus, mempool and net.
>  'baseband',   # depends on common and bus.
> -'dma',# depends on common and bus.
>  ]
>  

As stated above, I think the reason for this change should be noted in the
commit log.

>  if meson.is_cross_build()
> diff --git a/drivers/raw/ioat/meson.build b/drivers/raw/ioat/meson.build
> index 0e81cb5951..9be9d8cc65 100644
> --- a/drivers/raw/ioat/meson.build
> +++ b/drivers/raw/ioat/meson.build
> @@ -2,14 +2,31 @@
>  # Copyright 2019 Intel Corporation
>  
>  build = dpdk_conf.has('RTE_ARCH_X86')
> +# only use ioat rawdev driver if we don't have the equivalent dmadev ones
> +if dpdk_conf.has('RTE_DMA_IDXD') and dpdk_conf.has('RTE_DMA_IOAT')
> +build = false
> +subdir_done()
> +endif
> +
>  reason = 'only supported on x86'
>  sources = files(
> -'idxd_bus.c',
> -'idxd_pci.c',
>  'ioat_common.c',
> -'ioat_rawdev.c',
>  'ioat_rawdev_test.c',
>  )
> +
> +if not dpdk_conf.has('RTE_DMA_IDXD')
> +sources += files(
> +'idxd_bus.c',
> +'idxd_pci.c',
> +)
> +endif
> +
> +if not dpdk_conf.has('RTE_DMA_IOAT')
> +sources += files (
> +'ioat_rawdev.c',
> +)
> +endif
> +
>  deps += ['bus_pci', 'mbuf', 'rawdev']
>  headers = files(
>  'rte_ioat_rawdev.h',
> -- 
> 2.30.2
> 


Re: [dpdk-dev] [PATCH v5 02/16] dma/idxd: add skeleton for VFIO based DSA device

2021-09-20 Thread Bruce Richardson
On Fri, Sep 17, 2021 at 03:24:23PM +, Kevin Laatz wrote:
> Add the basic device probe/remove skeleton code for DSA device bound to
> the vfio pci driver. Relevant documentation and MAINTAINERS update also
> included.
> 
> Signed-off-by: Bruce Richardson 
> Signed-off-by: Kevin Laatz 
> Reviewed-by: Conor Walsh 
> 



> --- /dev/null
> +++ b/drivers/dma/idxd/meson.build
> @@ -0,0 +1,11 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2021 Intel Corporation
> +
> +if is_windows
> +subdir_done()
> +endif
> +
> +deps += ['bus_pci']
> +sources = files(
> +'idxd_pci.c'
> +)
> \ No newline at end of file

If doing a v6, this should be fixed to have a newline at end.

/Bruce


Re: [dpdk-dev] [PATCH v5 07/16] dma/idxd: add configure and info_get functions

2021-09-20 Thread Bruce Richardson
On Fri, Sep 17, 2021 at 03:24:28PM +, Kevin Laatz wrote:
> Add functions for device configuration. The info_get function is included
> here since it can be useful for checking successful configuration.
>
Since this patch makes a change in the test code to enable its use in the
docs, that should be called out here too, I think. For example: "When
providing an example of the function's use in the documentation, a code
snippet from the unit tests is used".


Re: [dpdk-dev] [PATCH v5 09/16] dma/idxd: add data-path job submission functions

2021-09-20 Thread Bruce Richardson
On Fri, Sep 17, 2021 at 03:24:30PM +, Kevin Laatz wrote:
> Add data path functions for enqueuing and submitting operations to DSA
> devices.
> 
> Signed-off-by: Bruce Richardson 
> Signed-off-by: Kevin Laatz 
> Reviewed-by: Conor Walsh 
> ---
>  doc/guides/dmadevs/idxd.rst  |  64 +++
>  drivers/dma/idxd/idxd_common.c   | 136 +++
>  drivers/dma/idxd/idxd_internal.h |   5 ++
>  drivers/dma/idxd/meson.build |   1 +
>  4 files changed, 206 insertions(+)
> 
> diff --git a/doc/guides/dmadevs/idxd.rst b/doc/guides/dmadevs/idxd.rst
> index a603c5dd22..7835461a22 100644
> --- a/doc/guides/dmadevs/idxd.rst
> +++ b/doc/guides/dmadevs/idxd.rst
> @@ -153,3 +153,67 @@ The following code shows how the device is configured in
>  
>  Once configured, the device can then be made ready for use by calling the
>  ``rte_dma_start()`` API.
> +
> +Performing Data Copies
> +~~~
> +
> +To perform data copies using IDXD dmadev devices, descriptors should be 
> enqueued
> +using the ``rte_dma_copy()`` API. The HW can be triggered to perform the copy
> +in two ways, either via a ``RTE_DMA_OP_FLAG_SUBMIT`` flag or by calling
> +``rte_dma_submit()``. Once copies have been completed, the completion will
> +be reported back when the application calls ``rte_dma_completed()`` or
> +``rte_dma_completed_status()``. The latter will also report the status of 
> each
> +completed operation.
> +
> +The ``rte_dma_copy()`` function enqueues a single copy to the device ring for
> +copying at a later point. The parameters to that function include the IOVA 
> addresses
> +of both the source and destination buffers, as well as the length of the 
> copy.
> +
> +The ``rte_dma_copy()`` function enqueues a copy operation on the device ring.
> +If the ``RTE_DMA_OP_FLAG_SUBMIT`` flag is set when calling 
> ``rte_dma_copy()``,
> +the device hardware will be informed of the elements. Alternatively, if the 
> flag
> +is not set, the application needs to call the ``rte_dma_submit()`` function 
> to
> +notify the device hardware. Once the device hardware is informed of the 
> elements
> +enqueued on the ring, the device will begin to process them. It is expected
> +that, for efficiency reasons, a burst of operations will be enqueued to the
> +device via multiple enqueue calls between calls to the ``rte_dma_submit()``
> +function.
> +
> +The following code demonstrates how to enqueue a burst of copies to the
> +device and start the hardware processing of them:
> +
> +.. code-block:: C
> +
> +   struct rte_mbuf *srcs[COMP_BURST_SZ], *dsts[COMP_BURST_SZ];
> +   unsigned int i;
> +
> +   for (i = 0; i < RTE_DIM(srcs); i++) {
> +  uint64_t *src_data;
> +
> +  srcs[i] = rte_pktmbuf_alloc(pool);
> +  dsts[i] = rte_pktmbuf_alloc(pool);
> +  src_data = rte_pktmbuf_mtod(srcs[i], uint64_t *);
> +  if (srcs[i] == NULL || dsts[i] == NULL) {
> + PRINT_ERR("Error allocating buffers\n");
> + return -1;
> +  }
> +
> +  for (j = 0; j < COPY_LEN/sizeof(uint64_t); j++)
> + src_data[j] = rte_rand();
> +
> +  if (rte_dma_copy(dev_id, vchan, srcs[i]->buf_iova + srcs[i]->data_off,
> +dsts[i]->buf_iova + dsts[i]->data_off, COPY_LEN, 0) < 0) {
> + PRINT_ERR("Error with rte_dma_copy for buffer %u\n", i);
> + return -1;
> +  }
> +   }
> +   rte_dma_submit(dev_id, vchan);
> +

I think this code block is larger than necessary, because it shows buffer
allocation and initialization rather than just the basics of copy() and
submit() APIs. Furthermore, rather than calling out the generic API use in
the idxd-specific docs, can we just include a reference to the dmadev
documentation?
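
For illustration, a trimmed-down sketch of the enqueue/submit basics
(assuming src_iova[], dst_iova[], COPY_LEN and the PRINT_ERR macro are
prepared elsewhere, as in the unit tests):

    uint16_t i;

    for (i = 0; i < COMP_BURST_SZ; i++) {
            /* Enqueue one copy; flags = 0, so nothing is submitted yet. */
            if (rte_dma_copy(dev_id, vchan, src_iova[i], dst_iova[i],
                            COPY_LEN, 0) < 0) {
                    PRINT_ERR("Error with rte_dma_copy for buffer %u\n", i);
                    return -1;
            }
    }
    /* Kick the hardware to start processing the whole burst. */
    rte_dma_submit(dev_id, vchan);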

/Bruce


Re: [dpdk-dev] [EXT] Re: [PATCH v3] devtools: add acronyms in dictionary for commit checks

2021-09-20 Thread Ferruh Yigit
On 9/20/2021 10:19 AM, Ashwin Sekhar Thalakalath Kottilveetil wrote:
>>> Oh yes, I missed it, sorry.
>>>
> One more question: why is useful to add? Some people forget
>> uppercases?

 Upper case is desired but not really mandatory. This was a suggestion
 put forth to me In one of the reviews.
 https://urldefense.proofpoint.com/v2/url?u=https-
>> 3A__patches.dpdk.org
 _project_dpdk_patch_20210830135231.2610152-2D1-2Dasekhar-
>> 40marvell.co
 m_&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=pYk-QOhvnkU-
>> _75y0NKSn535ZotEGI
 _E69Py3Ppondk&m=tCLT4AyWr6-
>> VYmqkdbD879kj0uDFhCqF6jjOWfe8Dn4&s=EWthslG
 Cy_OWH4bqcOEKKkweFTe4yHZ-2O5yqiKp39w&e=

 I can abandon this change if you feel it is not appropriate to put
 many device specific acronyms in the top level word list.
>>>
>>> No strong opinion, but I think the patch is OK.
>>> David, Ferruh, opinions?
>>>
>>
>> Yes this is suggested to be sure acronyms are uppercase in the patch title.
>>
>> But if an issue can be described in generic concepts, I am for using them to
>> instead of using device specific acronyms, to make commit logs less cryptic.
> Agree that certain commit logs could be re-worded and put in more
> generic terms.
> 
>> Like 'NIC' should be used instead of 'NIX'.
> But NIX and NIC are not exactly interchangeable. NIX refers to a co-processor.
> NIX in conjunction with other co-processors (NPA, LMT etc.) delivers the
> functionality of a NIC.
> 

Got it, my bad, I thought they were the same. OK to keep it.

>>
>> Similarly we can try to use long version of CQ/SQ/RQ, although we may need
>> to use them because of limited title length time to time.
>>
>> Rest seems device specific abbreviations we may not escape to use them, so
>> they are OK to me.
> 



Re: [dpdk-dev] [PATCH v3] devtools: add acronyms in dictionary for commit checks

2021-09-20 Thread Ferruh Yigit
On 9/20/2021 10:10 AM, Jerin Jacob wrote:
> On Mon, Sep 20, 2021 at 2:37 PM Ferruh Yigit  wrote:
>>
>> On 9/17/2021 11:58 AM, Ashwin Sekhar T K wrote:
>>> Update word list with Marvell specific acronyms.
>>>
>>> CPT  -> Cryptographic Accelerator Unit
>>> CQ   -> Completion Queue
>>> LBK  -> Loopback Interface Unit
>>> LMT  -> Large Atomic Store Unit
>>> MCAM -> Match Content Addressable Memory
>>> NIX  -> Network Interface Controller Unit
>>> NPA  -> Network Pool Allocator
>>> NPC  -> Network Parser and CAM Unit
>>> ROC  -> Rest Of Chip
>>
>> Out of curiosity, what is "rest of chip"?
> 
> All the HW accelerators excluding the CPU cores.
> 

Thanks.

>>
>>> RQ   -> Receive Queue
>>> RVU  -> Resource Virtualization Unit
>>> SQ   -> Send Queue
>>> SSO  -> Schedule Synchronize Order Unit
>>> TIM  -> Timer Unit
>>>
>>> Signed-off-by: Ashwin Sekhar T K 
>>



Re: [dpdk-dev] [PATCH v5 10/16] dma/idxd: add data-path job completion functions

2021-09-20 Thread Bruce Richardson
On Fri, Sep 17, 2021 at 03:24:31PM +, Kevin Laatz wrote:
> Add the data path functions for gathering completed operations.
> 
> Signed-off-by: Bruce Richardson 
> Signed-off-by: Kevin Laatz 
> Reviewed-by: Conor Walsh 
> 
> ---
> v2:
>- fixed typo in docs
>- add completion status for invalid opcode
> ---
>  doc/guides/dmadevs/idxd.rst  |  32 +
>  drivers/dma/idxd/idxd_common.c   | 235 +++
>  drivers/dma/idxd/idxd_internal.h |   5 +
>  3 files changed, 272 insertions(+)
> 
> diff --git a/doc/guides/dmadevs/idxd.rst b/doc/guides/dmadevs/idxd.rst
> index 7835461a22..f942a8aa44 100644
> --- a/doc/guides/dmadevs/idxd.rst
> +++ b/doc/guides/dmadevs/idxd.rst
> @@ -209,6 +209,38 @@ device and start the hardware processing of them:
> }
> rte_dma_submit(dev_id, vchan);
>  
> +To retrieve information about completed copies, ``rte_dma_completed()`` and
> +``rte_dma_completed_status()`` APIs should be used. ``rte_dma_completed()``
> +will return the number of completed operations, along with the index of the 
> last
> +successful completed operation and whether or not an error was encountered. 
> If an
> +error was encountered, ``rte_dma_completed_status()`` must be used to kick 
> the
> +device off to continue processing operations and also to gather the status 
> of each
> +individual operations which is filled in to the ``status`` array provided as
> +parameter by the application.
> +
> +The following status codes are supported by IDXD:
> +* ``RTE_DMA_STATUS_SUCCESSFUL``: The operation was successful.
> +* ``RTE_DMA_STATUS_INVALID_OPCODE``: The operation failed due to an invalid 
> operation code.
> +* ``RTE_DMA_STATUS_INVALID_LENGTH``: The operation failed due to an invalid 
> data length.
> +* ``RTE_DMA_STATUS_NOT_ATTEMPTED``: The operation was not attempted.
> +* ``RTE_DMA_STATUS_ERROR_UNKNOWN``: The operation failed due to an 
> unspecified error.
> +
> +The following code shows how to retrieve the number of successfully completed
> +copies within a burst and then using ``rte_dma_completed_status()`` to check
> +which operation failed and kick off the device to continue processing 
> operations:
> +
> +.. code-block:: C
> +
> +   enum rte_dma_status_code status[COMP_BURST_SZ];
> +   uint16_t count, idx, status_count;
> +   bool error = 0;
> +
> +   count = rte_dma_completed(dev_id, vchan, COMP_BURST_SZ, &idx, &error);
> +
> +   if (error){
> +  status_count = rte_dma_completed_status(dev_id, vchan, COMP_BURST_SZ, 
> &idx, status);
> +   }
> +
As with some of the other documentation text, it should be checked for
overlap with the dmadev documentation, and merged with that if appropriate.

/Bruce


Re: [dpdk-dev] [PATCH v5 13/16] dma/idxd: add burst capacity API

2021-09-20 Thread Bruce Richardson
On Fri, Sep 17, 2021 at 03:24:34PM +, Kevin Laatz wrote:
> Add support for the burst capacity API. This API will provide the calling
> application with the remaining capacity of the current burst (limited by
> max HW batch size).
> 
> Signed-off-by: Kevin Laatz 
> Reviewed-by: Conor Walsh 
> ---
Reviewed-by: Bruce Richardson 


Re: [dpdk-dev] [PATCH v5 14/16] dma/idxd: move dpdk_idxd_cfg.py from raw to dma

2021-09-20 Thread Bruce Richardson
On Fri, Sep 17, 2021 at 03:24:35PM +, Kevin Laatz wrote:
> From: Conor Walsh 
> 
> Move the example script for configuring IDXD devices bound to the IDXD
> kernel driver from raw to dma, and create a symlink to still allow use from
> raw.
> 
> Signed-off-by: Conor Walsh 
> Signed-off-by: Kevin Laatz 
> ---
Acked-by: Bruce Richardson 


Re: [dpdk-dev] [PATCH v5 15/16] devbind: add dma device class

2021-09-20 Thread Bruce Richardson
On Fri, Sep 17, 2021 at 03:24:36PM +, Kevin Laatz wrote:
> Add a new class for DMA devices. Devices listed under the DMA class are to
> be used with the dmadev library.
> 
> Signed-off-by: Kevin Laatz 
> Reviewed-by: Conor Walsh 
> ---

One small comment below to be fixed.

Reviewed-by: Bruce Richardson 

>  usertools/dpdk-devbind.py | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
> index 74d16e4c4b..8bb573f4b0 100755
> --- a/usertools/dpdk-devbind.py
> +++ b/usertools/dpdk-devbind.py
> @@ -69,12 +69,13 @@
>  network_devices = [network_class, cavium_pkx, avp_vnic, ifpga_class]
>  baseband_devices = [acceleration_class]
>  crypto_devices = [encryption_class, intel_processor_class]
> +dma_devices = []
>  eventdev_devices = [cavium_sso, cavium_tim, intel_dlb, octeontx2_sso]
>  mempool_devices = [cavium_fpa, octeontx2_npa]
>  compress_devices = [cavium_zip]
>  regex_devices = [octeontx2_ree]
> -misc_devices = [cnxk_bphy, cnxk_bphy_cgx, intel_ioat_bdw, intel_ioat_skx, 
> intel_ioat_icx, intel_idxd_spr,
> -intel_ntb_skx, intel_ntb_icx,
> +misc_devices = [cnxk_bphy, cnxk_bphy_cgx, intel_ioat_bdw, intel_ioat_skx,
> +intel_ioat_icx, intel_idxd_spr, intel_ntb_skx, intel_ntb_icx,

This looks like a purely cosmetic change, which doesn't really belong in
the patch, especially since a number of these entries are to be moved in
later patches for 21.11.

/Bruce



Re: [dpdk-dev] [PATCH v5 16/16] devbind: move idxd device ID to dmadev class

2021-09-20 Thread Bruce Richardson
On Fri, Sep 17, 2021 at 03:24:37PM +, Kevin Laatz wrote:
> The dmadev library is the preferred abstraction for using IDXD devices and
> will replace the rawdev implementation in future. This patch moves the IDXD
> device ID to the dmadev class.
> 
> Signed-off-by: Kevin Laatz 
> Reviewed-by: Conor Walsh 
> ---
Acked-by: Bruce Richardson 


Re: [dpdk-dev] [PATCH] cmdline: reduce ABI

2021-09-20 Thread David Marchand
On Sat, Sep 11, 2021 at 1:17 AM Dmitry Kozlyuk  wrote:
>
> Remove the definition of `struct cmdline` from public header.
> Deprecation notice:
> https://mails.dpdk.org/archives/dev/2020-September/183310.html
>
> Signed-off-by: Dmitry Kozlyuk 

This patch lgtm.
Acked-by: David Marchand 


> ---
> I would also hide struct rdline to be able to alter buffer size,
> but we don't have a deprecation notice for it.

Fyi, I found one project looking into a rdline pointer to get the back
reference to cmdline stored in opaque.
https://github.com/Gandi/packet-journey/blob/master/app/cmdline.c#L1398

This cmdline pointer is then dereferenced to get s_out.
Given that we announced cmdline becoming opaque, they would have to
handle the first API change in any case.
I don't think another API change would really make a big difference to them.

Plus, this project seems stuck to 18.08 support.



--
David Marchand



Re: [dpdk-dev] [PATCH v4 01/11] dma/ioat: add device probe and removal functions

2021-09-20 Thread Bruce Richardson
On Fri, Sep 17, 2021 at 03:42:17PM +, Conor Walsh wrote:
> Add the basic device probe/remove skeleton code and initial documentation
> for new IOAT DMA driver. Maintainers update is also included in this
> patch.
> 
> Signed-off-by: Conor Walsh 
> Reviewed-by: Kevin Laatz 
> ---
>  MAINTAINERS|  6 +++
>  doc/guides/dmadevs/index.rst   |  2 +
>  doc/guides/dmadevs/ioat.rst| 64 
>  doc/guides/rel_notes/release_21_11.rst |  7 +--
>  drivers/dma/ioat/ioat_dmadev.c | 69 ++
>  drivers/dma/ioat/ioat_hw_defs.h| 35 +
>  drivers/dma/ioat/ioat_internal.h   | 20 
>  drivers/dma/ioat/meson.build   |  7 +++
>  drivers/dma/ioat/version.map   |  3 ++
>  drivers/dma/meson.build|  1 +
>  10 files changed, 211 insertions(+), 3 deletions(-)
>  create mode 100644 doc/guides/dmadevs/ioat.rst
>  create mode 100644 drivers/dma/ioat/ioat_dmadev.c
>  create mode 100644 drivers/dma/ioat/ioat_hw_defs.h
>  create mode 100644 drivers/dma/ioat/ioat_internal.h
>  create mode 100644 drivers/dma/ioat/meson.build
>  create mode 100644 drivers/dma/ioat/version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 9cb59b831d..70993d23e8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1209,6 +1209,12 @@ M: Kevin Laatz 
>  F: drivers/dma/idxd/
>  F: doc/guides/dmadevs/idxd.rst
>  
> +Intel IOAT - EXPERIMENTAL
> +M: Bruce Richardson 
> +M: Conor Walsh 
> +F: drivers/dma/ioat/
> +F: doc/guides/dmadevs/ioat.rst
> +
>  

Unlike the raw/ioat driver, this dmadev driver does not have any private APIs,
so I'm not sure it needs the EXPERIMENTAL tag on it.

>  RegEx Drivers
>  -
> diff --git a/doc/guides/dmadevs/index.rst b/doc/guides/dmadevs/index.rst
> index 5d4abf880e..c59f4b5c92 100644
> --- a/doc/guides/dmadevs/index.rst
> +++ b/doc/guides/dmadevs/index.rst
> @@ -12,3 +12,5 @@ an application through DMA API.
> :numbered:
>  
> idxd
> +   ioat
> +
> diff --git a/doc/guides/dmadevs/ioat.rst b/doc/guides/dmadevs/ioat.rst
> new file mode 100644
> index 00..45a2e65d70
> --- /dev/null
> +++ b/doc/guides/dmadevs/ioat.rst
> @@ -0,0 +1,64 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +Copyright(c) 2021 Intel Corporation.
> +
> +.. include:: 
> +
> +IOAT DMA Device Driver
> +===
> +
> +The ``ioat`` dmadev driver provides a poll-mode driver (PMD) for Intel\
> +|reg| QuickData Technology which is part of Intel\ |reg| I/O
> +Acceleration Technology (`Intel I/OAT
> +`_).
> +This PMD, when used on supported hardware, allows data copies, for example,
> +cloning packet data, to be accelerated by IOAT hardware rather than having to
> +be done by software, freeing up CPU cycles for other tasks.
> +
> +Hardware Requirements
> +--
> +
> +The ``dpdk-devbind.py`` script, included with DPDK, can be used to show the
> +presence of supported hardware. Running ``dpdk-devbind.py --status-dev dma``
> +will show all the DMA devices on the system, IOAT devices are included in 
> this
> +list. For Intel\ |reg| IOAT devices, the hardware will often be listed as
> +"Crystal Beach DMA", or "CBDMA" or on some newer systems '0b00' due to the
> +absence of pci-id database entries for them at this point.
> +
> +Compilation
> +
> +
> +For builds using ``meson`` and ``ninja``, the driver will be built when the
> +target platform is x86-based. No additional compilation steps are necessary.
> +
> +Device Setup
> +-
> +
> +Intel\ |reg| IOAT devices will need to be bound to a suitable DPDK-supported
> +user-space IO driver such as ``vfio-pci`` in order to be used by DPDK.
> +
> +The ``dpdk-devbind.py`` script can be used to view the state of the devices 
> using::
> +
> +   $ dpdk-devbind.py --status-dev dma
> +
> +The ``dpdk-devbind.py`` script can also be used to bind devices to a 
> suitable driver.
> +For example::
> +
> + $ dpdk-devbind.py -b vfio-pci 00:01.0 00:01.1
> +
> +Device Probing and Initialization
> +~~
> +
> +For devices bound to a suitable DPDK-supported driver (``vfio-pci``), the HW
> +devices will be found as part of the device scan done at application
> +initialization time without the need to pass parameters to the application.
> +
> +If the application does not require all the devices available an allowlist 
> can
> +be used in the same way that other DPDK devices use them.
> +
> +For example::
> +
> + $ dpdk-test -a 
> +
> +Once probed successfully, the device will appear as a ``dmadev``, that is a
> +"DMA device type" inside DPDK, and can be accessed using APIs from the
> +``rte_dmadev`` library.
> diff --git a/doc/guides/rel_notes/release_21_11.rst 
> b/doc/guides/rel_notes/release_21_11.rst
> index c0bfd9c1ba..4d2b7bde1b 100644
> --- a/doc/guides/rel_notes/release_21_11.r

Re: [dpdk-dev] [PATCH] cmdline: reduce ABI

2021-09-20 Thread Olivier Matz
Hi Dmitry,

On Mon, Sep 20, 2021 at 01:11:23PM +0200, David Marchand wrote:
> On Sat, Sep 11, 2021 at 1:17 AM Dmitry Kozlyuk  
> wrote:
> >
> > Remove the definition of `struct cmdline` from public header.
> > Deprecation notice:
> > https://mails.dpdk.org/archives/dev/2020-September/183310.html
> >
> > Signed-off-by: Dmitry Kozlyuk 
> 
> This patch lgtm.
> Acked-by: David Marchand 

Acked-by: Olivier Matz 

Many thanks Dmitry for taking care of this.


> > ---
> > I would also hide struct rdline to be able to alter buffer size,
> > but we don't have a deprecation notice for it.
> 
> Fyi, I found one project looking into a rdline pointer to get the back
> reference to cmdline stored in opaque.
> https://github.com/Gandi/packet-journey/blob/master/app/cmdline.c#L1398
> 
> This cmdline pointer is then dereferenced to get s_out.
> Given that we announced cmdline becoming opaque, they would have to
> handle the first API change in any case.
> I don't think another API change would really make a big difference to them.
> 
> Plus, this project seems stuck to 18.08 support.

I agree with you and David, it would make sense to also hide the rdline
struct at the same time.


Olivier


Re: [dpdk-dev] [PATCH v2 1/6] examples/ioat: always use same lcore for both DMA requests enqueue and dequeue

2021-09-20 Thread Conor Walsh




From: Konstantin Ananyev 

A few changes in ioat sample behaviour:
- Always do SW copy for packet metadata (mbuf fields)
- Always use same lcore for both DMA requests enqueue and dequeue

Main reasons for that:
a) it is safer, as idxd PMD doesn't support MT safe enqueue/dequeue (yet).
b) sort of more apples to apples comparison with sw copy.
c) from my testing things are faster that way.

Signed-off-by: Konstantin Ananyev 
---
  examples/ioat/ioatfwd.c | 185 ++--
  1 file changed, 101 insertions(+), 84 deletions(-)

diff --git a/examples/ioat/ioatfwd.c b/examples/ioat/ioatfwd.c
index b3977a8be5..1498343492 100644
--- a/examples/ioat/ioatfwd.c
+++ b/examples/ioat/ioatfwd.c
@@ -331,43 +331,36 @@ update_mac_addrs(struct rte_mbuf *m, uint32_t dest_portid)
  
  /* Perform packet copy there is a user-defined function. 8< */

  static inline void
-pktmbuf_sw_copy(struct rte_mbuf *src, struct rte_mbuf *dst)
+pktmbuf_metadata_copy(const struct rte_mbuf *src, struct rte_mbuf *dst)
  {
-   /* Copy packet metadata */
-   rte_memcpy(&dst->rearm_data,
-   &src->rearm_data,
-   offsetof(struct rte_mbuf, cacheline1)
-   - offsetof(struct rte_mbuf, rearm_data));
+   dst->data_off = src->data_off;
+   memcpy(&dst->rx_descriptor_fields1, &src->rx_descriptor_fields1,
+   offsetof(struct rte_mbuf, buf_len) -
+   offsetof(struct rte_mbuf, rx_descriptor_fields1));
+}
  
-	/* Copy packet data */

+/* Copy packet data */
+static inline void
+pktmbuf_sw_copy(struct rte_mbuf *src, struct rte_mbuf *dst)
+{
rte_memcpy(rte_pktmbuf_mtod(dst, char *),
rte_pktmbuf_mtod(src, char *), src->data_len);
  }
  /* >8 End of perform packet copy there is a user-defined function. */


Might need to redo these snippet markers as the function is now split in 
two.





+static inline uint32_t
+ioat_dequeue(struct rte_mbuf *src[], struct rte_mbuf *dst[], uint32_t num,
+   uint16_t dev_id)
+{
+   int32_t rc;


rc should be uint32_t, but this is removed in patch 4 of this set during 
the change from raw to dma so it shouldn't really matter.



+   /* Dequeue the mbufs from IOAT device. Since all memory
+* is DPDK pinned memory and therefore all addresses should
+* be valid, we don't check for copy errors
+*/
+   rc = rte_ioat_completed_ops(dev_id, num, NULL, NULL,
+   (void *)src, (void *)dst);
+   if (rc < 0) {
+   RTE_LOG(CRIT, IOAT,
+   "rte_ioat_completed_ops(%hu) failedi, error: %d\n",
+   dev_id, rte_errno);
+   rc = 0;
+   }
+   return rc;
+}


Reviewed-by: Conor Walsh 



Re: [dpdk-dev] [PATCH v2 2/6] examples/ioat: add cmd-line option to control DMA batch size

2021-09-20 Thread Conor Walsh




From: Konstantin Ananyev 

Add a command-line option to control the HW copy batch size in the
application.

Signed-off-by: Konstantin Ananyev 
---


Reviewed-by: Conor Walsh 



Re: [dpdk-dev] [PATCH v2 3/6] examples/ioat: add cmd line option to control max frame size

2021-09-20 Thread Conor Walsh




From: Konstantin Ananyev 

Add command line option for setting the max frame size.

Signed-off-by: Konstantin Ananyev 
---


Reviewed-by: Conor Walsh 



Re: [dpdk-dev] [PATCH v2 4/6] examples/ioat: port application to dmadev APIs

2021-09-20 Thread Conor Walsh




The dmadev library abstraction allows applications to use the same APIs for
all DMA device drivers in DPDK. This patch updates the ioatfwd application
to make use of the new dmadev APIs, in turn making it a generic application
which can be used with any of the DMA device drivers.

Signed-off-by: Kevin Laatz 

---


Reviewed-by: Conor Walsh 



Re: [dpdk-dev] [PATCH v2 5/6] examples/ioat: update naming to match change to dmadev

2021-09-20 Thread Conor Walsh




Existing functions, structures, defines etc need to be updated to reflect
the change to using the dmadev APIs.

Signed-off-by: Kevin Laatz 
---


Reviewed-by: Conor Walsh 



Re: [dpdk-dev] [PATCH v2 6/6] examples/ioat: rename application to dmafwd

2021-09-20 Thread Conor Walsh




Since the APIs have been updated from rawdev to dmadev, the application
should also be renamed to match. This patch also includes the documentation
updates for the renaming.

Signed-off-by: Kevin Laatz 
---





-The initialization of hardware device is done by ``rte_rawdev_configure()``
-function using ``rte_rawdev_info`` struct. After configuration the device is
-started using ``rte_rawdev_start()`` function. Each of the above operations
-is done in ``configure_rawdev_queue()``.
+The initialization of hardware device is done by ``rte_dmadev_configure()`` and
+``rte_dmadev_vchan_setup()`` functions using the ``rte_dmadev_conf`` and
+``rte_dmadev_vchan_conf`` structs. After configuration the device is started
+using ``rte_dmadev_start()`` function. Each of the above operations is done in
+``configure_dmadev_queue()``.


These function names need to be updated for dmadev v22.
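
For reference, a minimal sketch of that setup sequence using the
renamed rte_dma_* calls (the names and struct fields are assumptions
about the updated dmadev API, not part of this patch):

   #include <rte_dmadev.h>

   /* Sketch: configure one mem-to-mem vchan and start the device. */
   static int
   setup_dma_device(int16_t dev_id, uint16_t nb_desc)
   {
           const struct rte_dma_conf dev_conf = { .nb_vchans = 1 };
           const struct rte_dma_vchan_conf vchan_conf = {
                   .direction = RTE_DMA_DIR_MEM_TO_MEM,
                   .nb_desc = nb_desc,
           };

           if (rte_dma_configure(dev_id, &dev_conf) < 0)
                   return -1;
           if (rte_dma_vchan_setup(dev_id, 0, &vchan_conf) < 0)
                   return -1;
           return rte_dma_start(dev_id);
   }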




  The packets are received in burst mode using ``rte_eth_rx_burst()``
  function. When using hardware copy mode the packets are enqueued in
-copying device's buffer using ``ioat_enqueue_packets()`` which calls
-``rte_ioat_enqueue_copy()``. When all received packets are in the
-buffer the copy operations are started by calling ``rte_ioat_perform_ops()``.
-Function ``rte_ioat_enqueue_copy()`` operates on physical address of
+copying device's buffer using ``dma_enqueue_packets()`` which calls
+``rte_dmadev_copy()``. When all received packets are in the
+buffer the copy operations are started by calling ``rte_dmadev_submit()``.
+Function ``rte_dmadev_copy()`` operates on physical address of
  the packet. Structure ``rte_mbuf`` contains only physical address to
  start of the data buffer (``buf_iova``). Thus the address is adjusted
  by ``addr_offset`` value in order to get the address of ``rearm_data``


These function names need to be updated for dmadev v22.
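
Similarly, a minimal sketch of the enqueue/submit flow described above
using the assumed renamed calls; the real application adjusts buf_iova
by addr_offset as described in the quoted text, which is simplified
here to rte_pktmbuf_iova():

   #include <rte_mbuf.h>
   #include <rte_dmadev.h>

   /* Sketch: enqueue a burst of packet-data copies and submit them. */
   static uint16_t
   dma_enqueue_burst(int16_t dev_id, uint16_t vchan,
                   struct rte_mbuf *src[], struct rte_mbuf *dst[],
                   uint16_t nb_pkts)
   {
           uint16_t i;

           for (i = 0; i < nb_pkts; i++)
                   if (rte_dma_copy(dev_id, vchan,
                                   rte_pktmbuf_iova(src[i]),
                                   rte_pktmbuf_iova(dst[i]),
                                   src[i]->data_len, 0) < 0)
                           break;
           if (i > 0)
                   rte_dma_submit(dev_id, vchan);
           return i;
   }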

Reviewed-by: Conor Walsh 



[dpdk-dev] [PATCH v4 00/13] enhancements to host based flow table management

2021-09-20 Thread Venkat Duvvuru
This patch set adds support for new offload features/enhancments for
Thor adapters like VF representor support, new flow matches/actions
& dynamic SRAM manager support.

Farah Smith (4):
  net/bnxt: updates to TF core index table
  net/bnxt: add SRAM manager model
  net/bnxt: change log level to debug
  net/bnxt: add SRAM manager shared session

Jay Ding (1):
  net/bnxt: add flow meter drop counter support

Kishore Padmanabha (6):
  net/bnxt: add flow template support for Thor
  net/bnxt: add support for tunnel offload API
  net/bnxt: add support for dynamic encap action
  net/bnxt: add wild card TCAM byte order for Thor
  net/bnxt: add flow templates for Thor
  net/bnxt: add enhancements to TF ULP

Peter Spreadborough (1):
  net/bnxt: enable dpool allocator

Randy Schacher (1):
  net/bnxt: dynamically allocate space for EM defrag function

 doc/guides/rel_notes/release_21_11.rst| 6 +
 drivers/net/bnxt/tf_core/cfa_resource_types.h | 5 +-
 drivers/net/bnxt/tf_core/dpool.c  |38 +-
 drivers/net/bnxt/tf_core/ll.c | 3 +
 drivers/net/bnxt/tf_core/ll.h |50 +-
 drivers/net/bnxt/tf_core/meson.build  | 2 +
 drivers/net/bnxt/tf_core/tf_core.c|   169 +-
 drivers/net/bnxt/tf_core/tf_core.h|   159 +-
 drivers/net/bnxt/tf_core/tf_device.c  |40 +-
 drivers/net/bnxt/tf_core/tf_device.h  |   137 +-
 drivers/net/bnxt/tf_core/tf_device_p4.c   |77 +-
 drivers/net/bnxt/tf_core/tf_device_p4.h   |50 +-
 drivers/net/bnxt/tf_core/tf_device_p58.c  |   112 +-
 drivers/net/bnxt/tf_core/tf_device_p58.h  |70 +-
 drivers/net/bnxt/tf_core/tf_em.h  |10 -
 drivers/net/bnxt/tf_core/tf_em_common.c   | 4 +
 .../net/bnxt/tf_core/tf_em_hash_internal.c|34 -
 drivers/net/bnxt/tf_core/tf_em_internal.c |   185 +-
 drivers/net/bnxt/tf_core/tf_msg.c | 2 +-
 drivers/net/bnxt/tf_core/tf_rm.c  |   180 +-
 drivers/net/bnxt/tf_core/tf_rm.h  |62 +-
 drivers/net/bnxt/tf_core/tf_session.c |56 +
 drivers/net/bnxt/tf_core/tf_session.h |58 +-
 drivers/net/bnxt/tf_core/tf_sram_mgr.c|   971 +
 drivers/net/bnxt/tf_core/tf_sram_mgr.h|   317 +
 drivers/net/bnxt/tf_core/tf_tbl.c |   259 +-
 drivers/net/bnxt/tf_core/tf_tbl.h |87 +-
 drivers/net/bnxt/tf_core/tf_tbl_sram.c|   747 +
 drivers/net/bnxt/tf_core/tf_tbl_sram.h|   154 +
 drivers/net/bnxt/tf_core/tf_tcam.c|16 +-
 drivers/net/bnxt/tf_core/tf_tcam.h| 7 +
 drivers/net/bnxt/tf_core/tf_tcam_shared.c |28 +-
 drivers/net/bnxt/tf_core/tf_util.c|12 +
 drivers/net/bnxt/tf_ulp/bnxt_tf_common.h  |10 +-
 drivers/net/bnxt/tf_ulp/bnxt_ulp.c|52 +-
 drivers/net/bnxt/tf_ulp/bnxt_ulp.h|20 +-
 drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c   |   226 +-
 .../bnxt/tf_ulp/generic_templates/meson.build | 3 +
 .../generic_templates/ulp_template_db_act.c   | 2 +-
 .../generic_templates/ulp_template_db_class.c | 12109 +++-
 .../generic_templates/ulp_template_db_enum.h  |   618 +-
 .../generic_templates/ulp_template_db_field.h |   767 +-
 .../generic_templates/ulp_template_db_tbl.c   |  2757 +-
 .../ulp_template_db_thor_act.c|  5079 +-
 .../ulp_template_db_thor_class.c  | 45573 ++--
 .../ulp_template_db_wh_plus_act.c |  1700 +-
 .../ulp_template_db_wh_plus_class.c   |  8329 ++-
 drivers/net/bnxt/tf_ulp/ulp_fc_mgr.c  |48 +-
 drivers/net/bnxt/tf_ulp/ulp_fc_mgr.h  | 8 +-
 drivers/net/bnxt/tf_ulp/ulp_flow_db.c |   678 +-
 drivers/net/bnxt/tf_ulp/ulp_flow_db.h |68 +-
 drivers/net/bnxt/tf_ulp/ulp_gen_tbl.c | 9 +-
 drivers/net/bnxt/tf_ulp/ulp_mapper.c  |   448 +-
 drivers/net/bnxt/tf_ulp/ulp_mapper.h  |10 +-
 drivers/net/bnxt/tf_ulp/ulp_matcher.c |13 +
 drivers/net/bnxt/tf_ulp/ulp_port_db.c |15 +-
 drivers/net/bnxt/tf_ulp/ulp_rte_handler_tbl.c |31 +
 drivers/net/bnxt/tf_ulp/ulp_rte_parser.c  |   663 +-
 drivers/net/bnxt/tf_ulp/ulp_rte_parser.h  |12 +-
 drivers/net/bnxt/tf_ulp/ulp_template_struct.h |32 +-
 drivers/net/bnxt/tf_ulp/ulp_tun.c |   521 +-
 drivers/net/bnxt/tf_ulp/ulp_tun.h |89 +-
 drivers/net/bnxt/tf_ulp/ulp_utils.c   |71 +-
 drivers/net/bnxt/tf_ulp/ulp_utils.h   |27 +-
 64 files changed, 71146 insertions(+), 12949 deletions(-)
 create mode 100644 drivers/net/bnxt/tf_core/tf_sram_mgr.c
 create mode 100644 drivers/net/bnxt/tf_core/tf_sram_mgr.h
 create mode 100644 drivers/net/bnxt/tf_core/tf_tbl_sram.c
 create mode 100644 drivers/net/bnxt/tf_core/tf_tbl_sram.h

-- 
2.17.1



[dpdk-dev] [PATCH v4 02/13] net/bnxt: enable dpool allocator

2021-09-20 Thread Venkat Duvvuru
From: Peter Spreadborough 

Enable dynamic entry allocator for Exact Match SRAM entries.
Deprecate static entry allocator code.

Signed-off-by: Peter Spreadborough 
Reviewed-by: Randy Schacher 
Acked-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_core/tf_device_p58.c  |   4 -
 drivers/net/bnxt/tf_core/tf_em.h  |  10 -
 .../net/bnxt/tf_core/tf_em_hash_internal.c|  34 
 drivers/net/bnxt/tf_core/tf_em_internal.c | 180 +-
 4 files changed, 1 insertion(+), 227 deletions(-)

diff --git a/drivers/net/bnxt/tf_core/tf_device_p58.c 
b/drivers/net/bnxt/tf_core/tf_device_p58.c
index ce4d8c661f..808dcb1f77 100644
--- a/drivers/net/bnxt/tf_core/tf_device_p58.c
+++ b/drivers/net/bnxt/tf_core/tf_device_p58.c
@@ -348,11 +348,7 @@ const struct tf_dev_ops tf_dev_ops_p58 = {
.tf_dev_get_tcam_resc_info = tf_tcam_get_resc_info,
.tf_dev_insert_int_em_entry = tf_em_hash_insert_int_entry,
.tf_dev_delete_int_em_entry = tf_em_hash_delete_int_entry,
-#if (TF_EM_ALLOC == 1)
.tf_dev_move_int_em_entry = tf_em_move_int_entry,
-#else
-   .tf_dev_move_int_em_entry = NULL,
-#endif
.tf_dev_insert_ext_em_entry = NULL,
.tf_dev_delete_ext_em_entry = NULL,
.tf_dev_get_em_resc_info = tf_em_get_resc_info,
diff --git a/drivers/net/bnxt/tf_core/tf_em.h b/drivers/net/bnxt/tf_core/tf_em.h
index 568071ad8c..074c128651 100644
--- a/drivers/net/bnxt/tf_core/tf_em.h
+++ b/drivers/net/bnxt/tf_core/tf_em.h
@@ -13,16 +13,6 @@
 
 #include "hcapi_cfa_defs.h"
 
-/**
- * TF_EM_ALLOC
- *
- * 0: Use stack allocator with fixed sized entries
- *(default).
- * 1: Use dpool allocator with variable size
- *entries.
- */
-#define TF_EM_ALLOC 0
-
 #define TF_EM_MIN_ENTRIES (1 << 15) /* 32K */
 #define TF_EM_MAX_ENTRIES (1 << 27) /* 128M */
 
diff --git a/drivers/net/bnxt/tf_core/tf_em_hash_internal.c 
b/drivers/net/bnxt/tf_core/tf_em_hash_internal.c
index 098e8af07e..60273a798c 100644
--- a/drivers/net/bnxt/tf_core/tf_em_hash_internal.c
+++ b/drivers/net/bnxt/tf_core/tf_em_hash_internal.c
@@ -22,9 +22,7 @@
 /**
  * EM Pool
  */
-#if (TF_EM_ALLOC == 1)
 #include "dpool.h"
-#endif
 
 /**
  * Insert EM internal entry API
@@ -41,11 +39,7 @@ tf_em_hash_insert_int_entry(struct tf *tfp,
uint16_t rptr_index = 0;
uint8_t rptr_entry = 0;
uint8_t num_of_entries = 0;
-#if (TF_EM_ALLOC == 1)
struct dpool *pool;
-#else
-   struct stack *pool;
-#endif
uint32_t index;
uint32_t key0_hash;
uint32_t key1_hash;
@@ -62,7 +56,6 @@ tf_em_hash_insert_int_entry(struct tf *tfp,
rc = tf_session_get_device(tfs, &dev);
if (rc)
return rc;
-#if (TF_EM_ALLOC == 1)
pool = (struct dpool *)tfs->em_pool[parms->dir];
index = dpool_alloc(pool,
parms->em_record_sz_in_bits / 128,
@@ -74,16 +67,6 @@ tf_em_hash_insert_int_entry(struct tf *tfp,
tf_dir_2_str(parms->dir));
return -1;
}
-#else
-   pool = (struct stack *)tfs->em_pool[parms->dir];
-   rc = stack_pop(pool, &index);
-   if (rc) {
-   PMD_DRV_LOG(ERR,
-   "%s, EM entry index allocation failed\n",
-   tf_dir_2_str(parms->dir));
-   return rc;
-   }
-#endif
 
if (dev->ops->tf_dev_cfa_key_hash == NULL)
return -EINVAL;
@@ -103,11 +86,7 @@ tf_em_hash_insert_int_entry(struct tf *tfp,
  &num_of_entries);
if (rc) {
/* Free the allocated index before returning */
-#if (TF_EM_ALLOC == 1)
dpool_free(pool, index);
-#else
-   stack_push(pool, index);
-#endif
return -1;
}
 
@@ -128,9 +107,7 @@ tf_em_hash_insert_int_entry(struct tf *tfp,
 rptr_index,
 rptr_entry,
 0);
-#if (TF_EM_ALLOC == 1)
dpool_set_entry_data(pool, index, parms->flow_handle);
-#endif
return 0;
 }
 
@@ -146,11 +123,7 @@ tf_em_hash_delete_int_entry(struct tf *tfp,
 {
int rc = 0;
struct tf_session *tfs;
-#if (TF_EM_ALLOC == 1)
struct dpool *pool;
-#else
-   struct stack *pool;
-#endif
/* Retrieve the session information */
rc = tf_session_get_session(tfp, &tfs);
if (rc) {
@@ -165,19 +138,13 @@ tf_em_hash_delete_int_entry(struct tf *tfp,
 
/* Return resource to pool */
if (rc == 0) {
-#if (TF_EM_ALLOC == 1)
pool = (struct dpool *)tfs->em_pool[parms->dir];
dpool_free(pool, parms->index);
-#else
-   pool = (struct stack *)tfs->em_pool[parms->dir];
-   stack_push(pool, parms->index);
-#endif
}
 
return rc;
 }
 
-#if (TF_EM_ALLOC == 1)
 /** Move EM internal entry API
  *
  * returns:
@@ -212,4 +179,3 @@ tf_em_move_int

[dpdk-dev] [PATCH v4 01/13] net/bnxt: updates to TF core index table

2021-09-20 Thread Venkat Duvvuru
From: Farah Smith 

Update the TRUFLOW core index table and
remove unused shadow table functionality.

Signed-off-by: Farah Smith 
Reviewed-by: Peter Spreadborough 
Acked-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_core/tf_core.c   |  65 --
 drivers/net/bnxt/tf_core/tf_core.h   | 103 +--
 drivers/net/bnxt/tf_core/tf_device.h |  22 -
 drivers/net/bnxt/tf_core/tf_device_p4.c  |   2 -
 drivers/net/bnxt/tf_core/tf_device_p58.c |   2 -
 drivers/net/bnxt/tf_core/tf_em_common.c  |   4 +
 drivers/net/bnxt/tf_core/tf_tbl.c|  21 -
 drivers/net/bnxt/tf_core/tf_tbl.h|  72 
 drivers/net/bnxt/tf_ulp/ulp_mapper.c |   3 +-
 9 files changed, 7 insertions(+), 287 deletions(-)

diff --git a/drivers/net/bnxt/tf_core/tf_core.c 
b/drivers/net/bnxt/tf_core/tf_core.c
index 97e6165e92..5458f76e2d 100644
--- a/drivers/net/bnxt/tf_core/tf_core.c
+++ b/drivers/net/bnxt/tf_core/tf_core.c
@@ -1105,71 +1105,6 @@ tf_alloc_tbl_entry(struct tf *tfp,
return 0;
 }
 
-int
-tf_search_tbl_entry(struct tf *tfp,
-   struct tf_search_tbl_entry_parms *parms)
-{
-   int rc;
-   struct tf_session *tfs;
-   struct tf_dev_info *dev;
-   struct tf_tbl_alloc_search_parms sparms;
-
-   TF_CHECK_PARMS2(tfp, parms);
-
-   /* Retrieve the session information */
-   rc = tf_session_get_session(tfp, &tfs);
-   if (rc) {
-   TFP_DRV_LOG(ERR,
-   "%s: Failed to lookup session, rc:%s\n",
-   tf_dir_2_str(parms->dir),
-   strerror(-rc));
-   return rc;
-   }
-
-   /* Retrieve the device information */
-   rc = tf_session_get_device(tfs, &dev);
-   if (rc) {
-   TFP_DRV_LOG(ERR,
-   "%s: Failed to lookup device, rc:%s\n",
-   tf_dir_2_str(parms->dir),
-   strerror(-rc));
-   return rc;
-   }
-
-   if (dev->ops->tf_dev_alloc_search_tbl == NULL) {
-   rc = -EOPNOTSUPP;
-   TFP_DRV_LOG(ERR,
-   "%s: Operation not supported, rc:%s\n",
-   tf_dir_2_str(parms->dir),
-   strerror(-rc));
-   return rc;
-   }
-
-   memset(&sparms, 0, sizeof(struct tf_tbl_alloc_search_parms));
-   sparms.dir = parms->dir;
-   sparms.type = parms->type;
-   sparms.result = parms->result;
-   sparms.result_sz_in_bytes = parms->result_sz_in_bytes;
-   sparms.alloc = parms->alloc;
-   sparms.tbl_scope_id = parms->tbl_scope_id;
-   rc = dev->ops->tf_dev_alloc_search_tbl(tfp, &sparms);
-   if (rc) {
-   TFP_DRV_LOG(ERR,
-   "%s: TBL allocation failed, rc:%s\n",
-   tf_dir_2_str(parms->dir),
-   strerror(-rc));
-   return rc;
-   }
-
-   /* Return the outputs from the search */
-   parms->hit = sparms.hit;
-   parms->search_status = sparms.search_status;
-   parms->ref_cnt = sparms.ref_cnt;
-   parms->idx = sparms.idx;
-
-   return 0;
-}
-
 int
 tf_free_tbl_entry(struct tf *tfp,
  struct tf_free_tbl_entry_parms *parms)
diff --git a/drivers/net/bnxt/tf_core/tf_core.h 
b/drivers/net/bnxt/tf_core/tf_core.h
index 84b234f0e3..7e0cdf7e0d 100644
--- a/drivers/net/bnxt/tf_core/tf_core.h
+++ b/drivers/net/bnxt/tf_core/tf_core.h
@@ -1622,79 +1622,6 @@ int tf_clear_tcam_shared_entries(struct tf *tfp,
  * @ref tf_get_shared_tbl_increment
  */
 
-/**
- * tf_alloc_tbl_entry parameter definition
- */
-struct tf_search_tbl_entry_parms {
-   /**
-* [in] Receive or transmit direction
-*/
-   enum tf_dir dir;
-   /**
-* [in] Type of the allocation
-*/
-   enum tf_tbl_type type;
-   /**
-* [in] Table scope identifier (ignored unless TF_TBL_TYPE_EXT)
-*/
-   uint32_t tbl_scope_id;
-   /**
-* [in] Result data to search for
-*/
-   uint8_t *result;
-   /**
-* [in] Result data size in bytes
-*/
-   uint16_t result_sz_in_bytes;
-   /**
-* [in] Allocate on miss.
-*/
-   uint8_t alloc;
-   /**
-* [out] Set if matching entry found
-*/
-   uint8_t hit;
-   /**
-* [out] Search result status (hit, miss, reject)
-*/
-   enum tf_search_status search_status;
-   /**
-* [out] Current ref count after allocation
-*/
-   uint16_t ref_cnt;
-   /**
-* [out] Idx of allocated entry or found entry
-*/
-   uint32_t idx;
-};
-
-/**
- * search Table Entry (experimental)
- *
- * This function searches the shadow copy of an index table for a matching
- * entry.  The result data must match for hit to be set.  Only TruFlow core
- * data is accessed.  If shadow_copy is not enab

[dpdk-dev] [PATCH v4 03/13] net/bnxt: add flow meter drop counter support

2021-09-20 Thread Venkat Duvvuru
From: Jay Ding 

This patch adds flow meter drop counter support for Thor.

Signed-off-by: Jay Ding 
Reviewed-by: Farah Smith 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_core/cfa_resource_types.h |  5 +-
 drivers/net/bnxt/tf_core/tf_core.h|  8 +-
 drivers/net/bnxt/tf_core/tf_device_p58.c  |  1 +
 drivers/net/bnxt/tf_core/tf_device_p58.h  | 14 
 drivers/net/bnxt/tf_core/tf_tbl.c | 74 +++
 drivers/net/bnxt/tf_core/tf_util.c|  2 +
 6 files changed, 68 insertions(+), 36 deletions(-)

diff --git a/drivers/net/bnxt/tf_core/cfa_resource_types.h 
b/drivers/net/bnxt/tf_core/cfa_resource_types.h
index cbab0d0078..36a55d4e17 100644
--- a/drivers/net/bnxt/tf_core/cfa_resource_types.h
+++ b/drivers/net/bnxt/tf_core/cfa_resource_types.h
@@ -104,10 +104,11 @@
 #define CFA_RESOURCE_TYPE_P58_WC_FKB 0x12UL
 /* VEB TCAM */
 #define CFA_RESOURCE_TYPE_P58_VEB_TCAM   0x13UL
+/* Metadata */
+#define CFA_RESOURCE_TYPE_P58_METADATA   0x14UL
 /* Meter drop counter */
 #define CFA_RESOURCE_TYPE_P58_METER_DROP_CNT 0x15UL
-#define CFA_RESOURCE_TYPE_P58_LAST   
CFA_RESOURCE_TYPE_P58_METER_DROP_CNT
-
+#define CFA_RESOURCE_TYPE_P58_LAST  
CFA_RESOURCE_TYPE_P58_METER_DROP_CNT
 
 /* Multicast Group */
 #define CFA_RESOURCE_TYPE_P45_MCG 0x0UL
diff --git a/drivers/net/bnxt/tf_core/tf_core.h 
b/drivers/net/bnxt/tf_core/tf_core.h
index 7e0cdf7e0d..af8d13bd7e 100644
--- a/drivers/net/bnxt/tf_core/tf_core.h
+++ b/drivers/net/bnxt/tf_core/tf_core.h
@@ -283,9 +283,9 @@ enum tf_tbl_type {
TF_TBL_TYPE_ACT_MODIFY_32B,
/** TH 64B Modify Record */
TF_TBL_TYPE_ACT_MODIFY_64B,
-   /** (Future) Meter Profiles */
+   /** Meter Profiles */
TF_TBL_TYPE_METER_PROF,
-   /** (Future) Meter Instance */
+   /** Meter Instance */
TF_TBL_TYPE_METER_INST,
/** Wh+/SR/Th Mirror Config */
TF_TBL_TYPE_MIRROR_CONFIG,
@@ -301,6 +301,8 @@ enum tf_tbl_type {
TF_TBL_TYPE_EM_FKB,
/** TH WC Flexible Key builder */
TF_TBL_TYPE_WC_FKB,
+   /** Meter Drop Counter */
+   TF_TBL_TYPE_METER_DROP_CNT,
 
/* External */
 
@@ -2194,6 +2196,8 @@ enum tf_global_config_type {
TF_TUNNEL_ENCAP,  /**< Tunnel Encap Config(TECT) */
TF_ACTION_BLOCK,  /**< Action Block Config(ABCR) */
TF_COUNTER_CFG,   /**< Counter Configuration (CNTRS_CTRL) */
+   TF_METER_CFG, /**< Meter Config(ACTP4_FMTCR) */
+   TF_METER_INTERVAL_CFG, /**< Meter Interval Config(FMTCR_INTERVAL)  */
TF_GLOBAL_CFG_TYPE_MAX
 };
 
diff --git a/drivers/net/bnxt/tf_core/tf_device_p58.c 
b/drivers/net/bnxt/tf_core/tf_device_p58.c
index 808dcb1f77..a492c62bff 100644
--- a/drivers/net/bnxt/tf_core/tf_device_p58.c
+++ b/drivers/net/bnxt/tf_core/tf_device_p58.c
@@ -43,6 +43,7 @@ const char *tf_resource_str_p58[CFA_RESOURCE_TYPE_P58_LAST + 
1] = {
[CFA_RESOURCE_TYPE_P58_EM_FKB] = "em_fkb  ",
[CFA_RESOURCE_TYPE_P58_WC_FKB] = "wc_fkb  ",
[CFA_RESOURCE_TYPE_P58_VEB_TCAM]   = "veb ",
+   [CFA_RESOURCE_TYPE_P58_METADATA]   = "metadata",
 };
 
 /**
diff --git a/drivers/net/bnxt/tf_core/tf_device_p58.h 
b/drivers/net/bnxt/tf_core/tf_device_p58.h
index 66b0f4e983..8c2e07aa34 100644
--- a/drivers/net/bnxt/tf_core/tf_device_p58.h
+++ b/drivers/net/bnxt/tf_core/tf_device_p58.h
@@ -75,10 +75,18 @@ struct tf_rm_element_cfg tf_tbl_p58[TF_TBL_TYPE_MAX] = {
TF_RM_ELEM_CFG_HCAPI_BA, CFA_RESOURCE_TYPE_P58_METER,
0, 0, 0
},
+   [TF_TBL_TYPE_METER_DROP_CNT] = {
+   TF_RM_ELEM_CFG_HCAPI_BA, CFA_RESOURCE_TYPE_P58_METER_DROP_CNT,
+   0, 0, 0
+   },
[TF_TBL_TYPE_MIRROR_CONFIG] = {
TF_RM_ELEM_CFG_HCAPI_BA, CFA_RESOURCE_TYPE_P58_MIRROR,
0, 0, 0
},
+   [TF_TBL_TYPE_METADATA] = {
+   TF_RM_ELEM_CFG_HCAPI_BA, CFA_RESOURCE_TYPE_P58_METADATA,
+   0, 0, 0
+   },
/* Policy - ARs in bank 1 */
[TF_TBL_TYPE_FULL_ACT_RECORD] = {
.cfg_type= TF_RM_ELEM_CFG_HCAPI_BA_PARENT,
@@ -194,5 +202,11 @@ struct tf_global_cfg_cfg 
tf_global_cfg_p58[TF_GLOBAL_CFG_TYPE_MAX] = {
[TF_COUNTER_CFG] = {
TF_GLOBAL_CFG_CFG_HCAPI, TF_COUNTER_CFG
},
+   [TF_METER_CFG] = {
+   TF_GLOBAL_CFG_CFG_HCAPI, TF_METER_CFG
+   },
+   [TF_METER_INTERVAL_CFG] = {
+   TF_GLOBAL_CFG_CFG_HCAPI, TF_METER_INTERVAL_CFG
+   },
 };
 #endif /* _TF_DEVICE_P58_H_ */
diff --git a/drivers/net/bnxt/tf_core/tf_tbl.c 
b/drivers/net/bnxt/tf_core/tf_tbl.c
index e77399c6bd..7011edcd78 100644
--- a/drivers/net/bnxt/tf_core/tf_tbl.c
+++ b/drivers/net/bnxt/tf_core/tf_tbl.c
@@ -374,23 +374,28 @@ tf_tbl_set(struct tf *tfp,
}
}
 
-   /* Verify that the entry has been previously allo

[dpdk-dev] [PATCH v4 08/13] net/bnxt: add wild card TCAM byte order for Thor

2021-09-20 Thread Venkat Duvvuru
From: Kishore Padmanabha 

The wild card TCAM byte order for the Thor platform is different from the
profile TCAM byte order.

Signed-off-by: Kishore Padmanabha 
Signed-off-by: Venkat Duvvuru 
Reviewed-by: Shuanglin Wang 
Reviewed-by: Mike Baucom 
Reviewed-by: Ajit Khaparde 
---
 .../generic_templates/ulp_template_db_tbl.c   |  2 ++
 drivers/net/bnxt/tf_ulp/ulp_mapper.c  | 25 +--
 drivers/net/bnxt/tf_ulp/ulp_template_struct.h |  1 +
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/net/bnxt/tf_ulp/generic_templates/ulp_template_db_tbl.c 
b/drivers/net/bnxt/tf_ulp/generic_templates/ulp_template_db_tbl.c
index b5bce6f4c7..68f1b5fd00 100644
--- a/drivers/net/bnxt/tf_ulp/generic_templates/ulp_template_db_tbl.c
+++ b/drivers/net/bnxt/tf_ulp/generic_templates/ulp_template_db_tbl.c
@@ -201,6 +201,7 @@ struct bnxt_ulp_device_params 
ulp_device_params[BNXT_ULP_DEVICE_ID_LAST] = {
.key_byte_order  = BNXT_ULP_BYTE_ORDER_LE,
.result_byte_order   = BNXT_ULP_BYTE_ORDER_LE,
.encap_byte_order= BNXT_ULP_BYTE_ORDER_BE,
+   .wc_key_byte_order   = BNXT_ULP_BYTE_ORDER_BE,
.encap_byte_swap = 1,
.int_flow_db_num_entries = 16384,
.ext_flow_db_num_entries = 32768,
@@ -223,6 +224,7 @@ struct bnxt_ulp_device_params 
ulp_device_params[BNXT_ULP_DEVICE_ID_LAST] = {
.key_byte_order  = BNXT_ULP_BYTE_ORDER_LE,
.result_byte_order   = BNXT_ULP_BYTE_ORDER_LE,
.encap_byte_order= BNXT_ULP_BYTE_ORDER_BE,
+   .wc_key_byte_order   = BNXT_ULP_BYTE_ORDER_BE,
.encap_byte_swap = 1,
.int_flow_db_num_entries = 16384,
.ext_flow_db_num_entries = 32768,
diff --git a/drivers/net/bnxt/tf_ulp/ulp_mapper.c 
b/drivers/net/bnxt/tf_ulp/ulp_mapper.c
index 2687a545f3..bcc089b3e1 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_mapper.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_mapper.c
@@ -1953,6 +1953,15 @@ static void ulp_mapper_wc_tcam_tbl_post_process(struct 
ulp_blob *blob)
 #endif
 }
 
+static int32_t ulp_mapper_tcam_is_wc_tcam(struct bnxt_ulp_mapper_tbl_info *tbl)
+{
+   if (tbl->resource_type == TF_TCAM_TBL_TYPE_WC_TCAM ||
+   tbl->resource_type == TF_TCAM_TBL_TYPE_WC_TCAM_HIGH ||
+   tbl->resource_type == TF_TCAM_TBL_TYPE_WC_TCAM_LOW)
+   return 1;
+   return 0;
+}
+
 static int32_t
 ulp_mapper_tcam_tbl_process(struct bnxt_ulp_mapper_parms *parms,
struct bnxt_ulp_mapper_tbl_info *tbl)
@@ -1972,6 +1981,7 @@ ulp_mapper_tcam_tbl_process(struct bnxt_ulp_mapper_parms 
*parms,
uint32_t hit = 0;
uint16_t tmplen = 0;
uint16_t idx;
+   enum bnxt_ulp_byte_order key_byte_order;
 
/* Set the key and mask to the original key and mask. */
key = &okey;
@@ -2003,10 +2013,13 @@ ulp_mapper_tcam_tbl_process(struct 
bnxt_ulp_mapper_parms *parms,
return -EINVAL;
}
 
-   if (!ulp_blob_init(key, tbl->blob_key_bit_size,
-  dparms->key_byte_order) ||
-   !ulp_blob_init(mask, tbl->blob_key_bit_size,
-  dparms->key_byte_order) ||
+   if (ulp_mapper_tcam_is_wc_tcam(tbl))
+   key_byte_order = dparms->wc_key_byte_order;
+   else
+   key_byte_order = dparms->key_byte_order;
+
+   if (!ulp_blob_init(key, tbl->blob_key_bit_size, key_byte_order) ||
+   !ulp_blob_init(mask, tbl->blob_key_bit_size, key_byte_order) ||
!ulp_blob_init(&data, tbl->result_bit_size,
   dparms->result_byte_order) ||
!ulp_blob_init(&update_data, tbl->result_bit_size,
@@ -2043,9 +2056,7 @@ ulp_mapper_tcam_tbl_process(struct bnxt_ulp_mapper_parms 
*parms,
}
 
/* For wild card tcam perform the post process to swap the blob */
-   if (tbl->resource_type == TF_TCAM_TBL_TYPE_WC_TCAM ||
-   tbl->resource_type == TF_TCAM_TBL_TYPE_WC_TCAM_HIGH ||
-   tbl->resource_type == TF_TCAM_TBL_TYPE_WC_TCAM_LOW) {
+   if (ulp_mapper_tcam_is_wc_tcam(tbl)) {
if (dparms->dynamic_pad_en) {
/* Sets up the slices for writing to the WC TCAM */
rc = ulp_mapper_wc_tcam_tbl_dyn_post_process(dparms,
diff --git a/drivers/net/bnxt/tf_ulp/ulp_template_struct.h 
b/drivers/net/bnxt/tf_ulp/ulp_template_struct.h
index 904763f27d..e2a4b81cec 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_template_struct.h
+++ b/drivers/net/bnxt/tf_ulp/ulp_template_struct.h
@@ -212,6 +212,7 @@ struct bnxt_ulp_device_params {
enum bnxt_ulp_byte_orderkey_byte_order;
enum bnxt_ulp_byte_orderresult_byte_order;
enum bnxt_ulp_byte_orderencap_byte_order;
+   enum bnxt_ulp_byte_orderwc_key_byte_order;
uint8_t encap_byte_swap;
uint8_t num_phy_ports;
uint32_tmark_db_lfid_e

[dpdk-dev] [PATCH v4 04/13] net/bnxt: add SRAM manager model

2021-09-20 Thread Venkat Duvvuru
From: Farah Smith 

The SRAM manager supports allocation and freeing of variable-sized
records within SRAM memory.  These record sizes are 8, 16, 32, or
64B. The SRAM manager algorithm will not fragment memory during
run time. The previous implementation only supported fixed-size 64B
records regardless of the size required.

Signed-off-by: Farah Smith 
Reviewed-by: Shahaji Bhosle 
Reviewed-by: Peter Spreadborough 
Acked-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_core/ll.c |   3 +
 drivers/net/bnxt/tf_core/ll.h |  50 +-
 drivers/net/bnxt/tf_core/meson.build  |   2 +
 drivers/net/bnxt/tf_core/tf_core.c| 104 ++-
 drivers/net/bnxt/tf_core/tf_core.h|  48 +-
 drivers/net/bnxt/tf_core/tf_device.c  |  40 +-
 drivers/net/bnxt/tf_core/tf_device.h  | 133 ++-
 drivers/net/bnxt/tf_core/tf_device_p4.c   |  75 +-
 drivers/net/bnxt/tf_core/tf_device_p4.h   |  50 +-
 drivers/net/bnxt/tf_core/tf_device_p58.c  | 105 ++-
 drivers/net/bnxt/tf_core/tf_device_p58.h  |  60 +-
 drivers/net/bnxt/tf_core/tf_msg.c |   2 +-
 drivers/net/bnxt/tf_core/tf_rm.c  |  46 +-
 drivers/net/bnxt/tf_core/tf_rm.h  |  62 +-
 drivers/net/bnxt/tf_core/tf_session.c |  56 ++
 drivers/net/bnxt/tf_core/tf_session.h |  58 +-
 drivers/net/bnxt/tf_core/tf_sram_mgr.c| 971 ++
 drivers/net/bnxt/tf_core/tf_sram_mgr.h| 317 +++
 drivers/net/bnxt/tf_core/tf_tbl.c | 186 +
 drivers/net/bnxt/tf_core/tf_tbl.h |  15 +-
 drivers/net/bnxt/tf_core/tf_tbl_sram.c| 713 
 drivers/net/bnxt/tf_core/tf_tbl_sram.h| 154 
 drivers/net/bnxt/tf_core/tf_tcam.c|  10 +-
 drivers/net/bnxt/tf_core/tf_tcam.h|   7 +
 drivers/net/bnxt/tf_core/tf_tcam_shared.c |  28 +-
 drivers/net/bnxt/tf_core/tf_util.c|  10 +
 drivers/net/bnxt/tf_ulp/bnxt_ulp.c|  23 +
 27 files changed, 2976 insertions(+), 352 deletions(-)
 create mode 100644 drivers/net/bnxt/tf_core/tf_sram_mgr.c
 create mode 100644 drivers/net/bnxt/tf_core/tf_sram_mgr.h
 create mode 100644 drivers/net/bnxt/tf_core/tf_tbl_sram.c
 create mode 100644 drivers/net/bnxt/tf_core/tf_tbl_sram.h

diff --git a/drivers/net/bnxt/tf_core/ll.c b/drivers/net/bnxt/tf_core/ll.c
index cd168a7970..f2bdff6b9e 100644
--- a/drivers/net/bnxt/tf_core/ll.c
+++ b/drivers/net/bnxt/tf_core/ll.c
@@ -13,6 +13,7 @@ void ll_init(struct ll *ll)
 {
ll->head = NULL;
ll->tail = NULL;
+   ll->cnt = 0;
 }
 
 /* insert entry in linked list */
@@ -30,6 +31,7 @@ void ll_insert(struct ll *ll,
entry->next->prev = entry;
ll->head = entry->next->prev;
}
+   ll->cnt++;
 }
 
 /* delete entry from linked list */
@@ -49,4 +51,5 @@ void ll_delete(struct ll *ll,
entry->prev->next = entry->next;
entry->next->prev = entry->prev;
}
+   ll->cnt--;
 }
diff --git a/drivers/net/bnxt/tf_core/ll.h b/drivers/net/bnxt/tf_core/ll.h
index 239478b4f8..9cf8f64ec2 100644
--- a/drivers/net/bnxt/tf_core/ll.h
+++ b/drivers/net/bnxt/tf_core/ll.h
@@ -8,6 +8,8 @@
 #ifndef _LL_H_
 #define _LL_H_
 
+#include 
+
 /* linked list entry */
 struct ll_entry {
struct ll_entry *prev;
@@ -18,6 +20,7 @@ struct ll_entry {
 struct ll {
struct ll_entry *head;
struct ll_entry *tail;
+   uint32_t cnt;
 };
 
 /**
@@ -28,7 +31,7 @@ struct ll {
 void ll_init(struct ll *ll);
 
 /**
- * Linked list insert
+ * Linked list insert head
  *
  * [in] ll, linked list where element is inserted
  * [in] entry, entry to be added
@@ -43,4 +46,49 @@ void ll_insert(struct ll *ll, struct ll_entry *entry);
  */
 void ll_delete(struct ll *ll, struct ll_entry *entry);
 
+/**
+ * Linked list return next entry without deleting it
+ *
+ * Useful in performing search
+ *
+ * [in] Entry in the list
+ */
+static inline struct ll_entry *ll_next(struct ll_entry *entry)
+{
+   return entry->next;
+}
+
+/**
+ * Linked list return the head of the list without removing it
+ *
+ * Useful in performing search
+ *
+ * [in] ll, linked list
+ */
+static inline struct ll_entry *ll_head(struct ll *ll)
+{
+   return ll->head;
+}
+
+/**
+ * Linked list return the tail of the list without removing it
+ *
+ * Useful in performing search
+ *
+ * [in] ll, linked list
+ */
+static inline struct ll_entry *ll_tail(struct ll *ll)
+{
+   return ll->tail;
+}
+
+/**
+ * Linked list return the number of entries in the list
+ *
+ * [in] ll, linked list
+ */
+static inline uint32_t ll_cnt(struct ll *ll)
+{
+   return ll->cnt;
+}
 #endif /* _LL_H_ */
diff --git a/drivers/net/bnxt/tf_core/meson.build 
b/drivers/net/bnxt/tf_core/meson.build
index f28e77ec2e..206935d18a 100644
--- a/drivers/net/bnxt/tf_core/meson.build
+++ b/drivers/net/bnxt/tf_core/meson.build
@@ -16,6 +16,8 @@ sources += files(
 'stack.c',
 'tf_rm.c',
 'tf_tbl.c',
+'tf_tbl_sram.c',
+'tf_sram_mgr.c',
 'tf_em_common.c',
 'tf_em_h

[dpdk-dev] [PATCH v4 10/13] net/bnxt: change log level to debug

2021-09-20 Thread Venkat Duvvuru
From: Farah Smith 

Adjust info message to debug level to prevent excessive
logging.

Signed-off-by: Farah Smith 
Reviewed-by: Mike Baucom 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_core/tf_tbl_sram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bnxt/tf_core/tf_tbl_sram.c 
b/drivers/net/bnxt/tf_core/tf_tbl_sram.c
index ea10afecb6..d7727f7a11 100644
--- a/drivers/net/bnxt/tf_core/tf_tbl_sram.c
+++ b/drivers/net/bnxt/tf_core/tf_tbl_sram.c
@@ -130,7 +130,7 @@ static int tf_tbl_sram_get_info(struct 
tf_tbl_sram_get_info_parms *parms)
if (slices)
parms->slice_size = tf_tbl_sram_slices_2_size[slices];
 
-   TFP_DRV_LOG(INFO,
+   TFP_DRV_LOG(DEBUG,
"(%s) bank(%s) slice_size(%s)\n",
tf_tbl_type_2_str(parms->tbl_type),
tf_sram_bank_2_str(parms->bank_id),
-- 
2.17.1



[dpdk-dev] [PATCH v4 11/13] net/bnxt: dynamically allocate space for EM defrag function

2021-09-20 Thread Venkat Duvvuru
From: Randy Schacher 

The dynamic pool allocation defrag function currently uses stack
allocation. To improve use of stack space, dynamically allocate
and deallocate the memory used to defragment the dynamic pool of
EM resources.

Signed-off-by: Randy Schacher 
Reviewed-by: Peter Spreadborough 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_core/dpool.c | 38 +---
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/drivers/net/bnxt/tf_core/dpool.c b/drivers/net/bnxt/tf_core/dpool.c
index 145efa486f..5c03f775a5 100644
--- a/drivers/net/bnxt/tf_core/dpool.c
+++ b/drivers/net/bnxt/tf_core/dpool.c
@@ -7,9 +7,6 @@
 #include 
 #include 
 #include 
-
-#include 
-
 #include "tfp.h"
 #include "dpool.h"
 
@@ -84,13 +81,13 @@ static int dpool_move(struct dpool *dpool,
return 0;
 }
 
-
 int dpool_defrag(struct dpool *dpool,
 uint32_t entry_size,
 uint8_t defrag)
 {
struct dpool_free_list *free_list;
struct dpool_adj_list *adj_list;
+   struct tfp_calloc_parms parms;
uint32_t count;
uint32_t index;
uint32_t used;
@@ -103,15 +100,31 @@ int dpool_defrag(struct dpool *dpool,
uint32_t max_size = 0;
int rc;
 
-   free_list = rte_zmalloc("dpool_free_list",
-   sizeof(struct dpool_free_list), 0);
+   parms.nitems = 1;
+   parms.size = sizeof(struct dpool_free_list);
+   parms.alignment = 0;
+
+   rc = tfp_calloc(&parms);
+
+   if (rc)
+   return rc;
+
+   free_list = (struct dpool_free_list *)parms.mem_va;
if (free_list == NULL) {
TFP_DRV_LOG(ERR, "dpool free list allocation failed\n");
return -ENOMEM;
}
 
-   adj_list = rte_zmalloc("dpool_adjacent_list",
-   sizeof(struct dpool_adj_list), 0);
+   parms.nitems = 1;
+   parms.size = sizeof(struct dpool_adj_list);
+   parms.alignment = 0;
+
+   rc = tfp_calloc(&parms);
+
+   if (rc)
+   return rc;
+
+   adj_list = (struct dpool_adj_list *)parms.mem_va;
if (adj_list == NULL) {
TFP_DRV_LOG(ERR, "dpool adjacent list allocation failed\n");
return -ENOMEM;
@@ -239,8 +252,8 @@ int dpool_defrag(struct dpool *dpool,

free_list->entry[largest_free_index].index,
max_index);
if (rc) {
-   rte_free(free_list);
-   rte_free(adj_list);
+   tfp_free(free_list);
+   tfp_free(adj_list);
return rc;
}
} else {
@@ -249,12 +262,11 @@ int dpool_defrag(struct dpool *dpool,
}
 
 done:
-   rte_free(free_list);
-   rte_free(adj_list);
+   tfp_free(free_list);
+   tfp_free(adj_list);
return largest_free_size;
 }
 
-
 uint32_t dpool_alloc(struct dpool *dpool,
 uint32_t size,
 uint8_t defrag)
-- 
2.17.1



[dpdk-dev] [PATCH v4 05/13] net/bnxt: add flow template support for Thor

2021-09-20 Thread Venkat Duvvuru
From: Kishore Padmanabha 

The template adds non-VFR based support for testpmd with:
matches including:
- DMAC, SIP, DIP, Proto, Sport, Dport
- SIP, DIP, Proto, Sport, Dport
actions:
- count, drop

Signed-off-by: Kishore Padmanabha 
Signed-off-by: Venkat Duvvuru 
Reviewed-by: Mike Baucom 
Acked-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_ulp/bnxt_tf_common.h  |   6 +
 drivers/net/bnxt/tf_ulp/bnxt_ulp.c|  36 +++---
 drivers/net/bnxt/tf_ulp/bnxt_ulp_flow.c   |  12 ++
 .../bnxt/tf_ulp/generic_templates/meson.build |   3 +
 .../ulp_template_db_thor_class.c  |   1 -
 drivers/net/bnxt/tf_ulp/ulp_fc_mgr.c  |   2 +-
 drivers/net/bnxt/tf_ulp/ulp_flow_db.c | 120 +-
 drivers/net/bnxt/tf_ulp/ulp_flow_db.h |  26 +++-
 drivers/net/bnxt/tf_ulp/ulp_gen_tbl.c |   5 +
 drivers/net/bnxt/tf_ulp/ulp_ha_mgr.c  |   2 +-
 drivers/net/bnxt/tf_ulp/ulp_mapper.c  | 111 +++-
 drivers/net/bnxt/tf_ulp/ulp_matcher.c |  13 ++
 drivers/net/bnxt/tf_ulp/ulp_port_db.c |  15 ++-
 drivers/net/bnxt/tf_ulp/ulp_rte_parser.c  |   9 +-
 drivers/net/bnxt/tf_ulp/ulp_tun.c |  20 +++
 drivers/net/bnxt/tf_ulp/ulp_utils.c   |   8 +-
 16 files changed, 348 insertions(+), 41 deletions(-)

diff --git a/drivers/net/bnxt/tf_ulp/bnxt_tf_common.h 
b/drivers/net/bnxt/tf_ulp/bnxt_tf_common.h
index f59da41e54..e0ebed3fed 100644
--- a/drivers/net/bnxt/tf_ulp/bnxt_tf_common.h
+++ b/drivers/net/bnxt/tf_ulp/bnxt_tf_common.h
@@ -13,6 +13,12 @@
 
 #define BNXT_TF_DBG(lvl, fmt, args...) PMD_DRV_LOG(lvl, fmt, ## args)
 
+#ifdef RTE_LIBRTE_BNXT_TRUFLOW_DEBUG
+#define BNXT_TF_INF(fmt, args...)  PMD_DRV_LOG(INFO, fmt, ## args)
+#else
+#define BNXT_TF_INF(fmt, args...)
+#endif
+
 #define BNXT_ULP_EM_FLOWS  8192
 #define BNXT_ULP_1M_FLOWS  100
 #define BNXT_EEM_RX_GLOBAL_ID_MASK (BNXT_ULP_1M_FLOWS - 1)
diff --git a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c 
b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
index 183bae66c5..475c7a6cdf 100644
--- a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
+++ b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
@@ -698,6 +698,11 @@ ulp_eem_tbl_scope_init(struct bnxt *bp)
rc);
return rc;
}
+#ifdef RTE_LIBRTE_BNXT_TRUFLOW_DEBUG
+   BNXT_TF_DBG(DEBUG, "TableScope=0x%0x %d\n",
+   params.tbl_scope_id,
+   params.tbl_scope_id);
+#endif
rc = bnxt_ulp_cntxt_tbl_scope_id_set(bp->ulp_ctx, params.tbl_scope_id);
if (rc) {
BNXT_TF_DBG(ERR, "Unable to set table scope id\n");
@@ -825,6 +830,8 @@ ulp_ctx_init(struct bnxt *bp,
goto error_deinit;
}
 
+   /* TODO: For now we are overriding to APP:1 on this branch*/
+   bp->app_id = 1;
rc = bnxt_ulp_cntxt_app_id_set(bp->ulp_ctx, bp->app_id);
if (rc) {
BNXT_TF_DBG(ERR, "Unable to set app_id for ULP init.\n");
@@ -838,11 +845,6 @@ ulp_ctx_init(struct bnxt *bp,
goto error_deinit;
}
 
-   if (devid == BNXT_ULP_DEVICE_ID_THOR) {
-   ulp_data->ulp_flags &= ~BNXT_ULP_VF_REP_ENABLED;
-   BNXT_TF_DBG(ERR, "Enabled non-VFR mode\n");
-   }
-
/*
 * Shared session must be created before first regular session but after
 * the ulp_ctx is valid.
@@ -902,7 +904,7 @@ ulp_dparms_init(struct bnxt *bp, struct bnxt_ulp_context 
*ulp_ctx)
dparms->ext_flow_db_num_entries = bp->max_num_kflows * 1024;
/* GFID =  2 * num_flows */
dparms->mark_db_gfid_entries = dparms->ext_flow_db_num_entries * 2;
-   BNXT_TF_DBG(DEBUG, "Set the number of flows = %"PRIu64"\n",
+   BNXT_TF_DBG(DEBUG, "Set the number of flows = %" PRIu64 "\n",
dparms->ext_flow_db_num_entries);
 
return 0;
@@ -1393,17 +1395,13 @@ bnxt_ulp_port_init(struct bnxt *bp)
uint32_t ulp_flags;
int32_t rc = 0;
 
+   if (!bp || !BNXT_TRUFLOW_EN(bp))
+   return rc;
+
if (!BNXT_PF(bp) && !BNXT_VF_IS_TRUSTED(bp)) {
BNXT_TF_DBG(ERR,
"Skip ulp init for port: %d, not a TVF or PF\n",
-   bp->eth_dev->data->port_id);
-   return rc;
-   }
-
-   if (!BNXT_TRUFLOW_EN(bp)) {
-   BNXT_TF_DBG(DEBUG,
-   "Skip ulp init for port: %d, truflow is not 
enabled\n",
-   bp->eth_dev->data->port_id);
+   bp->eth_dev->data->port_id);
return rc;
}
 
@@ -1524,6 +1522,9 @@ bnxt_ulp_port_deinit(struct bnxt *bp)
struct rte_pci_device *pci_dev;
struct rte_pci_addr *pci_addr;
 
+   if (!BNXT_TRUFLOW_EN(bp))
+   return;
+
if (!BNXT_PF(bp) && !BNXT_VF_IS_TRUSTED(bp)) {
BNXT_TF_DBG(ERR,
"Skip ULP deinit port:%d, not a TVF or PF\n",
@@ 

[dpdk-dev] [PATCH v4 12/13] net/bnxt: add SRAM manager shared session

2021-09-20 Thread Venkat Duvvuru
From: Farah Smith 

Fix shared session support issues due to SRAM manager
additions. Shared session does not support slices within
RM blocks. Calculate resources required without slices
and determine base addresses using old methods for the
shared session.

Signed-off-by: Farah Smith 
Reviewed-by: Kishore Padmanabha 
Reviewed-by: Shahaji Bhosle 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/tf_core/tf_em_internal.c |   5 +-
 drivers/net/bnxt/tf_core/tf_rm.c  | 134 +++---
 drivers/net/bnxt/tf_core/tf_tbl_sram.c|  73 +---
 3 files changed, 176 insertions(+), 36 deletions(-)

diff --git a/drivers/net/bnxt/tf_core/tf_em_internal.c 
b/drivers/net/bnxt/tf_core/tf_em_internal.c
index 2d57595f17..67ba011eae 100644
--- a/drivers/net/bnxt/tf_core/tf_em_internal.c
+++ b/drivers/net/bnxt/tf_core/tf_em_internal.c
@@ -326,8 +326,11 @@ tf_em_int_unbind(struct tf *tfp)
return rc;
 
if (!tf_session_is_shared_session(tfs)) {
-   for (i = 0; i < TF_DIR_MAX; i++)
+   for (i = 0; i < TF_DIR_MAX; i++) {
+   if (tfs->em_pool[i] == NULL)
+   continue;
dpool_free_all(tfs->em_pool[i]);
+   }
}
 
rc = tf_session_get_db(tfp, TF_MODULE_TYPE_EM, &em_db_ptr);
diff --git a/drivers/net/bnxt/tf_core/tf_rm.c b/drivers/net/bnxt/tf_core/tf_rm.c
index 03c958a7d6..dd537aaece 100644
--- a/drivers/net/bnxt/tf_core/tf_rm.c
+++ b/drivers/net/bnxt/tf_core/tf_rm.c
@@ -18,6 +18,9 @@
 #include "tfp.h"
 #include "tf_msg.h"
 
+/* Logging defines */
+#define TF_RM_DEBUG  0
+
 /**
  * Generic RM Element data type that an RM DB is build upon.
  */
@@ -207,6 +210,45 @@ tf_rm_adjust_index(struct tf_rm_element *db,
return rc;
 }
 
+/**
+ * Logs an array of found residual entries to the console.
+ *
+ * [in] dir
+ *   Receive or transmit direction
+ *
+ * [in] module
+ *   Type of Device Module
+ *
+ * [in] count
+ *   Number of entries in the residual array
+ *
+ * [in] residuals
+ *   Pointer to an array of residual entries. Array is index same as
+ *   the DB in which this function is used. Each entry holds residual
+ *   value for that entry.
+ */
+#if (TF_RM_DEBUG == 1)
+static void
+tf_rm_log_residuals(enum tf_dir dir,
+   enum tf_module_type module,
+   uint16_t count,
+   uint16_t *residuals)
+{
+   int i;
+
+   /* Walk the residual array and log the types that wasn't
+* cleaned up to the console.
+*/
+   for (i = 0; i < count; i++) {
+   if (residuals[i] != 0)
+   TFP_DRV_LOG(INFO,
+   "%s, %s was not cleaned up, %d outstanding\n",
+   tf_dir_2_str(dir),
+   tf_module_subtype_2_str(module, i),
+   residuals[i]);
+   }
+}
+#endif /* TF_RM_DEBUG == 1 */
 /**
  * Performs a check of the passed in DB for any lingering elements. If
  * a resource type was found to not have been cleaned up by the caller
@@ -322,6 +364,12 @@ tf_rm_check_residuals(struct tf_rm_new_db *rm_db,
*resv_size = found;
}
 
+#if (TF_RM_DEBUG == 1)
+   tf_rm_log_residuals(rm_db->dir,
+   rm_db->module,
+   rm_db->num_entries,
+   residuals);
+#endif
tfp_free((void *)residuals);
*resv = local_resv;
 
@@ -367,7 +415,8 @@ tf_rm_update_parent_reservations(struct tf *tfp,
 struct tf_rm_element_cfg *cfg,
 uint16_t *alloc_cnt,
 uint16_t num_elements,
-uint16_t *req_cnt)
+uint16_t *req_cnt,
+bool shared_session)
 {
int parent, child;
const char *type_str;
@@ -378,18 +427,28 @@ tf_rm_update_parent_reservations(struct tf *tfp,
 
/* If I am a parent */
if (cfg[parent].cfg_type == TF_RM_ELEM_CFG_HCAPI_BA_PARENT) {
-   /* start with my own count */
-   RTE_ASSERT(cfg[parent].slices);
-   combined_cnt =
-   alloc_cnt[parent] / cfg[parent].slices;
+   uint8_t p_slices = 1;
+
+   /* Shared session doesn't support slices */
+   if (!shared_session)
+   p_slices = cfg[parent].slices;
+
+   RTE_ASSERT(p_slices);
 
-   if (alloc_cnt[parent] % cfg[parent].slices)
+   combined_cnt = alloc_cnt[parent] / p_slices;
+
+   if (alloc_cnt[parent] % p_slices)
combined_cnt++;
 
if (alloc_cnt[parent]) {
dev->ops->tf_dev

[dpdk-dev] [PATCH v4 0/3] eal: add memory pre-allocation from existing files

2021-09-20 Thread dkozlyuk
From: Dmitry Kozlyuk 

Hugepage allocation from the system takes time, resulting in slow
startup or sporadic delays later. Most of the time spent in kernel
is zero-filling memory for security reasons, which may be irrelevant
in a controlled environment. The bottleneck is memory access speed,
so to speed things up the amount of memory cleared must be reduced.
We propose a new EAL option --mem-file FILE1,FILE2,... to quickly
allocate dirty pages from existing files and clean them as necessary.
A new malloc_perf_autotest is provided to estimate the impact.
More details are explained in relevant patches.

v4: getmntent() -> getmntent_r(), better error detection (John Levon)
v3: fix hugepage mount point detection
v2: fix CI failures

Dmitry Kozlyuk (2):
  eal/linux: make hugetlbfs analysis reusable
  app/test: add allocator performance autotest

Viacheslav Ovsiienko (1):
  eal: add memory pre-allocation from existing files

 app/test/meson.build  |   2 +
 app/test/test_malloc_perf.c   | 157 +
 doc/guides/linux_gsg/linux_eal_parameters.rst |  17 +
 lib/eal/common/eal_common_dynmem.c|   6 +
 lib/eal/common/eal_common_options.c   |  23 ++
 lib/eal/common/eal_internal_cfg.h |   4 +
 lib/eal/common/eal_memalloc.h |   8 +-
 lib/eal/common/eal_options.h  |   2 +
 lib/eal/common/malloc_elem.c  |   5 +
 lib/eal/common/malloc_heap.h  |   8 +
 lib/eal/common/rte_malloc.c   |  16 +-
 lib/eal/include/rte_memory.h  |   4 +-
 lib/eal/linux/eal.c   |  28 ++
 lib/eal/linux/eal_hugepage_info.c | 158 ++---
 lib/eal/linux/eal_hugepage_info.h |  39 +++
 lib/eal/linux/eal_memalloc.c  | 328 +-
 16 files changed, 735 insertions(+), 70 deletions(-)
 create mode 100644 app/test/test_malloc_perf.c
 create mode 100644 lib/eal/linux/eal_hugepage_info.h

-- 
2.25.1



[dpdk-dev] [PATCH v4 1/3] eal/linux: make hugetlbfs analysis reusable

2021-09-20 Thread dkozlyuk
From: Dmitry Kozlyuk 

get_hugepage_dir() searched for a hugetlbfs mount with a given page size
using handcrafted parsing of /proc/mounts, mixing traversal logic with
the selection of the needed entry. Separate the code that enumerates
hugetlbfs mounts into eal_hugepage_mount_walk(), which takes a callback
that can inspect the already-parsed entries. Use the mntent(3) API for
parsing. This allows the enumeration logic to be reused in subsequent
patches.

Signed-off-by: Dmitry Kozlyuk 
Reviewed-by: Viacheslav Ovsiienko 
---
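
As a usage illustration inferred from the cb(mntent.mnt_dir,
hugepage_sz, cb_arg) call site added below, a caller of the new walker
could look roughly like this (the callback signature and the lookup
struct are assumptions, not part of the patch):

   #include <stdint.h>
   #include <stdio.h>
   #include <limits.h>

   struct hugedir_lookup {
           uint64_t wanted_sz;     /* page size we are searching for */
           char dir[PATH_MAX];     /* filled in on a match */
   };

   /* Sketch of an eal_hugepage_mount_walk() callback: returning non-zero
    * stops the walk, returning zero continues with the next mount entry.
    */
   static int
   match_hugepage_dir(const char *path, uint64_t hugepage_sz, void *arg)
   {
           struct hugedir_lookup *l = arg;

           if (hugepage_sz != l->wanted_sz)
                   return 0;
           snprintf(l->dir, sizeof(l->dir), "%s", path);
           return 1;
   }

   /* usage: eal_hugepage_mount_walk(match_hugepage_dir, &lookup); */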
 lib/eal/linux/eal_hugepage_info.c | 153 +++---
 lib/eal/linux/eal_hugepage_info.h |  39 
 2 files changed, 135 insertions(+), 57 deletions(-)
 create mode 100644 lib/eal/linux/eal_hugepage_info.h

diff --git a/lib/eal/linux/eal_hugepage_info.c 
b/lib/eal/linux/eal_hugepage_info.c
index d97792cade..193282e779 100644
--- a/lib/eal/linux/eal_hugepage_info.c
+++ b/lib/eal/linux/eal_hugepage_info.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -34,6 +35,7 @@
 #include "eal_private.h"
 #include "eal_internal_cfg.h"
 #include "eal_hugepages.h"
+#include "eal_hugepage_info.h"
 #include "eal_filesystem.h"
 
 static const char sys_dir_path[] = "/sys/kernel/mm/hugepages";
@@ -195,73 +197,110 @@ get_default_hp_size(void)
return size;
 }
 
-static int
-get_hugepage_dir(uint64_t hugepage_sz, char *hugedir, int len)
+int
+eal_hugepage_mount_walk(eal_hugepage_mount_walk_cb *cb, void *cb_arg)
 {
-   enum proc_mount_fieldnames {
-   DEVICE = 0,
-   MOUNTPT,
-   FSTYPE,
-   OPTIONS,
-   _FIELDNAME_MAX
-   };
-   static uint64_t default_size = 0;
-   const char proc_mounts[] = "/proc/mounts";
-   const char hugetlbfs_str[] = "hugetlbfs";
-   const size_t htlbfs_str_len = sizeof(hugetlbfs_str) - 1;
-   const char pagesize_opt[] = "pagesize=";
-   const size_t pagesize_opt_len = sizeof(pagesize_opt) - 1;
-   const char split_tok = ' ';
-   char *splitstr[_FIELDNAME_MAX];
-   char buf[BUFSIZ];
-   int retval = -1;
-   const struct internal_config *internal_conf =
-   eal_get_internal_configuration();
-
-   FILE *fd = fopen(proc_mounts, "r");
-   if (fd == NULL)
-   rte_panic("Cannot open %s\n", proc_mounts);
+   static const char PATH[] = "/proc/mounts";
+   static const char OPTION[] = "pagesize";
+
+   static uint64_t default_size;
+
+   FILE *f = NULL;
+   struct mntent mntent;
+   char strings[PATH_MAX];
+   char *hugepage_sz_str;
+   uint64_t hugepage_sz;
+   bool stopped = false;
+   int ret = -1;
+
+   f = setmntent(PATH, "r");
+   if (f == NULL) {
+   RTE_LOG(ERR, EAL, "%s(): setmntent(%s): %s\n",
+   __func__, PATH, strerror(errno));
+   goto exit;
+   }
 
if (default_size == 0)
default_size = get_default_hp_size();
 
-   while (fgets(buf, sizeof(buf), fd)){
-   if (rte_strsplit(buf, sizeof(buf), splitstr, _FIELDNAME_MAX,
-   split_tok) != _FIELDNAME_MAX) {
-   RTE_LOG(ERR, EAL, "Error parsing %s\n", proc_mounts);
-   break; /* return NULL */
+   ret = 0;
+   while (getmntent_r(f, &mntent, strings, sizeof(strings)) != NULL) {
+   if (strcmp(mntent.mnt_type, "hugetlbfs") != 0)
+   continue;
+
+   hugepage_sz_str = hasmntopt(&mntent, OPTION);
+   if (hugepage_sz_str != NULL) {
+   hugepage_sz_str += strlen(OPTION) + 1; /* +1 for '=' */
+   hugepage_sz = rte_str_to_size(hugepage_sz_str);
+   if (hugepage_sz == 0) {
+   RTE_LOG(DEBUG, EAL, "Cannot parse hugepage size 
from '%s' for %s\n",
+   mntent.mnt_opts, 
mntent.mnt_dir);
+   continue;
+   }
+   } else {
+   RTE_LOG(DEBUG, EAL, "Hugepage filesystem at %s without 
%s option\n",
+   mntent.mnt_dir, OPTION);
+   hugepage_sz = default_size;
}
 
-   /* we have a specified --huge-dir option, only examine that dir 
*/
-   if (internal_conf->hugepage_dir != NULL &&
-   strcmp(splitstr[MOUNTPT], 
internal_conf->hugepage_dir) != 0)
-   continue;
+   if (cb(mntent.mnt_dir, hugepage_sz, cb_arg) != 0) {
+   stopped = true;
+   break;
+   }
+   }
 
-   if (strncmp(splitstr[FSTYPE], hugetlbfs_str, htlbfs_str_len) == 
0){
-   const char *pagesz_str = strstr(splitstr[OPTIONS], 
pagesize_opt);
+   if (ferror(f) || (!stopped && !feof(f))) {
+ 

[dpdk-dev] [PATCH v4 2/3] eal: add memory pre-allocation from existing files

2021-09-20 Thread dkozlyuk
From: Viacheslav Ovsiienko 

The primary DPDK process launch might take a long time if the initially
allocated memory is large. In practice, allocating 1 TB of memory over
1 GB hugepages on Linux takes tens of seconds. Fast restart is highly
desirable for some applications, and this launch delay is a problem.

The primary delay happens in this call trace:
  rte_eal_init()
rte_eal_memory_init()
  rte_eal_hugepage_init()
eal_dynmem_hugepage_init()
  eal_memalloc_alloc_seg_bulk()
alloc_seg()
  mmap()

The largest part of the time spent in mmap() is filling the memory
with zeros. The kernel does so to prevent data leakage from the process
that last used the page. However, in a controlled environment this may
not be an issue, while performance is. (The Linux-specific
MAP_UNINITIALIZED flag allows mapping without clearing, but it is
disabled in all popular distributions for the reason above.)

It is proposed to add a new EAL option: --mem-file FILE1,FILE2,...
to map hugepages "as is" from the specified FILEs in hugetlbfs.
Compared to using external memory for the task, the EAL option requires
no change to application code, while allowing the administrator
to control hugepage sizes and their NUMA affinity.

Limitations of the feature:

* Linux-specific (only Linux maps hugepages from files).
* Incompatible with --legacy-mem (partially replaces it).
* Incompatible with --single-file-segments
  (--mem-file FILEs can contain as many segments as needed).
* Incompatible with --in-memory (logically).

A warning about possible security implications is printed
when --mem-file is used.

Until this patch, the DPDK allocator always cleared memory on freeing,
so that it did not have to do so on allocation, while new memory was
cleared by the kernel. When --mem-file is in use, DPDK clears memory
after allocation in rte_zmalloc() and does not clear it on freeing.
Effectively, the user trades fast startup for an occasional allocation
slowdown whenever clearing is absolutely necessary. When memory is
recycled, it is cleared again, which is suboptimal per se, but saves
complicating the memory management.
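
As an illustration only (a sketch; the file paths are hypothetical and
must point to files pre-allocated in mounted hugetlbfs instances), the
option needs no application change beyond the EAL arguments:

    char *eal_args[] = {
            "app",
            "--mem-file",
            "/mnt/huge-1G/node0,/mnt/huge-1G/node1,/mnt/huge-2M/extra",
    };

    if (rte_eal_init(RTE_DIM(eal_args), eal_args) < 0)
            rte_panic("Cannot init EAL\n");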

Signed-off-by: Viacheslav Ovsiienko 
Signed-off-by: Dmitry Kozlyuk 
---
 doc/guides/linux_gsg/linux_eal_parameters.rst |  17 +
 lib/eal/common/eal_common_dynmem.c|   6 +
 lib/eal/common/eal_common_options.c   |  23 ++
 lib/eal/common/eal_internal_cfg.h |   4 +
 lib/eal/common/eal_memalloc.h |   8 +-
 lib/eal/common/eal_options.h  |   2 +
 lib/eal/common/malloc_elem.c  |   5 +
 lib/eal/common/malloc_heap.h  |   8 +
 lib/eal/common/rte_malloc.c   |  16 +-
 lib/eal/include/rte_memory.h  |   4 +-
 lib/eal/linux/eal.c   |  28 ++
 lib/eal/linux/eal_hugepage_info.c |   5 +
 lib/eal/linux/eal_memalloc.c  | 328 +-
 13 files changed, 441 insertions(+), 13 deletions(-)

diff --git a/doc/guides/linux_gsg/linux_eal_parameters.rst 
b/doc/guides/linux_gsg/linux_eal_parameters.rst
index bd3977cb3d..b465feaea8 100644
--- a/doc/guides/linux_gsg/linux_eal_parameters.rst
+++ b/doc/guides/linux_gsg/linux_eal_parameters.rst
@@ -92,6 +92,23 @@ Memory-related options
 
 Free hugepages back to system exactly as they were originally allocated.
 
+*   ``--mem-file ``
+
+Use memory from pre-allocated files in ``hugetlbfs`` without clearing it;
+when this memory is exhausted, switch to default dynamic allocation.
+This speeds up startup compared to ``--legacy-mem`` while also avoiding
+later delays for allocating new hugepages. One downside is slowdown
+of all zeroed memory allocations. Security warning: an application
+can access contents left by previous users of hugepages. Multiple files
+can be pre-allocated in ``hugetlbfs`` with different page sizes,
+on desired NUMA nodes, using ``mount`` options and ``numactl``:
+
+--mem-file /mnt/huge-1G/node0,/mnt/huge-1G/node1,/mnt/huge-2M/extra
+
+This option is incompatible with ``--legacy-mem``, ``--in-memory``,
+and ``--single-file-segments``. Primary and secondary processes
+must specify exactly the same list of files.
+
 Other options
 ~
 
diff --git a/lib/eal/common/eal_common_dynmem.c 
b/lib/eal/common/eal_common_dynmem.c
index 7c5437ddfa..abcf22f097 100644
--- a/lib/eal/common/eal_common_dynmem.c
+++ b/lib/eal/common/eal_common_dynmem.c
@@ -272,6 +272,12 @@ eal_dynmem_hugepage_init(void)
internal_conf->num_hugepage_sizes) < 0)
return -1;
 
+#ifdef RTE_EXEC_ENV_LINUX
+   /* pre-allocate pages from --mem-file option files */
+   if (eal_memalloc_memfile_alloc(used_hp) < 0)
+   return -1;
+#endif
+
for (hp_sz_idx = 0;
hp_sz_idx < (int)internal_conf->num_hugepage_sizes;
hp_sz_idx++) {
diff --git a/lib/eal/common/eal_common_opt

[dpdk-dev] [PATCH v4 3/3] app/test: add allocator performance autotest

2021-09-20 Thread dkozlyuk
From: Dmitry Kozlyuk 

Memory allocator performance is crucial to applications that deal
with large amounts of memory or allocate frequently. DPDK allocator
performance is affected by EAL options, the API used and, at least,
the allocation size. The new autotest is intended to be run with
different EAL options. It measures performance with a range of sizes
for different APIs: rte_malloc, rte_zmalloc, and rte_memzone_reserve.

Work distribution between allocation and deallocation depends on EAL
options. The test prints both times and the total time to ease comparison.

Memory can be filled with zeroes at different points of the allocation
path, but it always takes a considerable fraction of the overall timing.
This is why the test measures filling speed and prints how long clearing
would take for each size as a hint.

Signed-off-by: Dmitry Kozlyuk 
Reviewed-by: Viacheslav Ovsiienko 
---
 app/test/meson.build|   2 +
 app/test/test_malloc_perf.c | 157 
 2 files changed, 159 insertions(+)
 create mode 100644 app/test/test_malloc_perf.c

diff --git a/app/test/meson.build b/app/test/meson.build
index a7611686ad..a48dc79463 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -84,6 +84,7 @@ test_sources = files(
 'test_lpm6_perf.c',
 'test_lpm_perf.c',
 'test_malloc.c',
+'test_malloc_perf.c',
 'test_mbuf.c',
 'test_member.c',
 'test_member_perf.c',
@@ -281,6 +282,7 @@ fast_tests = [
 
 perf_test_names = [
 'ring_perf_autotest',
+'malloc_perf_autotest',
 'mempool_perf_autotest',
 'memcpy_perf_autotest',
 'hash_perf_autotest',
diff --git a/app/test/test_malloc_perf.c b/app/test/test_malloc_perf.c
new file mode 100644
index 00..4435894095
--- /dev/null
+++ b/app/test/test_malloc_perf.c
@@ -0,0 +1,157 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+typedef void * (alloc_t)(const char *name, size_t size, unsigned int align);
+typedef void (free_t)(void *addr);
+
+static const uint64_t KB = 1 << 10;
+static const uint64_t GB = 1 << 30;
+
+static double
+tsc_to_us(uint64_t tsc, size_t runs)
+{
+   return (double)tsc / rte_get_tsc_hz() * US_PER_S / runs;
+}
+
+static int
+test_memset_perf(double *us_per_gb)
+{
+   static const size_t RUNS = 20;
+
+   void *ptr;
+   size_t i;
+   uint64_t tsc;
+
+   puts("Performance: memset");
+
+   ptr = rte_malloc(NULL, GB, 0);
+   if (ptr == NULL) {
+   printf("rte_malloc(size=%"PRIx64") failed\n", GB);
+   return -1;
+   }
+
+   tsc = rte_rdtsc_precise();
+   for (i = 0; i < RUNS; i++)
+   memset(ptr, 0, GB);
+   tsc = rte_rdtsc_precise() - tsc;
+
+   *us_per_gb = tsc_to_us(tsc, RUNS);
+   printf("Result: %f.3 GiB/s <=> %.2f us/MiB\n",
+   US_PER_S / *us_per_gb, *us_per_gb / KB);
+
+   rte_free(ptr);
+   putchar('\n');
+   return 0;
+}
+
+static int
+test_alloc_perf(const char *name, alloc_t *alloc_fn, free_t free_fn,
+   size_t max_runs, double memset_gb_us)
+{
+   static const size_t SIZES[] = {
+   1 << 6, 1 << 7, 1 << 10, 1 << 12, 1 << 16, 1 << 20,
+   1 << 21, 1 << 22, 1 << 24, 1 << 30 };
+
+   size_t i, j;
+   void **ptrs;
+
+   printf("Performance: %s\n", name);
+
+   ptrs = calloc(max_runs, sizeof(ptrs[0]));
+   if (ptrs == NULL) {
+   puts("Cannot allocate memory for pointers");
+   return -1;
+   }
+
+   printf("%12s%8s%12s%12s%12s%12s\n",
+   "Size (B)", "Runs", "Alloc (us)", "Free (us)",
+   "Total (us)", "memset (us)");
+   for (i = 0; i < RTE_DIM(SIZES); i++) {
+   size_t size = SIZES[i];
+   size_t runs_done;
+   uint64_t tsc_start, tsc_alloc, tsc_free;
+   double alloc_time, free_time, memset_time;
+
+   tsc_start = rte_rdtsc_precise();
+   for (j = 0; j < max_runs; j++) {
+   ptrs[j] = alloc_fn(NULL, size, 0);
+   if (ptrs[j] == NULL)
+   break;
+   }
+   tsc_alloc = rte_rdtsc_precise() - tsc_start;
+
+   if (j == 0) {
+   printf("%12zu Interrupted: out of memory.\n", size);
+   break;
+   }
+   runs_done = j;
+
+   tsc_start = rte_rdtsc_precise();
+   for (j = 0; j < runs_done && ptrs[j] != NULL; j++)
+   free_fn(ptrs[j]);
+   tsc_free = rte_rdtsc_precise() - tsc_start;
+
+   alloc_time = tsc_to_us(tsc_alloc, runs_done);
+   free_time = tsc_to_us(tsc_free, runs_done);
+   memset_time = memset_gb_us * size / GB;
+   printf("%12zu%8zu%12.2f%12.2f%12.2f%12.2f\n",
+   

Re: [dpdk-dev] [PATCH] test/compress: fix buffer overflow bug

2021-09-20 Thread Zhang, Roy Fan
> -Original Message-
> From: Troy, Rebecca 
> Sent: Friday, September 17, 2021 4:12 PM
> To: dev@dpdk.org
> Cc: Zhang, Roy Fan ; Troy, Rebecca
> ; Trahe, Fiona ; Trybula,
> ArturX ; sta...@dpdk.org; Ashish Gupta
> 
> Subject: [PATCH] test/compress: fix buffer overflow bug
> 
> Fixes a stack buffer overflow bug in the compressdev autotest, which
> was caused by the use of buf_idx in the debug logs. Originally, buf_idx
> was treated as an array instead of a reference to an integer.
> This was fixed by replacing the use of buf_idx[priv_data->orig_idx] with
> the variable i.
> 
> Fixes: 466a2c4bb5f4 ("test/compress: improve debug logs")
> Fixes: 6bbc5a923625 ("test/compress: refactor unit tests")
> 
> Cc: fiona.tr...@intel.com
> Cc: arturx.tryb...@intel.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Rebecca Troy 
> ---

Acked-by: Fan Zhang 


Re: [dpdk-dev] [PATCH] net/af_xdp: fix support of secondary process

2021-09-20 Thread Loftus, Ciara



> -Original Message-
> From: dev  On Behalf Of Stephen Hemminger
> Sent: Friday 3 September 2021 17:15
> To: dev@dpdk.org
> Cc: Stephen Hemminger ;
> sta...@dpdk.org; xiaolong...@intel.com
> Subject: [dpdk-dev] [PATCH] net/af_xdp: fix support of secondary process
> 
> Doing basic operations like info_get or get_stats was broken
> in af_xdp PMD. The info_get would crash because dev->device
> was NULL in secondary process. Fix this by doing same initialization
> as af_packet and tap devices.
> 
> The get_stats would crash because the XDP socket is not open in
> primary process. As a workaround don't query kernel for dropped
> packets when called from secondary process.
> 
> Note: this does not address the other bug which is that transmitting
> in secondary process is broken because the send() in tx_kick
> will fail because XDP socket fd is not valid in secondary process.

Hi Stephen,

Apologies for the delayed reply, I was on vacation.

In the Bugzilla report you suggest we:
"mark AF_XDP as broken in with primary/secondary
and return an error in probe in secondary process".
I agree with this suggestion. However with this patch we still permit 
secondary, and just make sure it doesn't crash for get_stats. Did you change 
your mind?
Personally, I would prefer to have primary/secondary either working 100% or 
else not allowed at all by throwing an error during probe. What do you think? 
Do you have a reason/use case to permit secondary processes despite some 
features not being available eg. full stats, tx?

Thanks,
Ciara

> 
> Bugzilla ID: 805
> Fixes: f1debd77efaf ("net/af_xdp: introduce AF_XDP PMD")
> Cc: sta...@dpdk.org
> Cc: xiaolong...@intel.com
> Ciara Loftus 
> Qi Zhang 
> Anatoly Burakov 
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 17 +
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 74ffa4511284..70abc14fa753 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -860,7 +860,7 @@ eth_stats_get(struct rte_eth_dev *dev, struct
> rte_eth_stats *stats)
>   struct pkt_rx_queue *rxq;
>   struct pkt_tx_queue *txq;
>   socklen_t optlen;
> - int i, ret;
> + int i;
> 
>   for (i = 0; i < dev->data->nb_rx_queues; i++) {
>   optlen = sizeof(struct xdp_statistics);
> @@ -876,13 +876,12 @@ eth_stats_get(struct rte_eth_dev *dev, struct
> rte_eth_stats *stats)
>   stats->ibytes += stats->q_ibytes[i];
>   stats->imissed += rxq->stats.rx_dropped;
>   stats->oerrors += txq->stats.tx_dropped;
> - ret = getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
> - XDP_STATISTICS, &xdp_stats, &optlen);
> - if (ret != 0) {
> - AF_XDP_LOG(ERR, "getsockopt() failed for
> XDP_STATISTICS.\n");
> - return -1;
> - }
> - stats->imissed += xdp_stats.rx_dropped;
> +
> + /* The socket fd is not valid in secondary process */
> + if (rte_eal_process_type() != RTE_PROC_SECONDARY &&
> + getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
> +XDP_STATISTICS, &xdp_stats, &optlen) == 0)
> + stats->imissed += xdp_stats.rx_dropped;
> 
>   stats->opackets += stats->q_opackets[i];
>   stats->obytes += stats->q_obytes[i];
> @@ -1799,7 +1798,9 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device
> *dev)
>   AF_XDP_LOG(ERR, "Failed to probe %s\n", name);
>   return -EINVAL;
>   }
> + /* TODO: reconnect socket from primary */
>   eth_dev->dev_ops = &ops;
> + eth_dev->device = &dev->device;
>   rte_eth_dev_probing_finish(eth_dev);
>   return 0;
>   }
> --
> 2.30.2



Re: [dpdk-dev] [PATCH v4 02/11] dma/ioat: create dmadev instances on PCI probe

2021-09-20 Thread Bruce Richardson
On Fri, Sep 17, 2021 at 03:42:18PM +, Conor Walsh wrote:
> When a suitable device is found during the PCI probe, create a dmadev
> instance for each channel. Internal structures and HW definitions required
> for device creation are also included.
> 
> Signed-off-by: Conor Walsh 
> Reviewed-by: Kevin Laatz 
> ---
>  drivers/dma/ioat/ioat_dmadev.c   | 119 ++-
>  drivers/dma/ioat/ioat_hw_defs.h  |  45 
>  drivers/dma/ioat/ioat_internal.h |  24 +++
>  3 files changed, 186 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/dma/ioat/ioat_dmadev.c b/drivers/dma/ioat/ioat_dmadev.c
> index f3491d45b1..b815d30bcf 100644
> --- a/drivers/dma/ioat/ioat_dmadev.c
> +++ b/drivers/dma/ioat/ioat_dmadev.c
> @@ -4,6 +4,7 @@
>  



> +/* Destroy a DMA device. */
> +static int
> +ioat_dmadev_destroy(const char *name)
> +{
> + struct rte_dma_dev *dev;
> + struct ioat_dmadev *ioat;
> + int ret;
> +
> + if (!name) {
> + IOAT_PMD_ERR("Invalid device name");
> + return -EINVAL;
> + }
> +
> + dev = &rte_dma_devices[rte_dma_get_dev_id(name)];
> + if (!dev) {
> + IOAT_PMD_ERR("Invalid device name (%s)", name);
> + return -EINVAL;
> + }
> +

I think you need to independently check the return value from
rte_dma_get_dev_id, rather than assuming that, when it returns an error
value, the resulting index location will hold a null pointer.
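
Something along these lines (a sketch only, assuming rte_dma_get_dev_id()
returns a negative value when the name is not found):

    int dev_id = rte_dma_get_dev_id(name);

    if (dev_id < 0) {
            IOAT_PMD_ERR("Invalid device name (%s)", name);
            return -EINVAL;
    }
    dev = &rte_dma_devices[dev_id];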

> + ioat = dev->dev_private;
> + if (!ioat) {
> + IOAT_PMD_ERR("Error getting dev_private");
> + return -EINVAL;
> + }
> +
> + dev->dev_private = NULL;
> + rte_free(ioat->desc_ring);
> +
> + ret = rte_dma_pmd_release(name);

The rte_dma_pmd_allocate function reserves memory for the private data, so
the release function should free that memory too. However, you have
assigned dev_private to NULL just above, so that probably won't work.
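
For illustration, one possible ordering (a sketch, not a definitive fix):
free only the memory the driver itself allocated and leave dev_private to
the release call, which is expected to free what it reserved at probe:

    rte_free(ioat->desc_ring);
    ioat->desc_ring = NULL;

    ret = rte_dma_pmd_release(name); /* expected to free dev_private */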

> + if (ret)
> + IOAT_PMD_DEBUG("Device cleanup failed");
> +
> + return 0;
> +}
> +




Re: [dpdk-dev] [PATCH v4 06/11] dma/ioat: add data path job submission functions

2021-09-20 Thread Bruce Richardson
On Fri, Sep 17, 2021 at 03:42:22PM +, Conor Walsh wrote:
> Add data path functions for enqueuing and submitting operations to
> IOAT devices.
> 
> Signed-off-by: Conor Walsh 
> Reviewed-by: Kevin Laatz 
> ---
>  doc/guides/dmadevs/ioat.rst| 54 
>  drivers/dma/ioat/ioat_dmadev.c | 92 ++
>  2 files changed, 146 insertions(+)
> 
> diff --git a/doc/guides/dmadevs/ioat.rst b/doc/guides/dmadevs/ioat.rst
> index a64d67bf89..2464207e20 100644
> --- a/doc/guides/dmadevs/ioat.rst
> +++ b/doc/guides/dmadevs/ioat.rst
> @@ -89,3 +89,57 @@ The following code shows how the device is configured in 
> ``test_dmadev.c``:
>  
>  Once configured, the device can then be made ready for use by calling the
>  ``rte_dma_start()`` API.
> +
> +Performing Data Copies
> +~~~
> +
> +To perform data copies using IOAT dmadev devices, the functions
> +``rte_dma_copy()`` and ``rte_dma_submit()`` should be used. Alternatively
> +``rte_dma_copy()`` can be called with the ``RTE_DMA_OP_FLAG_SUBMIT`` flag
> +set.
> +
> +The ``rte_dma_copy()`` function enqueues a single copy to the
> +device ring for copying at a later point. The parameters to the function
> +include the device ID of the desired device, the virtual DMA channel required
> +(always 0 for IOAT), the IOVA addresses of both the source and destination
> +buffers, the length of the data to be copied and any operation flags. The
> +function will return the index of the enqueued job which can be use to
> +track that operation.
> +
> +While the ``rte_dma_copy()`` function enqueues a copy operation on the device
> +ring, the copy will not actually be performed until after the application 
> calls
> +the ``rte_dma_submit()`` function. This function informs the device hardware
> +of the elements enqueued on the ring, and the device will begin to process 
> them.
> +It is expected that, for efficiency reasons, a burst of operations will be
> +enqueued to the device via multiple enqueue calls between calls to the
> +``rte_dma_submit()`` function. If desired you can pass the
> +``RTE_DMA_OP_FLAG_SUBMIT`` flag when calling ``rte_dma_copy()`` and this will
> +tell the device to perform the enqueued operation and any unperformed 
> operations
> +before it. The ``RTE_DMA_OP_FLAG_SUBMIT`` flag can be passed instead of 
> calling
> +the ``rte_dma_submit()`` function for example on the last enqueue of the 
> burst.
> +
> +The following code from demonstrates how to enqueue a burst of copies to the
> +device and start the hardware processing of them:
> +
> +.. code-block:: C
> +
> +   for (i = 0; i < BURST_SIZE; i++) {
> +  if (rte_dma_copy(dev_id, vchan, rte_mbuf_data_iova(srcs[i]),
> +rte_mbuf_data_iova(dsts[i]), COPY_LEN, 0) < 0) {
> + PRINT_ERR("Error with rte_dma_copy for buffer %u\n", i);
> + return -1;
> +  }
> +   }
> +   if (rte_dma_submit(dev_id, vchan) < 0) {
> +  PRINT_ERR("Error with performing operations\n", i);
> +  return -1;
> +   }
> +
> +Filling an Area of Memory
> +~~
> +
> +The driver also has support for the ``fill`` operation, where an area
> +of memory is overwritten, or filled, with a short pattern of data.
> +Fill operations can be performed in much the same was as copy operations
> +described above, just using the ``rte_dma_fill()`` function rather
> +than the ``rte_dma_copy()`` function.

Similar to the feedback on the idxd driver, I think we need to see how much
of this text is already present in the generic dmadev documentation and
re-use or reference that. If it's not present, then these patches should
add it to the common doc, not a separate driver-specific doc.

/Bruce


Re: [dpdk-dev] [PATCH v4 07/11] dma/ioat: add data path completion functions

2021-09-20 Thread Bruce Richardson
On Fri, Sep 17, 2021 at 03:42:23PM +, Conor Walsh wrote:
> Add the data path functions for gathering completed operations
> from IOAT devices.
> 
> Signed-off-by: Conor Walsh 
> Signed-off-by: Kevin Laatz 
> ---

For the code part:

Acked-by: Bruce Richardson 

However, the docs need to be made common with other drivers.


Re: [dpdk-dev] [PATCH v2 2/4] net/iavf: add iAVF IPsec inline crypto support

2021-09-20 Thread Nicolau, Radu

Hi Jingjing, thanks for reviewing!


On 9/18/2021 6:28 AM, Wu, Jingjing wrote:

In general, the patch is too big to review. Patch split would help a lot!

I will do my best to split it in the next revision.


[...]

+static const struct rte_cryptodev_symmetric_capability *
+get_capability(struct iavf_security_ctx *iavf_sctx,
+   uint32_t algo, uint32_t type)
+{
+   const struct rte_cryptodev_capabilities *capability;
+   int i = 0;
+
+   capability = &iavf_sctx->crypto_capabilities[i];
+
+   while (capability->op != RTE_CRYPTO_OP_TYPE_UNDEFINED) {
+   if (capability->op == RTE_CRYPTO_OP_TYPE_SYMMETRIC &&
+   capability->sym.xform_type == type &&
+   capability->sym.cipher.algo == algo)
+   return &capability->sym;
+   /** try next capability */
+   capability = &iavf_crypto_capabilities[i++];

Better to check i to avoid going out of bounds.

The condition in the while statement plus the last element in the array
being set to RTE_CRYPTO_OP_TYPE_UNDEFINED prevents the loop from going out
of bounds.
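
For reference, the pattern under discussion looks roughly like this
(entries are illustrative only):

    static const struct rte_cryptodev_capabilities iavf_crypto_capabilities[] = {
            { .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC, /* ... */ },
            /* ... more symmetric capabilities ... */
            { .op = RTE_CRYPTO_OP_TYPE_UNDEFINED } /* sentinel ends the loop */
    };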

[...]


+
+static int
+valid_length(uint32_t len, uint32_t min, uint32_t max, uint32_t increment)
+{
+   if (len < min || len > max)
+   return 0;
+
+   if (increment == 0)
+   return 1;
+
+   if ((len - min) % increment)
+   return 0;
+
+   return 1;
+}

Would it be better to use true/false instead of 1/0? And the same for the
following valid functions.

Will do.

[...]


+static int
+iavf_ipsec_crypto_session_validate_conf(struct iavf_security_ctx *iavf_sctx,
+   struct rte_security_session_conf *conf)
+{
+   /** validate security action/protocol selection */
+   if (conf->action_type != RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO ||
+   conf->protocol != RTE_SECURITY_PROTOCOL_IPSEC) {
+   PMD_DRV_LOG(ERR, "Unsupported action / protocol specified");
+   return -EINVAL;
+   }
+
+   /** validate IPsec protocol selection */
+   if (conf->ipsec.proto != RTE_SECURITY_IPSEC_SA_PROTO_ESP) {
+   PMD_DRV_LOG(ERR, "Unsupported IPsec protocol specified");
+   return -EINVAL;
+   }
+
+   /** validate selected options */
+   if (conf->ipsec.options.copy_dscp ||
+   conf->ipsec.options.copy_flabel ||
+   conf->ipsec.options.copy_df ||
+   conf->ipsec.options.dec_ttl ||
+   conf->ipsec.options.ecn ||
+   conf->ipsec.options.stats) {
+   PMD_DRV_LOG(ERR, "Unsupported IPsec option specified");
+   return -EINVAL;
+   }
+
+   /**
+* Validate crypto xforms parameters.
+*
+* AEAD transforms can be used for either inbound/outbound IPsec SAs,
+* for non-AEAD crypto transforms we explicitly only support CIPHER/AUTH
+* for outbound and AUTH/CIPHER chained transforms for inbound IPsec.
+*/
+   if (conf->crypto_xform->type == RTE_CRYPTO_SYM_XFORM_AEAD) {
+   if (!valid_aead_xform(iavf_sctx, &conf->crypto_xform->aead)) {
+   PMD_DRV_LOG(ERR, "Unsupported IPsec option specified");
+   return -EINVAL;
+   }

Invalid parameter, but not an unsupported option, right? Same for below.

I reworked the messages to be consistent.

[...]


+static void
+sa_add_set_aead_params(struct virtchnl_ipsec_crypto_cfg_item *cfg,
+   struct rte_crypto_aead_xform *aead, uint32_t salt)
+{
+   cfg->crypto_type = VIRTCHNL_AEAD;
+
+   switch (aead->algo) {
+   case RTE_CRYPTO_AEAD_AES_CCM:
+   cfg->algo_type = VIRTCHNL_AES_CCM; break;
+   case RTE_CRYPTO_AEAD_AES_GCM:
+   cfg->algo_type = VIRTCHNL_AES_GCM; break;
+   case RTE_CRYPTO_AEAD_CHACHA20_POLY1305:
+   cfg->algo_type = VIRTCHNL_CHACHA20_POLY1305; break;
+   default:
+   RTE_ASSERT("we should be here");

Assert just because of an invalid config? Similar comments apply to the other valid functions.

Removed



+   }
+
+   cfg->key_len = aead->key.length;
+   cfg->iv_len = aead->iv.length;
+   cfg->digest_len = aead->digest_length;
+   cfg->salt = salt;
+
+   RTE_ASSERT(sizeof(cfg->key_data) < cfg->key_len);
+

Not only the data, but also the length; better to validate before setting?
The same applies to setting the other kinds of parameters.

The length here is checked to fit into the array, so it can still be valid;
I moved this check into the valid_length function, which can actually return
an error.

[...]



+static inline void
+iavf_build_data_desc_cmd_offset_fields(volatile uint64_t *qw1,
+   struct rte_mbuf *m)
+{
+   uint64_t command = 0;
+   uint64_t offset = 0;
+   uint64_t l2tag1 = 0;
+
+   *qw1 = IAVF_TX_DESC_DTYPE_DATA;
+
+   command = (uint64_t)IAVF_TX_DESC_CMD_ICRC;
+
+   /* Descriptor based VLAN insertion */
+   if (m->ol_flags & PKT_TX_VLAN_PKT) {
+   comm

Re: [dpdk-dev] [PATCH v4 08/11] dma/ioat: add statistics

2021-09-20 Thread Bruce Richardson
On Fri, Sep 17, 2021 at 03:42:24PM +, Conor Walsh wrote:
> Add statistic tracking for operations in IOAT.
> 
> Signed-off-by: Conor Walsh 
> Reviewed-by: Kevin Laatz 
> ---
>  doc/guides/dmadevs/ioat.rst| 23 ++
>  drivers/dma/ioat/ioat_dmadev.c | 43 ++
>  2 files changed, 66 insertions(+)
> 
Acked-by: Bruce Richardson 


Re: [dpdk-dev] [PATCH v4 09/11] dma/ioat: add support for vchan status function

2021-09-20 Thread Bruce Richardson
On Fri, Sep 17, 2021 at 03:42:25PM +, Conor Walsh wrote:
> Add support for the rte_dmadev_vchan_status API call.
> 
> Signed-off-by: Conor Walsh 
> Reviewed-by: Kevin Laatz 
> ---
Acked-by: Bruce Richardson 


Re: [dpdk-dev] [PATCH v4 10/11] dma/ioat: add burst capacity function

2021-09-20 Thread Bruce Richardson
On Fri, Sep 17, 2021 at 03:42:26PM +, Conor Walsh wrote:
> Adds the ability to find the remaining space in the IOAT ring.
> 
> Signed-off-by: Conor Walsh 
> Signed-off-by: Kevin Laatz 
> ---
Acked-by: Bruce Richardson 


Re: [dpdk-dev] [PATCH v4 11/11] devbind: move ioat device ID for ICX to dmadev category

2021-09-20 Thread Bruce Richardson
On Fri, Sep 17, 2021 at 03:42:27PM +, Conor Walsh wrote:
> Move Intel IOAT devices on Ice Lake systems from Misc to DMA devices.
> 
> Signed-off-by: Conor Walsh 
> Reviewed-by: Kevin Laatz 
> ---
>  usertools/dpdk-devbind.py | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
> index 98b698ccc0..afebc8cb62 100755
> --- a/usertools/dpdk-devbind.py
> +++ b/usertools/dpdk-devbind.py
> @@ -69,14 +69,13 @@
>  network_devices = [network_class, cavium_pkx, avp_vnic, ifpga_class]
>  baseband_devices = [acceleration_class]
>  crypto_devices = [encryption_class, intel_processor_class]
> -dma_devices = [intel_idxd_spr]
> +dma_devices = [intel_idxd_spr, intel_ioat_icx]
>  eventdev_devices = [cavium_sso, cavium_tim, intel_dlb, octeontx2_sso]
>  mempool_devices = [cavium_fpa, octeontx2_npa]
>  compress_devices = [cavium_zip]
>  regex_devices = [octeontx2_ree]
>  misc_devices = [cnxk_bphy, cnxk_bphy_cgx, intel_ioat_bdw, intel_ioat_skx,
> -intel_ioat_icx, intel_ntb_skx, intel_ntb_icx,
> -octeontx2_dma]
> +intel_ntb_skx, intel_ntb_icx, octeontx2_dma]
>
I think the ioat_bdw and ioat_skx elements should also go down as DMA
devices.

With that change:

Reviewed-by: Bruce Richardson 



[dpdk-dev] [PATCH v3 0/6] iavf: add iAVF IPsec inline crypto support

2021-09-20 Thread Radu Nicolau
Add support for inline crypto for IPsec, for ESP transport and
tunnel over IPv4 and IPv6, as well as supporting the offload for
ESP over UDP, and in conjunction with TSO for UDP and TCP flows.

Radu Nicolau (6):
  common/iavf: add iAVF IPsec inline crypto support
  net/iavf: rework tx path
  net/iavf: add support for asynchronous virt channel messages
  net/iavf: add iAVF IPsec inline crypto support
  net/iavf: add xstats support for inline IPsec crypto
  net/iavf: add watchdog for VFLR

 drivers/common/iavf/iavf_type.h   |  215 +-
 drivers/common/iavf/virtchnl.h|   17 +-
 drivers/common/iavf/virtchnl_inline_ipsec.h   |  553 +
 drivers/net/iavf/iavf.h   |   53 +-
 drivers/net/iavf/iavf_ethdev.c|  222 +-
 drivers/net/iavf/iavf_generic_flow.c  |   16 +
 drivers/net/iavf/iavf_generic_flow.h  |2 +
 drivers/net/iavf/iavf_ipsec_crypto.c  | 1918 +
 drivers/net/iavf/iavf_ipsec_crypto.h  |   96 +
 .../net/iavf/iavf_ipsec_crypto_capabilities.h |  383 
 drivers/net/iavf/iavf_rxtx.c  |  709 --
 drivers/net/iavf/iavf_rxtx.h  |   91 +-
 drivers/net/iavf/iavf_rxtx_vec_sse.c  |   10 +-
 drivers/net/iavf/iavf_vchnl.c |  166 +-
 drivers/net/iavf/meson.build  |3 +-
 drivers/net/iavf/rte_pmd_iavf.h   |1 +
 drivers/net/iavf/version.map  |3 +
 17 files changed, 4137 insertions(+), 321 deletions(-)
 create mode 100644 drivers/common/iavf/virtchnl_inline_ipsec.h
 create mode 100644 drivers/net/iavf/iavf_ipsec_crypto.c
 create mode 100644 drivers/net/iavf/iavf_ipsec_crypto.h
 create mode 100644 drivers/net/iavf/iavf_ipsec_crypto_capabilities.h

-- 
v2: small updates and fixes in the flow related section
v3: split the huge patch and address feedback

2.25.1



[dpdk-dev] [PATCH v3 1/6] common/iavf: add iAVF IPsec inline crypto support

2021-09-20 Thread Radu Nicolau
Add support for inline crypto for IPsec.

Signed-off-by: Declan Doherty 
Signed-off-by: Abhijit Sinha 
Signed-off-by: Radu Nicolau 
---
 drivers/common/iavf/iavf_type.h | 215 +++-
 drivers/common/iavf/virtchnl.h  |  17 +-
 drivers/common/iavf/virtchnl_inline_ipsec.h | 553 
 3 files changed, 775 insertions(+), 10 deletions(-)
 create mode 100644 drivers/common/iavf/virtchnl_inline_ipsec.h

diff --git a/drivers/common/iavf/iavf_type.h b/drivers/common/iavf/iavf_type.h
index 73dfb47e70..1f8f8ae5fd 100644
--- a/drivers/common/iavf/iavf_type.h
+++ b/drivers/common/iavf/iavf_type.h
@@ -709,11 +709,29 @@ enum iavf_rx_prog_status_desc_error_bits {
 #define IAVF_FOUR_BIT_MASK 0xF
 #define IAVF_EIGHTEEN_BIT_MASK 0x3
 
-/* TX Descriptor */
+/* TX Data Descriptor */
 struct iavf_tx_desc {
-   __le64 buffer_addr; /* Address of descriptor's data buf */
-   __le64 cmd_type_offset_bsz;
-};
+   union {
+   struct {
+   __le64 buffer_addr; /* Addr of descriptor's data buf */
+   __le64 cmd_type_offset_bsz;
+   };
+   struct {
+   __le64 qw0; /**< data buffer address */
+   __le64 qw1; /**< dtyp, cmd, offset, buf_sz and l2tag1 */
+   };
+   struct {
+   __le64 buffer_addr; /**< Data buffer address */
+   __le64 type:4;  /**< Descriptor type */
+   __le64 cmd:12;  /**< Command field */
+   __le64 offset_l2len:7;  /**< L2 header length */
+   __le64 offset_l3len:7;  /**< L3 header length */
+   __le64 offset_l4len:4;  /**< L4 header length */
+   __le64 buffer_sz:14;/**< Data buffer size */
+   __le64 l2tag1:16;   /**< L2 Tag 1 value */
+   } debug __rte_packed;
+   };
+} __rte_packed;
 
 #define IAVF_TXD_QW1_DTYPE_SHIFT   0
 #define IAVF_TXD_QW1_DTYPE_MASK(0xFUL << 
IAVF_TXD_QW1_DTYPE_SHIFT)
@@ -723,6 +741,7 @@ enum iavf_tx_desc_dtype_value {
IAVF_TX_DESC_DTYPE_NOP  = 0x1, /* same as Context desc */
IAVF_TX_DESC_DTYPE_CONTEXT  = 0x1,
IAVF_TX_DESC_DTYPE_FCOE_CTX = 0x2,
+   IAVF_TX_DESC_DTYPE_IPSEC= 0x3,
IAVF_TX_DESC_DTYPE_FILTER_PROG  = 0x8,
IAVF_TX_DESC_DTYPE_DDP_CTX  = 0x9,
IAVF_TX_DESC_DTYPE_FLEX_DATA= 0xB,
@@ -734,7 +753,7 @@ enum iavf_tx_desc_dtype_value {
 #define IAVF_TXD_QW1_CMD_SHIFT 4
 #define IAVF_TXD_QW1_CMD_MASK  (0x3FFUL << IAVF_TXD_QW1_CMD_SHIFT)
 
-enum iavf_tx_desc_cmd_bits {
+enum iavf_tx_data_desc_cmd_bits {
IAVF_TX_DESC_CMD_EOP= 0x0001,
IAVF_TX_DESC_CMD_RS = 0x0002,
IAVF_TX_DESC_CMD_ICRC   = 0x0004,
@@ -778,18 +797,79 @@ enum iavf_tx_desc_length_fields {
 #define IAVF_TXD_QW1_L2TAG1_SHIFT  48
 #define IAVF_TXD_QW1_L2TAG1_MASK   (0xULL << IAVF_TXD_QW1_L2TAG1_SHIFT)
 
+#define IAVF_TXD_DATA_QW1_DTYPE_SHIFT  (0)
+#define IAVF_TXD_DATA_QW1_DTYPE_MASK   (0xFUL << IAVF_TXD_QW1_DTYPE_SHIFT)
+
+#define IAVF_TXD_DATA_QW1_CMD_SHIFT(4)
+#define IAVF_TXD_DATA_QW1_CMD_MASK (0x3FFUL << IAVF_TXD_DATA_QW1_CMD_SHIFT)
+
+#define IAVF_TXD_DATA_QW1_OFFSET_SHIFT (16)
+#define IAVF_TXD_DATA_QW1_OFFSET_MASK  (0x3ULL << \
+   IAVF_TXD_DATA_QW1_OFFSET_SHIFT)
+
+#define IAVF_TXD_DATA_QW1_OFFSET_MACLEN_SHIFT  (IAVF_TXD_DATA_QW1_OFFSET_SHIFT)
+#define IAVF_TXD_DATA_QW1_OFFSET_MACLEN_MASK   \
+   (0x7FUL << IAVF_TXD_DATA_QW1_OFFSET_MACLEN_SHIFT)
+
+#define IAVF_TXD_DATA_QW1_OFFSET_IPLEN_SHIFT   \
+   (IAVF_TXD_DATA_QW1_OFFSET_SHIFT + IAVF_TX_DESC_LENGTH_IPLEN_SHIFT)
+#define IAVF_TXD_DATA_QW1_OFFSET_IPLEN_MASK\
+   (0x7FUL << IAVF_TXD_DATA_QW1_OFFSET_IPLEN_SHIFT)
+
+#define IAVF_TXD_DATA_QW1_OFFSET_L4LEN_SHIFT   \
+   (IAVF_TXD_DATA_QW1_OFFSET_SHIFT + IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT)
+#define IAVF_TXD_DATA_QW1_OFFSET_L4LEN_MASK\
+   (0xFUL << IAVF_TXD_DATA_QW1_OFFSET_L4LEN_SHIFT)
+
+#define IAVF_TXD_DATA_QW1_MACLEN_MASK  \
+   (0x7FUL << IAVF_TX_DESC_LENGTH_MACLEN_SHIFT)
+#define IAVF_TXD_DATA_QW1_IPLEN_MASK   \
+   (0x7FUL << IAVF_TX_DESC_LENGTH_IPLEN_SHIFT)
+#define IAVF_TXD_DATA_QW1_L4LEN_MASK   \
+   (0xFUL << IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT)
+#define IAVF_TXD_DATA_QW1_FCLEN_MASK   \
+   (0xFUL << IAVF_TX_DESC_LENGTH_L4_FC_LEN_SHIFT)
+
+#define IAVF_TXD_DATA_QW1_TX_BUF_SZ_SHIFT  (34)
+#define IAVF_TXD_DATA_QW1_TX_BUF_SZ_MASK   \
+   (0x3FFFULL << IAVF_TXD_DATA_QW1_TX_BUF_SZ_SHIFT)
+
+#define IAVF_TXD_DATA_QW1_L2TAG1_SHIFT (48)
+#define IAVF_TXD_DATA_QW1_L2TAG1_MASK  \
+   (0xULL << IAVF_TXD_DATA_QW1_L2TAG1_SHIFT)
+
 /* Context descriptors */
 struct iavf_tx_context_desc {
+   union {
+ 

[dpdk-dev] [PATCH v3 2/6] net/iavf: rework tx path

2021-09-20 Thread Radu Nicolau
Rework the TX path and TX descriptor usage in order to
allow for better use of offload flags and to facilitate enabling of
the inline crypto offload feature.

Signed-off-by: Declan Doherty 
Signed-off-by: Abhijit Sinha 
Signed-off-by: Radu Nicolau 
---
 drivers/net/iavf/iavf_rxtx.c | 536 +++
 drivers/net/iavf/iavf_rxtx.h |   9 +-
 drivers/net/iavf/iavf_rxtx_vec_sse.c |  10 +-
 3 files changed, 319 insertions(+), 236 deletions(-)

diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index 6de8ad3fe3..a84a0b07f6 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -1048,27 +1048,31 @@ iavf_rxd_to_vlan_tci(struct rte_mbuf *mb, volatile 
union iavf_rx_desc *rxdp)
 
 static inline void
 iavf_flex_rxd_to_vlan_tci(struct rte_mbuf *mb,
- volatile union iavf_rx_flex_desc *rxdp,
- uint8_t rx_flags)
+ volatile union iavf_rx_flex_desc *rxdp)
 {
-   uint16_t vlan_tci = 0;
-
-   if (rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG1 &&
-   rte_le_to_cpu_64(rxdp->wb.status_error0) &
-   (1 << IAVF_RX_FLEX_DESC_STATUS0_L2TAG1P_S))
-   vlan_tci = rte_le_to_cpu_16(rxdp->wb.l2tag1);
+   if (rte_le_to_cpu_64(rxdp->wb.status_error0) &
+   (1 << IAVF_RX_FLEX_DESC_STATUS0_L2TAG1P_S)) {
+   mb->ol_flags |= PKT_RX_VLAN | PKT_RX_VLAN_STRIPPED;
+   mb->vlan_tci =
+   rte_le_to_cpu_16(rxdp->wb.l2tag1);
+   } else {
+   mb->vlan_tci = 0;
+   }
 
 #ifndef RTE_LIBRTE_IAVF_16BYTE_RX_DESC
-   if (rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2 &&
-   rte_le_to_cpu_16(rxdp->wb.status_error1) &
-   (1 << IAVF_RX_FLEX_DESC_STATUS1_L2TAG2P_S))
-   vlan_tci = rte_le_to_cpu_16(rxdp->wb.l2tag2_2nd);
-#endif
-
-   if (vlan_tci) {
-   mb->ol_flags |= PKT_RX_VLAN | PKT_RX_VLAN_STRIPPED;
-   mb->vlan_tci = vlan_tci;
+   if (rte_le_to_cpu_16(rxdp->wb.status_error1) &
+   (1 << IAVF_RX_FLEX_DESC_STATUS1_L2TAG2P_S)) {
+   mb->ol_flags |= PKT_RX_QINQ_STRIPPED | PKT_RX_QINQ |
+   PKT_RX_VLAN_STRIPPED | PKT_RX_VLAN;
+   mb->vlan_tci_outer = mb->vlan_tci;
+   mb->vlan_tci = rte_le_to_cpu_16(rxdp->wb.l2tag2_2nd);
+   PMD_RX_LOG(DEBUG, "Descriptor l2tag2_1: %u, l2tag2_2: %u",
+  rte_le_to_cpu_16(rxdp->wb.l2tag2_1st),
+  rte_le_to_cpu_16(rxdp->wb.l2tag2_2nd));
+   } else {
+   mb->vlan_tci_outer = 0;
}
+#endif
 }
 
 /* Translate the rx descriptor status and error fields to pkt flags */
@@ -1388,7 +1392,7 @@ iavf_recv_pkts_flex_rxd(void *rx_queue,
rxm->ol_flags = 0;
rxm->packet_type = ptype_tbl[IAVF_RX_FLEX_DESC_PTYPE_M &
rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
-   iavf_flex_rxd_to_vlan_tci(rxm, &rxd, rxq->rx_flags);
+   iavf_flex_rxd_to_vlan_tci(rxm, &rxd);
rxq->rxd_to_pkt_fields(rxq, rxm, &rxd);
pkt_flags = iavf_flex_rxd_error_to_pkt_flags(rx_stat_err0);
rxm->ol_flags |= pkt_flags;
@@ -1530,7 +1534,7 @@ iavf_recv_scattered_pkts_flex_rxd(void *rx_queue, struct 
rte_mbuf **rx_pkts,
first_seg->ol_flags = 0;
first_seg->packet_type = ptype_tbl[IAVF_RX_FLEX_DESC_PTYPE_M &
rte_le_to_cpu_16(rxd.wb.ptype_flex_flags0)];
-   iavf_flex_rxd_to_vlan_tci(first_seg, &rxd, rxq->rx_flags);
+   iavf_flex_rxd_to_vlan_tci(first_seg, &rxd);
rxq->rxd_to_pkt_fields(rxq, first_seg, &rxd);
pkt_flags = iavf_flex_rxd_error_to_pkt_flags(rx_stat_err0);
 
@@ -1768,7 +1772,7 @@ iavf_rx_scan_hw_ring_flex_rxd(struct iavf_rx_queue *rxq)
 
mb->packet_type = ptype_tbl[IAVF_RX_FLEX_DESC_PTYPE_M &
rte_le_to_cpu_16(rxdp[j].wb.ptype_flex_flags0)];
-   iavf_flex_rxd_to_vlan_tci(mb, &rxdp[j], rxq->rx_flags);
+   iavf_flex_rxd_to_vlan_tci(mb, &rxdp[j]);
rxq->rxd_to_pkt_fields(rxq, mb, &rxdp[j]);
stat_err0 = rte_le_to_cpu_16(rxdp[j].wb.status_error0);
pkt_flags = iavf_flex_rxd_error_to_pkt_flags(stat_err0);
@@ -2038,7 +2042,7 @@ iavf_xmit_cleanup(struct iavf_tx_queue *txq)
desc_to_clean_to = (uint16_t)(desc_to_clean_to - nb_tx_desc);
 
desc_to_clean_to = sw_ring[desc_to_clean_to].last_id;
-   if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
+   if ((txd[desc_to_clean_to].qw1 &
rte_cpu_to_le_64(IAVF_TXD_QW1_DTYPE_MASK)) !=
rte_cpu_to_le_64(IAVF_TX_DESC_DTYPE_DESC_DONE)) {
PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
@@ -205

[dpdk-dev] [PATCH v3 3/6] net/iavf: add support for asynchronous virt channel messages

2021-09-20 Thread Radu Nicolau
Add support for asynchronous virtual channel messages, specifically for
inline IPsec messages.

Signed-off-by: Declan Doherty 
Signed-off-by: Abhijit Sinha 
Signed-off-by: Radu Nicolau 
---
 drivers/net/iavf/iavf.h   |  16 
 drivers/net/iavf/iavf_vchnl.c | 137 +-
 2 files changed, 101 insertions(+), 52 deletions(-)

diff --git a/drivers/net/iavf/iavf.h b/drivers/net/iavf/iavf.h
index b3bd078111..8c7f7c0bed 100644
--- a/drivers/net/iavf/iavf.h
+++ b/drivers/net/iavf/iavf.h
@@ -189,6 +189,7 @@ struct iavf_info {
uint64_t supported_rxdid;
uint8_t *proto_xtr; /* proto xtr type for all queues */
volatile enum virtchnl_ops pend_cmd; /* pending command not finished */
+   rte_atomic32_t pend_cmd_count;
int cmd_retval; /* return value of the cmd response from PF */
uint8_t *aq_resp; /* buffer to store the adminq response from PF */
 
@@ -340,9 +341,24 @@ _atomic_set_cmd(struct iavf_info *vf, enum virtchnl_ops 
ops)
if (!ret)
PMD_DRV_LOG(ERR, "There is incomplete cmd %d", vf->pend_cmd);
 
+   rte_atomic32_set(&vf->pend_cmd_count, 1);
+
return !ret;
 }
 
+/* Check there is pending cmd in execution. If none, set new command. */
+static inline int
+_atomic_set_async_response_cmd(struct iavf_info *vf, enum virtchnl_ops ops)
+{
+   int ret = rte_atomic32_cmpset(&vf->pend_cmd, VIRTCHNL_OP_UNKNOWN, ops);
+
+   if (!ret)
+   PMD_DRV_LOG(ERR, "There is incomplete cmd %d", vf->pend_cmd);
+
+   rte_atomic32_set(&vf->pend_cmd_count, 2);
+
+   return !ret;
+}
 int iavf_check_api_version(struct iavf_adapter *adapter);
 int iavf_get_vf_resource(struct iavf_adapter *adapter);
 void iavf_handle_virtchnl_msg(struct rte_eth_dev *dev);
diff --git a/drivers/net/iavf/iavf_vchnl.c b/drivers/net/iavf/iavf_vchnl.c
index 7f86050df3..5c62443999 100644
--- a/drivers/net/iavf/iavf_vchnl.c
+++ b/drivers/net/iavf/iavf_vchnl.c
@@ -23,8 +23,8 @@
 #include "iavf.h"
 #include "iavf_rxtx.h"
 
-#define MAX_TRY_TIMES 200
-#define ASQ_DELAY_MS  10
+#define MAX_TRY_TIMES 2000
+#define ASQ_DELAY_MS  1
 
 static uint32_t
 iavf_convert_link_speed(enum virtchnl_link_speed virt_link_speed)
@@ -143,7 +143,8 @@ iavf_read_msg_from_pf(struct iavf_adapter *adapter, 
uint16_t buf_len,
 }
 
 static int
-iavf_execute_vf_cmd(struct iavf_adapter *adapter, struct iavf_cmd_info *args)
+iavf_execute_vf_cmd(struct iavf_adapter *adapter, struct iavf_cmd_info *args,
+   int async)
 {
struct iavf_hw *hw = IAVF_DEV_PRIVATE_TO_HW(adapter);
struct iavf_info *vf = IAVF_DEV_PRIVATE_TO_VF(adapter);
@@ -155,8 +156,14 @@ iavf_execute_vf_cmd(struct iavf_adapter *adapter, struct 
iavf_cmd_info *args)
if (vf->vf_reset)
return -EIO;
 
-   if (_atomic_set_cmd(vf, args->ops))
-   return -1;
+
+   if (async) {
+   if (_atomic_set_async_response_cmd(vf, args->ops))
+   return -1;
+   } else {
+   if (_atomic_set_cmd(vf, args->ops))
+   return -1;
+   }
 
ret = iavf_aq_send_msg_to_pf(hw, args->ops, IAVF_SUCCESS,
args->in_args, args->in_args_size, NULL);
@@ -252,9 +259,11 @@ static void
 iavf_handle_pf_event_msg(struct rte_eth_dev *dev, uint8_t *msg,
uint16_t msglen)
 {
+   struct iavf_adapter *adapter =
+   IAVF_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+   struct iavf_info *vf = &adapter->vf;
struct virtchnl_pf_event *pf_msg =
(struct virtchnl_pf_event *)msg;
-   struct iavf_info *vf = IAVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
 
if (msglen < sizeof(struct virtchnl_pf_event)) {
PMD_DRV_LOG(DEBUG, "Error event");
@@ -330,18 +339,40 @@ iavf_handle_virtchnl_msg(struct rte_eth_dev *dev)
case iavf_aqc_opc_send_msg_to_vf:
if (msg_opc == VIRTCHNL_OP_EVENT) {
iavf_handle_pf_event_msg(dev, info.msg_buf,
-   info.msg_len);
+   info.msg_len);
} else {
+   /* check for inline IPsec events */
+   struct inline_ipsec_msg *imsg =
+   (struct inline_ipsec_msg *)info.msg_buf;
+   struct rte_eth_event_ipsec_desc desc;
+   if (msg_opc == VIRTCHNL_OP_INLINE_IPSEC_CRYPTO
+   && imsg->ipsec_opcode ==
+   INLINE_IPSEC_OP_EVENT) {
+   struct virtchnl_ipsec_event *ev =
+   imsg->ipsec_data.event;
+   desc.subtype =
+

[dpdk-dev] [PATCH v3 4/6] net/iavf: add iAVF IPsec inline crypto support

2021-09-20 Thread Radu Nicolau
Add support for inline crypto for IPsec, for ESP transport and
tunnel over IPv4 and IPv6, as well as supporting the offload for
ESP over UDP, and in conjunction with TSO for UDP and TCP flows.
Implement support for rte_security packet metadata (see the sketch below).
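
A rough sketch of how an application might attach the inline-crypto
session to an outgoing packet (illustrative only; session creation and
error paths are omitted, and sec_ctx, sec_sess, m and port_id are assumed
to exist):

    if (rte_security_set_pkt_metadata(sec_ctx, sec_sess, m, NULL) != 0)
            rte_exit(EXIT_FAILURE, "Cannot set packet security metadata\n");
    rte_eth_tx_burst(port_id, 0, &m, 1);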

Add definitions for the IPsec descriptors and extend the offload support
in the data and context descriptors accordingly.

Add support to the virtual channel mailbox for IPsec Crypto request
operations. IPsec Crypto requests receive an initial acknowledgement
from the physical function driver confirming receipt of the request, and
then an asynchronous response with the success/failure of the request,
including any response data.

Add enhanced descriptor debugging.

Refactor the scalar Tx burst function to support integration of the
offloads.

Signed-off-by: Declan Doherty 
Signed-off-by: Abhijit Sinha 
Signed-off-by: Radu Nicolau 
---
 drivers/net/iavf/iavf.h   |   10 +
 drivers/net/iavf/iavf_ethdev.c|   41 +-
 drivers/net/iavf/iavf_generic_flow.c  |   16 +
 drivers/net/iavf/iavf_generic_flow.h  |2 +
 drivers/net/iavf/iavf_ipsec_crypto.c  | 1918 +
 drivers/net/iavf/iavf_ipsec_crypto.h  |   96 +
 .../net/iavf/iavf_ipsec_crypto_capabilities.h |  383 
 drivers/net/iavf/iavf_rxtx.c  |  203 +-
 drivers/net/iavf/iavf_rxtx.h  |   94 +-
 drivers/net/iavf/iavf_vchnl.c |   29 +
 drivers/net/iavf/meson.build  |3 +-
 drivers/net/iavf/rte_pmd_iavf.h   |1 +
 drivers/net/iavf/version.map  |3 +
 13 files changed, 2777 insertions(+), 22 deletions(-)
 create mode 100644 drivers/net/iavf/iavf_ipsec_crypto.c
 create mode 100644 drivers/net/iavf/iavf_ipsec_crypto.h
 create mode 100644 drivers/net/iavf/iavf_ipsec_crypto_capabilities.h

diff --git a/drivers/net/iavf/iavf.h b/drivers/net/iavf/iavf.h
index 8c7f7c0bed..934ef48278 100644
--- a/drivers/net/iavf/iavf.h
+++ b/drivers/net/iavf/iavf.h
@@ -217,6 +217,7 @@ struct iavf_info {
rte_spinlock_t flow_ops_lock;
struct iavf_parser_list rss_parser_list;
struct iavf_parser_list dist_parser_list;
+   struct iavf_parser_list ipsec_crypto_parser_list;
 
struct iavf_fdir_info fdir; /* flow director info */
/* indicate large VF support enabled or not */
@@ -239,6 +240,7 @@ enum iavf_proto_xtr_type {
IAVF_PROTO_XTR_IPV6_FLOW,
IAVF_PROTO_XTR_TCP,
IAVF_PROTO_XTR_IP_OFFSET,
+   IAVF_PROTO_XTR_IPSEC_CRYPTO_SAID,
IAVF_PROTO_XTR_MAX,
 };
 
@@ -250,11 +252,14 @@ struct iavf_devargs {
uint8_t proto_xtr[IAVF_MAX_QUEUE_NUM];
 };
 
+struct iavf_security_ctx;
+
 /* Structure to store private data for each VF instance. */
 struct iavf_adapter {
struct iavf_hw hw;
struct rte_eth_dev *eth_dev;
struct iavf_info vf;
+   struct iavf_security_ctx *security_ctx;
 
bool rx_bulk_alloc_allowed;
/* For vector PMD */
@@ -273,6 +278,8 @@ struct iavf_adapter {
(&((struct iavf_adapter *)adapter)->vf)
 #define IAVF_DEV_PRIVATE_TO_HW(adapter) \
(&((struct iavf_adapter *)adapter)->hw)
+#define IAVF_DEV_PRIVATE_TO_IAVF_SECURITY_CTX(adapter) \
+   (((struct iavf_adapter *)adapter)->security_ctx)
 
 /* IAVF_VSI_TO */
 #define IAVF_VSI_TO_HW(vsi) \
@@ -415,5 +422,8 @@ int iavf_set_q_tc_map(struct rte_eth_dev *dev,
uint16_t size);
 void iavf_tm_conf_init(struct rte_eth_dev *dev);
 void iavf_tm_conf_uninit(struct rte_eth_dev *dev);
+int iavf_ipsec_crypto_request(struct iavf_adapter *adapter,
+   uint8_t *msg, size_t msg_len,
+   uint8_t *resp_msg, size_t resp_msg_len);
 extern const struct rte_tm_ops iavf_tm_ops;
 #endif /* _IAVF_ETHDEV_H_ */
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index c131461517..294be1a022 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -29,6 +29,7 @@
 #include "iavf_rxtx.h"
 #include "iavf_generic_flow.h"
 #include "rte_pmd_iavf.h"
+#include "iavf_ipsec_crypto.h"
 
 /* devargs */
 #define IAVF_PROTO_XTR_ARG "proto_xtr"
@@ -70,6 +71,11 @@ static struct iavf_proto_xtr_ol iavf_proto_xtr_params[] = {
[IAVF_PROTO_XTR_IP_OFFSET] = {
.param = { .name = "intel_pmd_dynflag_proto_xtr_ip_offset" },
.ol_flag = &rte_pmd_ifd_dynflag_proto_xtr_ip_offset_mask },
+   [IAVF_PROTO_XTR_IPSEC_CRYPTO_SAID] = {
+   .param = {
+   .name = "intel_pmd_dynflag_proto_xtr_ipsec_crypto_said" },
+   .ol_flag =
+   &rte_pmd_ifd_dynflag_proto_xtr_ipsec_crypto_said_mask },
 };
 
 static int iavf_dev_configure(struct rte_eth_dev *dev);
@@ -922,6 +928,9 @@ iavf_dev_stop(struct rte_eth_dev *dev)
iavf_add_del_mc_addr_list(adapter, vf->mc_addrs, vf->mc_addrs_num,
  false);
 
+   /* free iAVF security device context all related resources */
+

[dpdk-dev] [PATCH v3 5/6] net/iavf: add xstats support for inline IPsec crypto

2021-09-20 Thread Radu Nicolau
Add per-queue counters for maintaining statistics for the inline IPsec
crypto offload. These can be retrieved through
rte_security_session_stats_get(), with more detailed error counters
available through the rte_ethdev xstats.
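
A minimal sketch of how an application might read the new counters through
the generic xstats API (counter names are taken from the strings table in
this patch; allocation failures and a negative count are not handled):

    int i, n = rte_eth_xstats_get(port_id, NULL, 0);
    struct rte_eth_xstat *xstats = calloc(n, sizeof(*xstats));
    struct rte_eth_xstat_name *names = calloc(n, sizeof(*names));

    rte_eth_xstats_get_names(port_id, names, n);
    rte_eth_xstats_get(port_id, xstats, n);
    for (i = 0; i < n; i++)
            if (strncmp(names[i].name, "inline_ipsec_crypto_", 20) == 0)
                    printf("%s: %" PRIu64 "\n", names[i].name, xstats[i].value);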

Signed-off-by: Declan Doherty 
Signed-off-by: Radu Nicolau 
---
 drivers/net/iavf/iavf.h| 21 -
 drivers/net/iavf/iavf_ethdev.c | 84 --
 drivers/net/iavf/iavf_rxtx.h   | 12 -
 3 files changed, 89 insertions(+), 28 deletions(-)

diff --git a/drivers/net/iavf/iavf.h b/drivers/net/iavf/iavf.h
index 934ef48278..d5f574b4b3 100644
--- a/drivers/net/iavf/iavf.h
+++ b/drivers/net/iavf/iavf.h
@@ -92,6 +92,25 @@ struct iavf_adapter;
 struct iavf_rx_queue;
 struct iavf_tx_queue;
 
+
+struct iavf_ipsec_crypto_stats {
+   uint64_t icount;
+   uint64_t ibytes;
+   struct {
+   uint64_t count;
+   uint64_t sad_miss;
+   uint64_t not_processed;
+   uint64_t icv_check;
+   uint64_t ipsec_length;
+   uint64_t misc;
+   } ierrors;
+};
+
+struct iavf_eth_xstats {
+   struct virtchnl_eth_stats eth_stats;
+   struct iavf_ipsec_crypto_stats ips_stats;
+};
+
 /* Structure that defines a VSI, associated with a adapter. */
 struct iavf_vsi {
struct iavf_adapter *adapter; /* Backreference to associated adapter */
@@ -101,7 +120,7 @@ struct iavf_vsi {
uint16_t max_macaddrs;   /* Maximum number of MAC addresses */
uint16_t base_vector;
uint16_t msix_intr;  /* The MSIX interrupt binds to VSI */
-   struct virtchnl_eth_stats eth_stats_offset;
+   struct iavf_eth_xstats eth_stats_offset;
 };
 
 struct rte_flow;
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index 294be1a022..aad6a28585 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -89,6 +89,7 @@ static const uint32_t *iavf_dev_supported_ptypes_get(struct 
rte_eth_dev *dev);
 static int iavf_dev_stats_get(struct rte_eth_dev *dev,
 struct rte_eth_stats *stats);
 static int iavf_dev_stats_reset(struct rte_eth_dev *dev);
+static int iavf_dev_xstats_reset(struct rte_eth_dev *dev);
 static int iavf_dev_xstats_get(struct rte_eth_dev *dev,
 struct rte_eth_xstat *xstats, unsigned int n);
 static int iavf_dev_xstats_get_names(struct rte_eth_dev *dev,
@@ -144,21 +145,37 @@ struct rte_iavf_xstats_name_off {
unsigned int offset;
 };
 
+#define _OFF_OF(a) offsetof(struct iavf_eth_xstats, a)
 static const struct rte_iavf_xstats_name_off rte_iavf_stats_strings[] = {
-   {"rx_bytes", offsetof(struct iavf_eth_stats, rx_bytes)},
-   {"rx_unicast_packets", offsetof(struct iavf_eth_stats, rx_unicast)},
-   {"rx_multicast_packets", offsetof(struct iavf_eth_stats, rx_multicast)},
-   {"rx_broadcast_packets", offsetof(struct iavf_eth_stats, rx_broadcast)},
-   {"rx_dropped_packets", offsetof(struct iavf_eth_stats, rx_discards)},
+   {"rx_bytes", _OFF_OF(eth_stats.rx_bytes)},
+   {"rx_unicast_packets", _OFF_OF(eth_stats.rx_unicast)},
+   {"rx_multicast_packets", _OFF_OF(eth_stats.rx_multicast)},
+   {"rx_broadcast_packets", _OFF_OF(eth_stats.rx_broadcast)},
+   {"rx_dropped_packets", _OFF_OF(eth_stats.rx_discards)},
{"rx_unknown_protocol_packets", offsetof(struct iavf_eth_stats,
rx_unknown_protocol)},
-   {"tx_bytes", offsetof(struct iavf_eth_stats, tx_bytes)},
-   {"tx_unicast_packets", offsetof(struct iavf_eth_stats, tx_unicast)},
-   {"tx_multicast_packets", offsetof(struct iavf_eth_stats, tx_multicast)},
-   {"tx_broadcast_packets", offsetof(struct iavf_eth_stats, tx_broadcast)},
-   {"tx_dropped_packets", offsetof(struct iavf_eth_stats, tx_discards)},
-   {"tx_error_packets", offsetof(struct iavf_eth_stats, tx_errors)},
+   {"tx_bytes", _OFF_OF(eth_stats.tx_bytes)},
+   {"tx_unicast_packets", _OFF_OF(eth_stats.tx_unicast)},
+   {"tx_multicast_packets", _OFF_OF(eth_stats.tx_multicast)},
+   {"tx_broadcast_packets", _OFF_OF(eth_stats.tx_broadcast)},
+   {"tx_dropped_packets", _OFF_OF(eth_stats.tx_discards)},
+   {"tx_error_packets", _OFF_OF(eth_stats.tx_errors)},
+
+   {"inline_ipsec_crypto_ipackets", _OFF_OF(ips_stats.icount)},
+   {"inline_ipsec_crypto_ibytes", _OFF_OF(ips_stats.ibytes)},
+   {"inline_ipsec_crypto_ierrors", _OFF_OF(ips_stats.ierrors.count)},
+   {"inline_ipsec_crypto_ierrors_sad_lookup",
+   _OFF_OF(ips_stats.ierrors.sad_miss)},
+   {"inline_ipsec_crypto_ierrors_not_processed",
+   _OFF_OF(ips_stats.ierrors.not_processed)},
+   {"inline_ipsec_crypto_ierrors_icv_fail",
+   _OFF_OF(ips_stats.ierrors.icv_check)},
+   {"inline_ipsec_crypto_ierrors_length",
+   _OFF_OF(ips_stats.ierrors.ipsec_length)},
+   {"inline_ipsec_crypto_ierrors_misc",
+  

[dpdk-dev] [PATCH v3 6/6] net/iavf: add watchdog for VFLR

2021-09-20 Thread Radu Nicolau
Add a watchdog to the iAVF PMD which monitors the VFLR register. If
the device is not already in reset and a VF reset in progress is
detected, notify the user through a callback and put the device into
the reset state. If the device is already in reset, poll for completion
of the reset.
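
A rough application-side sketch of consuming that notification through the
standard ethdev event callback (names and the recovery action are
illustrative only):

    static int
    vf_reset_event_cb(uint16_t port_id, enum rte_eth_event_type type,
                    void *cb_arg, void *ret_param)
    {
            RTE_SET_USED(cb_arg);
            RTE_SET_USED(ret_param);
            if (type == RTE_ETH_EVENT_INTR_RESET)
                    printf("Port %u: VF reset detected, recover the port\n",
                            port_id);
            return 0;
    }

    rte_eth_dev_callback_register(port_id, RTE_ETH_EVENT_INTR_RESET,
                    vf_reset_event_cb, NULL);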

Signed-off-by: Declan Doherty 
Signed-off-by: Radu Nicolau 
---
 drivers/net/iavf/iavf.h|  6 +++
 drivers/net/iavf/iavf_ethdev.c | 97 ++
 2 files changed, 103 insertions(+)

diff --git a/drivers/net/iavf/iavf.h b/drivers/net/iavf/iavf.h
index d5f574b4b3..4481d2e134 100644
--- a/drivers/net/iavf/iavf.h
+++ b/drivers/net/iavf/iavf.h
@@ -212,6 +212,12 @@ struct iavf_info {
int cmd_retval; /* return value of the cmd response from PF */
uint8_t *aq_resp; /* buffer to store the adminq response from PF */
 
+   struct {
+   uint8_t enabled:1;
+   uint64_t period_us;
+   } watchdog;
+   /** iAVF watchdog configuration */
+
/* Event from pf */
bool dev_closed;
bool link_up;
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index aad6a28585..d02aa9c1c5 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "iavf.h"
 #include "iavf_rxtx.h"
@@ -239,6 +240,94 @@ iavf_tm_ops_get(struct rte_eth_dev *dev __rte_unused,
return 0;
 }
 
+
+static int
+iavf_vfr_inprogress(struct iavf_hw *hw)
+{
+   int inprogress = 0;
+
+   if ((IAVF_READ_REG(hw, IAVF_VFGEN_RSTAT) &
+   IAVF_VFGEN_RSTAT_VFR_STATE_MASK) ==
+   VIRTCHNL_VFR_INPROGRESS)
+   inprogress = 1;
+
+   if (inprogress)
+   PMD_DRV_LOG(INFO, "Watchdog detected VFR in progress");
+
+   return inprogress;
+}
+
+static void
+iavf_dev_watchdog(void *cb_arg)
+{
+   struct iavf_adapter *adapter = cb_arg;
+   struct iavf_hw *hw = IAVF_DEV_PRIVATE_TO_HW(adapter);
+   int vfr_inprogress = 0, rc = 0;
+
+   /* check if watchdog has been disabled since last call */
+   if (!adapter->vf.watchdog.enabled)
+   return;
+
+   /* If in reset then poll vfr_inprogress register for completion */
+   if (adapter->vf.vf_reset) {
+   vfr_inprogress = iavf_vfr_inprogress(hw);
+
+   if (!vfr_inprogress) {
+   PMD_DRV_LOG(INFO, "VF \"%s\" reset has completed",
+   adapter->eth_dev->data->name);
+   adapter->vf.vf_reset = false;
+   }
+   /* If not in reset then poll vfr_inprogress register for VFLR event */
+   } else {
+   vfr_inprogress = iavf_vfr_inprogress(hw);
+
+   if (vfr_inprogress) {
+   PMD_DRV_LOG(INFO,
+   "VF \"%s\" reset event has been detected by 
watchdog",
+   adapter->eth_dev->data->name);
+
+   /* enter reset state with VFLR event */
+   adapter->vf.vf_reset = true;
+
+   rte_eth_dev_callback_process(adapter->eth_dev,
+   RTE_ETH_EVENT_INTR_RESET, NULL);
+   }
+   }
+
+   /* re-alarm watchdog */
+   rc = rte_eal_alarm_set(adapter->vf.watchdog.period_us,
+   &iavf_dev_watchdog, cb_arg);
+
+   if (rc)
+   PMD_DRV_LOG(ERR, "Failed \"%s\" to reset device watchdog alarm",
+   adapter->eth_dev->data->name);
+}
+
+static void
+iavf_dev_watchdog_enable(struct iavf_adapter *adapter, uint64_t period_us)
+{
+   int rc;
+
+   PMD_DRV_LOG(INFO, "Enabling device watchdog");
+
+   adapter->vf.watchdog.enabled = 1;
+   adapter->vf.watchdog.period_us = period_us;
+
+   rc = rte_eal_alarm_set(adapter->vf.watchdog.period_us,
+   &iavf_dev_watchdog, (void *)adapter);
+   if (rc)
+   PMD_DRV_LOG(ERR, "Failed to enabled device watchdog");
+}
+
+static void
+iavf_dev_watchdog_disable(struct iavf_adapter *adapter)
+{
+   PMD_DRV_LOG(INFO, "Disabling device watchdog");
+
+   adapter->vf.watchdog.enabled = 0;
+   adapter->vf.watchdog.period_us = 0;
+}
+
 static int
 iavf_set_mc_addr_list(struct rte_eth_dev *dev,
struct rte_ether_addr *mc_addrs,
@@ -2448,6 +2537,11 @@ iavf_dev_init(struct rte_eth_dev *eth_dev)
 
iavf_default_rss_disable(adapter);
 
+
+   /* Start device watchdog, set polling period to 500us */
+   iavf_dev_watchdog_enable(adapter, 500);
+
+
return 0;
 
 flow_init_err:
@@ -2527,6 +2621,9 @@ iavf_dev_close(struct rte_eth_dev *dev)
if (vf->vf_reset && !rte_pci_set_bus_master(pci_dev, true))
vf->vf_reset = false;
 
+   /* disable watchdog */
+   iavf_dev_watchdog_disable(adapter);
+
return ret;
 }
 
-- 
2.25.1



Re: [dpdk-dev] [RFC V2] ethdev: fix issue that dev close in PMD calls twice

2021-09-20 Thread Ferruh Yigit
On 8/25/2021 10:53 AM, Huisong Li wrote:
> 
> 在 2021/8/24 22:42, Ferruh Yigit 写道:
>> On 8/19/2021 4:45 AM, Huisong Li wrote:
>>> 在 2021/8/18 19:24, Ferruh Yigit 写道:
 On 8/13/2021 9:16 AM, Huisong Li wrote:
> 在 2021/8/13 14:12, Thomas Monjalon 写道:
>> 13/08/2021 04:11, Huisong Li:
>>> Hi, all
>>>
>>> This patch can enhance the security of device uninstallation to
>>> eliminate dependency on user usage methods.
>>>
>>> Can you check this patch?
>>>
>>>
>>> 在 2021/8/3 10:30, Huisong Li 写道:
 Ethernet devices in DPDK can be released by rte_eth_dev_close() and
 rte_dev_remove(). These APIs both call xxx_dev_close() in PMD layer
 to uninstall hardware. However, the two APIs do not have explicit
 invocation restrictions. In other words, at the ethdev layer, it is
 possible to call rte_eth_dev_close() before calling rte_dev_remove()
 or rte_eal_hotplug_remove(). In such a bad scenario,
>> It is not a bad scenario.
>> If there is no more port for the device after calling close,
>> the device should be removed automatically.
>> Keep in mind "close" is for one port, "remove" is for the entire device
>> which can have more than one port.
> I know.
>
> dev_close() is for removing an eth device. And rte_dev_remove() can be 
> used
>
> for removing the rte device and all its eth devices belonging to the rte
> device.
>
> In rte_dev_remove(), "remove" is executed in primary or one of secondary,
>
> all eth devices having same pci address will be closed and removed.
>
 the primary
 process may be fine, but it may cause that xxx_dev_close() in the PMD
 layer will be called twice in the secondary process. So this patch
 fixes it.
>> If a port is closed in primary, it should be the same in secondary.
>>
>>
 +    /*
 + * The eth_dev->data->name doesn't be cleared by the secondary
 process,
 + * so above "eth_dev" isn't NULL after rte_eth_dev_close() called.
>> This assumption is not clear. All should be closed together.
> However, dev_close() does not have the feature similar to 
> rte_dev_remove().
>
> Namely, it is not guaranteed that all eth devices are closed together in
> ethdev
> layer. It depends on app or user.
>
> If the app does not close together, the operation of repeatedly
> uninstalling an
> eth device in the secondary process
>
> will be triggered when dev_close() is first called by one secondary
> process, and
> then rte_dev_remove() is called.
>
> So I think it should be avoided.
 First of all, I am not sure about calling 'rte_eth_dev_close()' or
 'rte_dev_remove()' from the secondary process.
 There are explicit checks in various locations to prevent clearing 
 resources
 completely from secondary process.
>>> There's no denying that.
>>>
>>> Generally, hardware resources of eth device and shared data of the primary 
>>> and
>>> secondary process
>>>
>>> are cleared by primary, which are controled by ethdev layer or PMD layer.
>>>
>>> But there may be some private data or resources of each process (primary or
>>> secondary ), such as mp action
>>>
>>> registered by rte_mp_action_register() or others.  For these resources, the
>>> secondary process still needs to clear.
>>>
>>> Namely, both primary and secondary processes need to prevent repeated 
>>> offloading
>>> of resources.
>>>
 Calling 'rte_eth_dev_close()' or 'rte_dev_remove()' by secondary is 
 technically
 can be done but application needs to be extra cautious and should take 
 extra
 measures and synchronization to make it work.
 Regular use-case is secondary processes do the packet processing and all
 control
 commands run by primary.
>>> You are right. We have a consensus that 'rte_eth_dev_close()' or
>>> 'rte_dev_remove()'
>>>
>>> can be called by primary and secondary processes.
>>>
>>> But DPDK framework cannot assume user behavior.😁
>>>
>>> We need to make it more secure and reliable for both primary and secondary
>>> processes.
>>>
 In primary, if you call 'rte_eth_dev_close()' it will clear all ethdev
 resources
 and further 'rte_dev_remove()' call will detect missing ethdev resources 
 and
 won't try to clear them again.

 In secondary, if you call 'rte_eth_dev_close()', it WON'T clear all 
 resources
 and further 'rte_dev_remove()' call (either from primary or secondary) 
 will try
 to clean ethdev resources again. You are trying to prevent this retry in 
 remove
 happening for secondary process.
>>> Right. However, if secondary process in PMD layer has its own private 
>>> resources
>>> to be
>>>
>>> cleared, it still need to do it by calling 'rte_eth_dev_close()' or
>>> 'rte_dev_remove()'.
>>>
 In secondary it won't free et

Re: [dpdk-dev] [PATCH] net/af_xdp: fix support of secondary process

2021-09-20 Thread Stephen Hemminger
On Mon, 20 Sep 2021 13:23:57 +
"Loftus, Ciara"  wrote:

> > -Original Message-
> > From: dev  On Behalf Of Stephen Hemminger
> > Sent: Friday 3 September 2021 17:15
> > To: dev@dpdk.org
> > Cc: Stephen Hemminger ;
> > sta...@dpdk.org; xiaolong...@intel.com
> > Subject: [dpdk-dev] [PATCH] net/af_xdp: fix support of secondary process
> > 
> > Doing basic operations like info_get or get_stats was broken
> > in af_xdp PMD. The info_get would crash because dev->device
> > was NULL in secondary process. Fix this by doing same initialization
> > as af_packet and tap devices.
> > 
> > The get_stats would crash because the XDP socket is not open in
> > primary process. As a workaround don't query kernel for dropped
> > packets when called from secondary process.
> > 
> > Note: this does not address the other bug which is that transmitting
> > in secondary process is broken because the send() in tx_kick
> > will fail because XDP socket fd is not valid in secondary process.  
> 
> Hi Stephen,
> 
> Apologies for the delayed reply, I was on vacation.
> 
> In the Bugzilla report you suggest we:
> "mark AF_XDP as broken in with primary/secondary
> and return an error in probe in secondary process".
> I agree with this suggestion. However with this patch we still permit 
> secondary, and just make sure it doesn't crash for get_stats. Did you change 
> your mind?
> Personally, I would prefer to have primary/secondary either working 100% or 
> else not allowed at all by throwing an error during probe. What do you think? 
> Do you have a reason/use case to permit secondary processes despite some 
> features not being available eg. full stats, tx?
> 
> Thanks,
> Ciara

There are two cases where secondary is useful even if send/receive can't work 
from secondary process.
The pdump and proc-info applications can work with these patches.

I am using XDP over pdump as an easy way to get packets into the code for 
testing.

The flag in the documentation doesn't have a "limited" version.
If you want, will send another patch to disable secondary support.

Supporting secondary means adding a mechanism to pass the socket around.
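
For context, a minimal sketch of the secondary-process probe pattern the fix
borrows from af_packet/tap (error handling trimmed; "ops" and "vdev" stand in
for the PMD's eth_dev_ops and vdev handle):

    if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
            struct rte_eth_dev *eth_dev;

            eth_dev = rte_eth_dev_attach_secondary(rte_vdev_device_name(vdev));
            if (eth_dev == NULL)
                    return -ENODEV;

            eth_dev->dev_ops = &ops;
            /* dev->device was left NULL before the fix, crashing info_get */
            eth_dev->device = &vdev->device;
            rte_eth_dev_probing_finish(eth_dev);
            return 0;
    }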



[dpdk-dev] [PATCH] net/octeontx: fix invalid access to indirect buffers

2021-09-20 Thread Harman Kalra
An issue has been observed where fields of indirect buffers are
accessed after being set free by the driver. Also fix the freeing of
direct buffers so they are returned to the correct aura.

Fixes: 5cbe184802aa ("net/octeontx: support fast mbuf free")
Cc: sta...@dpdk.org

Signed-off-by: David George 
Signed-off-by: Harman Kalra 
---
 drivers/net/octeontx/octeontx_rxtx.h | 69 ++--
 1 file changed, 46 insertions(+), 23 deletions(-)
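
The core of the fix is an ordering rule: capture everything the TX descriptor
needs from the mbuf before the prefree step, because prefree may detach an
indirect mbuf (the real data then lives in the direct mbuf) or hand the buffer
back to its pool. A rough sketch of that rule, under those assumptions:

    uint16_t data_len = m->data_len;           /* read before prefree */
    rte_iova_t iova = rte_mbuf_data_iova(m);
    struct rte_mbuf *m_tofree = m;             /* may be redirected below */

    set_free_bit = octeontx_prefree_seg(m, &m_tofree);
    /* build the descriptor from data_len/iova, and derive the aura from
     * m_tofree->pool (not m->pool) when the buffer is freed by hardware */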

diff --git a/drivers/net/octeontx/octeontx_rxtx.h 
b/drivers/net/octeontx/octeontx_rxtx.h
index 2ed28ea563..e0723ac26a 100644
--- a/drivers/net/octeontx/octeontx_rxtx.h
+++ b/drivers/net/octeontx/octeontx_rxtx.h
@@ -161,7 +161,7 @@ ptype_table[PTYPE_SIZE][PTYPE_SIZE][PTYPE_SIZE] = {
 
 
 static __rte_always_inline uint64_t
-octeontx_pktmbuf_detach(struct rte_mbuf *m)
+octeontx_pktmbuf_detach(struct rte_mbuf *m, struct rte_mbuf **m_tofree)
 {
struct rte_mempool *mp = m->pool;
uint32_t mbuf_size, buf_len;
@@ -171,6 +171,8 @@ octeontx_pktmbuf_detach(struct rte_mbuf *m)
 
/* Update refcount of direct mbuf */
md = rte_mbuf_from_indirect(m);
+   /* The real data will be in the direct buffer, inform callers this */
+   *m_tofree = md;
refcount = rte_mbuf_refcnt_update(md, -1);
 
priv_size = rte_pktmbuf_priv_size(mp);
@@ -203,18 +205,18 @@ octeontx_pktmbuf_detach(struct rte_mbuf *m)
 }
 
 static __rte_always_inline uint64_t
-octeontx_prefree_seg(struct rte_mbuf *m)
+octeontx_prefree_seg(struct rte_mbuf *m, struct rte_mbuf **m_tofree)
 {
if (likely(rte_mbuf_refcnt_read(m) == 1)) {
if (!RTE_MBUF_DIRECT(m))
-   return octeontx_pktmbuf_detach(m);
+   return octeontx_pktmbuf_detach(m, m_tofree);
 
m->next = NULL;
m->nb_segs = 1;
return 0;
} else if (rte_mbuf_refcnt_update(m, -1) == 0) {
if (!RTE_MBUF_DIRECT(m))
-   return octeontx_pktmbuf_detach(m);
+   return octeontx_pktmbuf_detach(m, m_tofree);
 
rte_mbuf_refcnt_set(m, 1);
m->next = NULL;
@@ -315,6 +317,14 @@ __octeontx_xmit_prepare(struct rte_mbuf *tx_pkt, uint64_t 
*cmd_buf,
const uint16_t flag)
 {
uint16_t gaura_id, nb_desc = 0;
+   struct rte_mbuf *m_tofree;
+   rte_iova_t iova;
+   uint16_t data_len;
+
+   m_tofree = tx_pkt;
+
+   data_len = tx_pkt->data_len;
+   iova = rte_mbuf_data_iova(tx_pkt);
 
/* Setup PKO_SEND_HDR_S */
	cmd_buf[nb_desc++] = tx_pkt->data_len & 0xffff;
@@ -329,22 +339,23 @@ __octeontx_xmit_prepare(struct rte_mbuf *tx_pkt, uint64_t 
*cmd_buf,
 * not, as SG_DESC[I] and SEND_HDR[II] are clear.
 */
if (flag & OCCTX_TX_OFFLOAD_MBUF_NOFF_F)
-   cmd_buf[0] |= (octeontx_prefree_seg(tx_pkt) <<
+   cmd_buf[0] |= (octeontx_prefree_seg(tx_pkt, &m_tofree) <<
   58);
 
/* Mark mempool object as "put" since it is freed by PKO */
if (!(cmd_buf[0] & (1ULL << 58)))
-   __mempool_check_cookies(tx_pkt->pool, (void **)&tx_pkt,
+   __mempool_check_cookies(m_tofree->pool, (void **)&m_tofree,
1, 0);
/* Get the gaura Id */
-   gaura_id = octeontx_fpa_bufpool_gaura((uintptr_t)tx_pkt->pool->pool_id);
+   gaura_id =
+   octeontx_fpa_bufpool_gaura((uintptr_t)m_tofree->pool->pool_id);
 
/* Setup PKO_SEND_BUFLINK_S */
cmd_buf[nb_desc++] = PKO_SEND_BUFLINK_SUBDC |
PKO_SEND_BUFLINK_LDTYPE(0x1ull) |
PKO_SEND_BUFLINK_GAUAR((long)gaura_id) |
-   tx_pkt->data_len;
-   cmd_buf[nb_desc++] = rte_mbuf_data_iova(tx_pkt);
+   data_len;
+   cmd_buf[nb_desc++] = iova;
 
return nb_desc;
 }
@@ -355,7 +366,9 @@ __octeontx_xmit_mseg_prepare(struct rte_mbuf *tx_pkt, 
uint64_t *cmd_buf,
 {
uint16_t nb_segs, nb_desc = 0;
uint16_t gaura_id, len = 0;
-   struct rte_mbuf *m_next = NULL;
+   struct rte_mbuf *m_next = NULL, *m_tofree;
+   rte_iova_t iova;
+   uint16_t data_len;
 
nb_segs = tx_pkt->nb_segs;
/* Setup PKO_SEND_HDR_S */
@@ -369,40 +382,50 @@ __octeontx_xmit_mseg_prepare(struct rte_mbuf *tx_pkt, 
uint64_t *cmd_buf,
 
do {
m_next = tx_pkt->next;
-   /* To handle case where mbufs belong to diff pools, like
-* fragmentation
+   /* Get TX parameters up front, octeontx_prefree_seg might change
+* them
 */
-   gaura_id = octeontx_fpa_bufpool_gaura((uintptr_t)
- tx_pkt->pool->pool_id);
+   m_tofree = tx_pkt;
+   data_len = tx_pkt->data_len;
+   iova = rte_mbuf_data_iova(tx_pkt);
 
/* Setup PKO_SEND_GATHER_S */
-

[dpdk-dev] [PATCH V5 1/4] table: add support learner tables

2021-09-20 Thread Cristian Dumitrescu
A learner table is typically used for learning or connection tracking,
where it allows for the implementation of the "add on miss" scenario:
whenever the lookup key is not found in the table (lookup miss), the
data plane can decide to add this key to the table with a given action
with no control plane intervention. Likewise, the table keys expire
based on a configurable timeout and are automatically deleted from the
table with no control plane intervention.
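
To make the add-on-miss flow concrete, a conceptual sketch in C style (this is
not the rte_swx_table_learner API; names are illustrative):

    hit = learner_lookup(table, key, &action, &data);
    if (!hit) {
            /* lookup miss: the data plane itself installs the key, with a
             * per-key timeout, and no control plane involvement */
            learner_add(table, key, learned_action, data, timeout);
    } else {
            /* a hit refreshes the key's timeout; keys not hit within the
             * timeout are deleted automatically */
            execute(action, data);
    }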

Signed-off-by: Cristian Dumitrescu 
---
Depends-on: series-18023 ("[V2,1/5] pipeline: prepare for variable size 
headers")

V2: fixed one "line too long" coding style warning.

 lib/table/meson.build |   2 +
 lib/table/rte_swx_table_learner.c | 617 ++
 lib/table/rte_swx_table_learner.h | 206 ++
 lib/table/version.map |   9 +
 4 files changed, 834 insertions(+)
 create mode 100644 lib/table/rte_swx_table_learner.c
 create mode 100644 lib/table/rte_swx_table_learner.h

diff --git a/lib/table/meson.build b/lib/table/meson.build
index a1384456a9..ac1f1aac27 100644
--- a/lib/table/meson.build
+++ b/lib/table/meson.build
@@ -3,6 +3,7 @@
 
 sources = files(
 'rte_swx_table_em.c',
+'rte_swx_table_learner.c',
 'rte_swx_table_selector.c',
 'rte_swx_table_wm.c',
 'rte_table_acl.c',
@@ -21,6 +22,7 @@ headers = files(
 'rte_lru.h',
 'rte_swx_table.h',
 'rte_swx_table_em.h',
+'rte_swx_table_learner.h',
 'rte_swx_table_selector.h',
 'rte_swx_table_wm.h',
 'rte_table.h',
diff --git a/lib/table/rte_swx_table_learner.c 
b/lib/table/rte_swx_table_learner.c
new file mode 100644
index 00..c3c840ff06
--- /dev/null
+++ b/lib/table/rte_swx_table_learner.c
@@ -0,0 +1,617 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "rte_swx_table_learner.h"
+
+#ifndef RTE_SWX_TABLE_LEARNER_USE_HUGE_PAGES
+#define RTE_SWX_TABLE_LEARNER_USE_HUGE_PAGES 1
+#endif
+
+#ifndef RTE_SWX_TABLE_SELECTOR_HUGE_PAGES_DISABLE
+
+#include 
+
+static void *
+env_calloc(size_t size, size_t alignment, int numa_node)
+{
+   return rte_zmalloc_socket(NULL, size, alignment, numa_node);
+}
+
+static void
+env_free(void *start, size_t size __rte_unused)
+{
+   rte_free(start);
+}
+
+#else
+
+#include 
+
+static void *
+env_calloc(size_t size, size_t alignment __rte_unused, int numa_node)
+{
+   void *start;
+
+   if (numa_available() == -1)
+   return NULL;
+
+   start = numa_alloc_onnode(size, numa_node);
+   if (!start)
+   return NULL;
+
+   memset(start, 0, size);
+   return start;
+}
+
+static void
+env_free(void *start, size_t size)
+{
+   if ((numa_available() == -1) || !start)
+   return;
+
+   numa_free(start, size);
+}
+
+#endif
+
+#if defined(RTE_ARCH_X86_64)
+
+#include 
+
+#define crc32_u64(crc, v) _mm_crc32_u64(crc, v)
+
+#else
+
+static inline uint64_t
+crc32_u64_generic(uint64_t crc, uint64_t value)
+{
+   int i;
+
+	crc = (crc & 0xFFFFFFFFLLU) ^ value;
+   for (i = 63; i >= 0; i--) {
+   uint64_t mask;
+
+   mask = -(crc & 1LLU);
+   crc = (crc >> 1LLU) ^ (0x82F63B78LLU & mask);
+   }
+
+   return crc;
+}
+
+#define crc32_u64(crc, v) crc32_u64_generic(crc, v)
+
+#endif
+
+/* Key size needs to be one of: 8, 16, 32 or 64. */
+static inline uint32_t
+hash(void *key, void *key_mask, uint32_t key_size, uint32_t seed)
+{
+   uint64_t *k = key;
+   uint64_t *m = key_mask;
+   uint64_t k0, k2, k5, crc0, crc1, crc2, crc3, crc4, crc5;
+
+   switch (key_size) {
+   case 8:
+   crc0 = crc32_u64(seed, k[0] & m[0]);
+   return crc0;
+
+   case 16:
+   k0 = k[0] & m[0];
+
+   crc0 = crc32_u64(k0, seed);
+   crc1 = crc32_u64(k0 >> 32, k[1] & m[1]);
+
+   crc0 ^= crc1;
+
+   return crc0;
+
+   case 32:
+   k0 = k[0] & m[0];
+   k2 = k[2] & m[2];
+
+   crc0 = crc32_u64(k0, seed);
+   crc1 = crc32_u64(k0 >> 32, k[1] & m[1]);
+
+   crc2 = crc32_u64(k2, k[3] & m[3]);
+   crc3 = k2 >> 32;
+
+   crc0 = crc32_u64(crc0, crc1);
+   crc1 = crc32_u64(crc2, crc3);
+
+   crc0 ^= crc1;
+
+   return crc0;
+
+   case 64:
+   k0 = k[0] & m[0];
+   k2 = k[2] & m[2];
+   k5 = k[5] & m[5];
+
+   crc0 = crc32_u64(k0, seed);
+   crc1 = crc32_u64(k0 >> 32, k[1] & m[1]);
+
+   crc2 = crc32_u64(k2, k[3] & m[3]);
+   crc3 = crc32_u64(k2 >> 32, k[4] & m[4]);
+
+   crc4 = crc32_u64(k5, k[6] & m[6]);
+   crc5 = crc32_u64(k5 >> 32, k[7] & m[7]);
+
+   crc0 = crc32_u64(crc

[dpdk-dev] [PATCH V5 3/4] examples/pipeline: add support for learner tables

2021-09-20 Thread Cristian Dumitrescu
Add application-level support for learner tables.

Signed-off-by: Cristian Dumitrescu 
---
 examples/pipeline/cli.c | 174 
 1 file changed, 174 insertions(+)

diff --git a/examples/pipeline/cli.c b/examples/pipeline/cli.c
index 1e2dd9d704..39b1e7a41b 100644
--- a/examples/pipeline/cli.c
+++ b/examples/pipeline/cli.c
@@ -1829,6 +1829,104 @@ cmd_pipeline_selector_show(char **tokens,
snprintf(out, out_size, MSG_ARG_INVALID, "selector_name");
 }
 
+static int
+pipeline_learner_default_entry_add(struct rte_swx_ctl_pipeline *p,
+  const char *learner_name,
+  FILE *file,
+  uint32_t *file_line_number)
+{
+   char *line = NULL;
+   uint32_t line_id = 0;
+   int status = 0;
+
+   /* Buffer allocation. */
+   line = malloc(MAX_LINE_SIZE);
+   if (!line)
+   return -ENOMEM;
+
+   /* File read. */
+   for (line_id = 1; ; line_id++) {
+   struct rte_swx_table_entry *entry;
+   int is_blank_or_comment;
+
+   if (fgets(line, MAX_LINE_SIZE, file) == NULL)
+   break;
+
+   entry = rte_swx_ctl_pipeline_learner_default_entry_read(p,
+   
learner_name,
+   line,
+   
&is_blank_or_comment);
+   if (!entry) {
+   if (is_blank_or_comment)
+   continue;
+
+   status = -EINVAL;
+   goto error;
+   }
+
+   status = rte_swx_ctl_pipeline_learner_default_entry_add(p,
+   
learner_name,
+   entry);
+   table_entry_free(entry);
+   if (status)
+   goto error;
+   }
+
+error:
+   *file_line_number = line_id;
+   free(line);
+   return status;
+}
+
+static const char cmd_pipeline_learner_default_help[] =
+"pipeline <pipeline_name> learner <learner_name> default <file_name>\n";
+
+static void
+cmd_pipeline_learner_default(char **tokens,
+uint32_t n_tokens,
+char *out,
+size_t out_size,
+void *obj)
+{
+   struct pipeline *p;
+   char *pipeline_name, *learner_name, *file_name;
+   FILE *file = NULL;
+   uint32_t file_line_number = 0;
+   int status;
+
+   if (n_tokens != 6) {
+   snprintf(out, out_size, MSG_ARG_MISMATCH, tokens[0]);
+   return;
+   }
+
+   pipeline_name = tokens[1];
+   p = pipeline_find(obj, pipeline_name);
+   if (!p || !p->ctl) {
+   snprintf(out, out_size, MSG_ARG_INVALID, "pipeline_name");
+   return;
+   }
+
+   learner_name = tokens[3];
+
+   file_name = tokens[5];
+   file = fopen(file_name, "r");
+   if (!file) {
+   snprintf(out, out_size, "Cannot open file %s.\n", file_name);
+   return;
+   }
+
+   status = pipeline_learner_default_entry_add(p->ctl,
+   learner_name,
+   file,
+   &file_line_number);
+   if (status)
+   snprintf(out, out_size, "Invalid entry in file %s at line %u\n",
+file_name,
+file_line_number);
+
+   fclose(file);
+}
+
 static const char cmd_pipeline_commit_help[] =
 "pipeline  commit\n";
 
@@ -2503,6 +2601,64 @@ cmd_pipeline_stats(char **tokens,
out += strlen(out);
}
}
+
+   snprintf(out, out_size, "\nLearner tables:\n");
+   out_size -= strlen(out);
+   out += strlen(out);
+
+   for (i = 0; i < info.n_learners; i++) {
+   struct rte_swx_ctl_learner_info learner_info;
+   uint64_t n_pkts_action[info.n_actions];
+   struct rte_swx_learner_stats stats = {
+   .n_pkts_hit = 0,
+   .n_pkts_miss = 0,
+   .n_pkts_action = n_pkts_action,
+   };
+   uint32_t j;
+
+   status = rte_swx_ctl_learner_info_get(p->p, i, &learner_info);
+   if (status) {
+   snprintf(out, out_size, "Learner table info get 
error.");
+   return;
+   }
+
+   status = rte_swx_ctl_pipeline_learner_stats_read(p->p, 
learner_info.name, &stats);
+   if (status) {
+   snprintf(out, out_size, "Learner table stats read 
error.");
+   

[dpdk-dev] [PATCH V5 2/4] pipeline: add support for learner tables

2021-09-20 Thread Cristian Dumitrescu
Add pipeline level support for learner tables.

Signed-off-by: Cristian Dumitrescu 
---

V2: Added more configuration consistency checks.
V3: Fixed one coding style indentation error.
V4: Fixed a pointer dereferencing issue in function 
rte_swx_ctl_pipeline_learner_stats_read().
V5: Added function rte_swx_ctl_pipeline_learner_default_entry_read() to the 
version.map file.

 lib/pipeline/rte_swx_ctl.c   |  479 +++-
 lib/pipeline/rte_swx_ctl.h   |  186 +
 lib/pipeline/rte_swx_pipeline.c  | 1041 --
 lib/pipeline/rte_swx_pipeline.h  |   77 ++
 lib/pipeline/rte_swx_pipeline_spec.c |  470 +++-
 lib/pipeline/version.map |9 +
 6 files changed, 2207 insertions(+), 55 deletions(-)

diff --git a/lib/pipeline/rte_swx_ctl.c b/lib/pipeline/rte_swx_ctl.c
index dc093860de..86b58e21dc 100644
--- a/lib/pipeline/rte_swx_ctl.c
+++ b/lib/pipeline/rte_swx_ctl.c
@@ -123,12 +123,26 @@ struct selector {
struct rte_swx_table_selector_params params;
 };
 
+struct learner {
+   struct rte_swx_ctl_learner_info info;
+   struct rte_swx_ctl_table_match_field_info *mf;
+   struct rte_swx_ctl_table_action_info *actions;
+   uint32_t action_data_size;
+
+   /* The pending default action: this is NOT the current default action;
+* this will be the new default action after the next commit, if the
+* next commit operation is successful.
+*/
+   struct rte_swx_table_entry *pending_default;
+};
+
 struct rte_swx_ctl_pipeline {
struct rte_swx_ctl_pipeline_info info;
struct rte_swx_pipeline *p;
struct action *actions;
struct table *tables;
struct selector *selectors;
+   struct learner *learners;
struct rte_swx_table_state *ts;
struct rte_swx_table_state *ts_next;
int numa_node;
@@ -924,6 +938,70 @@ selector_params_get(struct rte_swx_ctl_pipeline *ctl, 
uint32_t selector_id)
return 0;
 }
 
+static void
+learner_pending_default_free(struct learner *l)
+{
+   if (!l->pending_default)
+   return;
+
+   free(l->pending_default->action_data);
+   free(l->pending_default);
+   l->pending_default = NULL;
+}
+
+
+static void
+learner_free(struct rte_swx_ctl_pipeline *ctl)
+{
+   uint32_t i;
+
+   if (!ctl->learners)
+   return;
+
+   for (i = 0; i < ctl->info.n_learners; i++) {
+   struct learner *l = &ctl->learners[i];
+
+   free(l->mf);
+   free(l->actions);
+
+   learner_pending_default_free(l);
+   }
+
+   free(ctl->learners);
+   ctl->learners = NULL;
+}
+
+static struct learner *
+learner_find(struct rte_swx_ctl_pipeline *ctl, const char *learner_name)
+{
+   uint32_t i;
+
+   for (i = 0; i < ctl->info.n_learners; i++) {
+   struct learner *l = &ctl->learners[i];
+
+   if (!strcmp(learner_name, l->info.name))
+   return l;
+   }
+
+   return NULL;
+}
+
+static uint32_t
+learner_action_data_size_get(struct rte_swx_ctl_pipeline *ctl, struct learner 
*l)
+{
+   uint32_t action_data_size = 0, i;
+
+   for (i = 0; i < l->info.n_actions; i++) {
+   uint32_t action_id = l->actions[i].action_id;
+   struct action *a = &ctl->actions[action_id];
+
+   if (a->data_size > action_data_size)
+   action_data_size = a->data_size;
+   }
+
+   return action_data_size;
+}
+
 static void
 table_state_free(struct rte_swx_ctl_pipeline *ctl)
 {
@@ -954,6 +1032,14 @@ table_state_free(struct rte_swx_ctl_pipeline *ctl)
rte_swx_table_selector_free(ts->obj);
}
 
+   /* For each learner table, free its table state. */
+   for (i = 0; i < ctl->info.n_learners; i++) {
+   struct rte_swx_table_state *ts = &ctl->ts_next[i];
+
+   /* Default action data. */
+   free(ts->default_action_data);
+   }
+
free(ctl->ts_next);
ctl->ts_next = NULL;
 }
@@ -1020,6 +1106,29 @@ table_state_create(struct rte_swx_ctl_pipeline *ctl)
}
}
 
+   /* Learner tables. */
+   for (i = 0; i < ctl->info.n_learners; i++) {
+   struct learner *l = &ctl->learners[i];
+   struct rte_swx_table_state *ts = &ctl->ts[i];
+   struct rte_swx_table_state *ts_next = &ctl->ts_next[i];
+
+   /* Table object: duplicate from the current table state. */
+   ts_next->obj = ts->obj;
+
+   /* Default action data: duplicate from the current table state. 
*/
+   ts_next->default_action_data = malloc(l->action_data_size);
+   if (!ts_next->default_action_data) {
+   status = -ENOMEM;
+   goto error;
+   }
+
+   memcpy(ts_next->default_action_data,
+  ts->default_action_data,
+

[dpdk-dev] [PATCH V5 4/4] examples/pipeline: add learner table example

2021-09-20 Thread Cristian Dumitrescu
Added the files to illustrate the learner table usage.

Signed-off-by: Cristian Dumitrescu 
---

V2: Added description to the .spec file.

 examples/pipeline/examples/learner.cli  |  37 +++
 examples/pipeline/examples/learner.spec | 127 
 2 files changed, 164 insertions(+)
 create mode 100644 examples/pipeline/examples/learner.cli
 create mode 100644 examples/pipeline/examples/learner.spec

diff --git a/examples/pipeline/examples/learner.cli 
b/examples/pipeline/examples/learner.cli
new file mode 100644
index 00..af7792624f
--- /dev/null
+++ b/examples/pipeline/examples/learner.cli
@@ -0,0 +1,37 @@
+; SPDX-License-Identifier: BSD-3-Clause
+; Copyright(c) 2020 Intel Corporation
+
+;
+; Customize the LINK parameters to match your setup.
+;
+mempool MEMPOOL0 buffer 2304 pool 32K cache 256 cpu 0
+
+link LINK0 dev 0000:18:00.0 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on
+link LINK1 dev 0000:18:00.1 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on
+link LINK2 dev 0000:3b:00.0 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on
+link LINK3 dev 0000:3b:00.1 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on
+
+;
+; PIPELINE0 setup.
+;
+pipeline PIPELINE0 create 0
+
+pipeline PIPELINE0 port in 0 link LINK0 rxq 0 bsz 32
+pipeline PIPELINE0 port in 1 link LINK1 rxq 0 bsz 32
+pipeline PIPELINE0 port in 2 link LINK2 rxq 0 bsz 32
+pipeline PIPELINE0 port in 3 link LINK3 rxq 0 bsz 32
+
+pipeline PIPELINE0 port out 0 link LINK0 txq 0 bsz 32
+pipeline PIPELINE0 port out 1 link LINK1 txq 0 bsz 32
+pipeline PIPELINE0 port out 2 link LINK2 txq 0 bsz 32
+pipeline PIPELINE0 port out 3 link LINK3 txq 0 bsz 32
+pipeline PIPELINE0 port out 4 sink none
+
+pipeline PIPELINE0 build ./examples/pipeline/examples/learner.spec
+
+;
+; Pipelines-to-threads mapping.
+;
+thread 1 pipeline PIPELINE0 enable
+
+; Once the application has started, the command to get the CLI prompt is: 
telnet 0.0.0.0 8086
diff --git a/examples/pipeline/examples/learner.spec 
b/examples/pipeline/examples/learner.spec
new file mode 100644
index 00..d635422282
--- /dev/null
+++ b/examples/pipeline/examples/learner.spec
@@ -0,0 +1,127 @@
+; SPDX-License-Identifier: BSD-3-Clause
+; Copyright(c) 2020 Intel Corporation
+
+; The learner tables are very useful for learning and connection tracking.
+;
+; As opposed to regular tables, which are read-only for the data plane, the 
learner tables can be
+; updated by the data plane without any control plane intervention. The 
"learning" process typically
+; takes place by having the default action (i.e. the table action which is 
executed on lookup miss)
+; explicitly add to the table with a specific action the key that just missed 
the lookup operation.
+; Each table key expires automatically after a configurable timeout period if 
not hit during this
+; interval.
+;
+; This example demonstrates a simple connection tracking setup, where the 
connections are identified
+; by the IPv4 destination address. The forwarding action assigned to each new 
connection gets the
+; output port as argument, with the output port of each connection generated 
by a counter that is
+; persistent between packets. On top of the usual table stats, the learner 
table stats include the
+; number of packets with learning related events.
+
+//
+// Headers
+//
+struct ethernet_h {
+   bit<48> dst_addr
+   bit<48> src_addr
+   bit<16> ethertype
+}
+
+struct ipv4_h {
+   bit<8> ver_ihl
+   bit<8> diffserv
+   bit<16> total_len
+   bit<16> identification
+   bit<16> flags_offset
+   bit<8> ttl
+   bit<8> protocol
+   bit<16> hdr_checksum
+   bit<32> src_addr
+   bit<32> dst_addr
+}
+
+header ethernet instanceof ethernet_h
+header ipv4 instanceof ipv4_h
+
+//
+// Meta-data
+//
+struct metadata_t {
+   bit<32> port_in
+   bit<32> port_out
+
+   // Arguments for the "fwd_action" action.
+   bit<32> fwd_action_arg_port_out
+}
+
+metadata instanceof metadata_t
+
+//
+// Registers.
+//
+regarray counter size 1 initval 0
+
+//
+// Actions
+//
+struct fwd_action_args_t {
+   bit<32> port_out
+}
+
+action fwd_action args instanceof fwd_action_args_t {
+   mov m.port_out t.port_out
+   return
+}
+
+action learn_action args none {
+   // Read current counter value into m.fwd_action_arg_port_out.
+   regrd m.fwd_action_arg_port_out counter 0
+
+   // Increment the counter.
+   regadd counter 0 1
+
+   // Limit the output port values to 0 .. 3.
+   and m.fwd_action_arg_port_out 3
+
+   // Add the current lookup key to the table with fwd_action as the key 
action. The action
+   // arguments are read from the packet meta-data (the 
m.fwd_action_arg_port_out field). These
+   // packet meta-data fields have to be written before the "learn" 
instruction is invoked.
+   learn fwd_action
+
+   // Send the current packet to the same output port.
+   mov m.port_out m.fwd_action_arg_port_out
+
+   return
+}
+
+//
+// Ta

Re: [dpdk-dev] [PATCH] net/af_xdp: fix support of secondary process

2021-09-20 Thread Loftus, Ciara
> 
> On Mon, 20 Sep 2021 13:23:57 +
> "Loftus, Ciara"  wrote:
> 
> > > -Original Message-
> > > From: dev  On Behalf Of Stephen Hemminger
> > > Sent: Friday 3 September 2021 17:15
> > > To: dev@dpdk.org
> > > Cc: Stephen Hemminger ;
> > > sta...@dpdk.org; xiaolong...@intel.com
> > > Subject: [dpdk-dev] [PATCH] net/af_xdp: fix support of secondary
> process
> > >
> > > Doing basic operations like info_get or get_stats was broken
> > > in af_xdp PMD. The info_get would crash because dev->device
> > > was NULL in secondary process. Fix this by doing same initialization
> > > as af_packet and tap devices.
> > >
> > > The get_stats would crash because the XDP socket is not open in
> > > primary process. As a workaround don't query kernel for dropped
> > > packets when called from secondary process.
> > >
> > > Note: this does not address the other bug which is that transmitting
> > > in secondary process is broken because the send() in tx_kick
> > > will fail because XDP socket fd is not valid in secondary process.
> >
> > Hi Stephen,
> >
> > Apologies for the delayed reply, I was on vacation.
> >
> > In the Bugzilla report you suggest we:
> > "mark AF_XDP as broken in with primary/secondary
> > and return an error in probe in secondary process".
> > I agree with this suggestion. However with this patch we still permit
> secondary, and just make sure it doesn't crash for get_stats. Did you change
> your mind?
> > Personally, I would prefer to have primary/secondary either working 100%
> or else not allowed at all by throwing an error during probe. What do you
> think? Do you have a reason/use case to permit secondary processes despite
> some features not being available eg. full stats, tx?
> >
> > Thanks,
> > Ciara
> 
> There are two cases where secondary is useful even if send/receive can't
> work from secondary process.
> The pdump and proc-info applications can work with these patches.
> 
> I am using XDP over pdump as an easy way to get packets into the code for
> testing.
> 
> The flag in the documentation doesn't have a "limited" version.
> If you want, will send another patch to disable secondary support.

Thanks for explaining. Since there are use cases for secondary, even if the 
functionality is limited, I don't think it should be disabled.
Since we can't flag it as 'limited' in the feature matrix, could you please add 
a note about the send/receive limitation in the AF_XDP PMD documentation in a 
v2? There are already a number of limitations listed, which you can add to.

Thanks,
Ciara

> 
> Supporting secondary, means adding a mechanism to pass the socket
> around.


Re: [dpdk-dev] [PATCH] net/memif: fix chained mbuf determination

2021-09-20 Thread Ferruh Yigit
On 9/9/2021 3:42 PM, Junxiao Shi wrote:
> Previously, TX functions call rte_pktmbuf_is_contiguous to determine
> whether an mbuf is chained. However, rte_pktmbuf_is_contiguous is
> designed to work on the first mbuf of a packet only. In case a packet
> contains three or more segment mbufs in a chain, it may cause truncated
> packets or rte_mbuf_sanity_check panics.
> 
> This patch updates TX functions to determine chained mbufs using
> mbuf_head->nb_segs field, which works in all cases. Moreover, it
> maintains that the second cacheline is only accessed when chained mbuf
> is actually present.
> 
> Signed-off-by: Junxiao Shi 

+ memif maintainer, Jakub.

Jakub, can you please review the patch?
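
For reference, the distinction at issue, as a minimal sketch:
rte_pktmbuf_is_contiguous() (m->nb_segs == 1) is only meaningful on the head
mbuf of a packet, whereas checking the head's nb_segs directly covers all
cases and stays within the first cacheline:

    /* hypothetical helper illustrating the check the patch switches to */
    static inline int
    pkt_is_chained(const struct rte_mbuf *head)
    {
            /* nb_segs sits in the head mbuf's first cacheline, so this is
             * cheap and valid for any packet handed to the TX function */
            return head->nb_segs > 1;
    }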


Re: [dpdk-dev] [PATCH V3 01/24] pipeline: move data structures to internal header file

2021-09-20 Thread Dumitrescu, Cristian


> Depends-on: series-18297 ("[V4,1/4] table: add support learner tables")

Just sent an updated version for the learner table series, which does not 
result in any code changes for this patch set. Therefore, the updated 
dependency list of this patch set is:

Depends-on: series-19048 ("[V5,1/4] table: add support learner tables")

Thanks,
Cristian


Re: [dpdk-dev] [PATCH v2] net/af_packet: reinsert the stripped vlan tag

2021-09-20 Thread Ferruh Yigit
On 9/8/2021 9:59 AM, Tudor Cornea wrote:
> The af_packet pmd driver binds to a raw socket and allows
> sending and receiving of packets through the kernel.
> 
> Since commit [1], the kernel strips the vlan tags early in
> __netif_receive_skb_core(), so we receive untagged packets while
> running with the af_packet pmd.
> 
> Luckily for us, the skb vlan-related fields are still populated from the
> stripped vlan tags, so we end up having all the information
> that we need in the mbuf.
> 
> Having the PMD driver support DEV_RX_OFFLOAD_VLAN_STRIP allows the
> application to control the desired vlan stripping behavior.
> 
> [1] 
> https://github.com/torvalds/linux/commit/bcc6d47903612c3861201cc3a866fb604f26b8b2
> 
> Signed-off-by: Tudor Cornea 
> 

Hi Tudor,

The concern was unexpected performance degradation (users not setting any offload
would see a performance drop). But since your measurements show no significant
drop, I think it is fair to make the driver behave the same as other drivers.
(Until we have a way to describe offloads that can't be disabled by PMDs.)

Can you do a few minor updates:
- Put your performance measurements into the commit log to record them
- Update the af_packet documentation (doc/guides/nics/af_packet.rst) to document
PMD behavior with packets with VLAN tag
- Update release note (doc/guides/rel_notes/release_21_11.rst) with a one/two
sentences to document the change, to notify possible users of the af_packet with
the change.

Thanks,
ferruh
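
For reference, how an application opts in to the new offload (standard ethdev
configuration; a sketch with error handling omitted):

    struct rte_eth_conf port_conf = { 0 };

    port_conf.rxmode.offloads |= DEV_RX_OFFLOAD_VLAN_STRIP;
    ret = rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &port_conf);

    /* with the flag set, received mbufs carry the tag in mbuf->vlan_tci and
     * PKT_RX_VLAN_STRIPPED; without it, the PMD re-inserts the tag into the
     * packet data via rte_vlan_insert() */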

> ---
> v2:
> * Add DEV_RX_OFFLOAD_VLAN_STRIP to rxmode->offloads
> ---
>  drivers/net/af_packet/rte_eth_af_packet.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
> b/drivers/net/af_packet/rte_eth_af_packet.c
> index b73b211..5ed9dd6 100644
> --- a/drivers/net/af_packet/rte_eth_af_packet.c
> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
> @@ -48,6 +48,7 @@ struct pkt_rx_queue {
>  
>   struct rte_mempool *mb_pool;
>   uint16_t in_port;
> + uint8_t vlan_strip;
>  
>   volatile unsigned long rx_pkts;
>   volatile unsigned long rx_bytes;
> @@ -78,6 +79,7 @@ struct pmd_internals {
>  
>   struct pkt_rx_queue *rx_queue;
>   struct pkt_tx_queue *tx_queue;
> + uint8_t vlan_strip;
>  };
>  
>  static const char *valid_arguments[] = {
> @@ -148,6 +150,9 @@ eth_af_packet_rx(void *queue, struct rte_mbuf **bufs, 
> uint16_t nb_pkts)
>   if (ppd->tp_status & TP_STATUS_VLAN_VALID) {
>   mbuf->vlan_tci = ppd->tp_vlan_tci;
>   mbuf->ol_flags |= (PKT_RX_VLAN | PKT_RX_VLAN_STRIPPED);
> +
> + if (!pkt_q->vlan_strip && rte_vlan_insert(&mbuf))
> + PMD_LOG(ERR, "Failed to reinsert VLAN tag");
>   }
>  
>   /* release incoming frame and advance ring buffer */
> @@ -302,6 +307,11 @@ eth_dev_stop(struct rte_eth_dev *dev)
>  static int
>  eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
>  {
> + struct rte_eth_conf *dev_conf = &dev->data->dev_conf;
> + const struct rte_eth_rxmode *rxmode = &dev_conf->rxmode;
> + struct pmd_internals *internals = dev->data->dev_private;
> +
> + internals->vlan_strip = !!(rxmode->offloads & 
> DEV_RX_OFFLOAD_VLAN_STRIP);
>   return 0;
>  }
>  
> @@ -318,6 +328,7 @@ eth_dev_info(struct rte_eth_dev *dev, struct 
> rte_eth_dev_info *dev_info)
>   dev_info->min_rx_bufsize = 0;
>   dev_info->tx_offload_capa = DEV_TX_OFFLOAD_MULTI_SEGS |
>   DEV_TX_OFFLOAD_VLAN_INSERT;
> + dev_info->rx_offload_capa = DEV_RX_OFFLOAD_VLAN_STRIP;
>  
>   return 0;
>  }
> @@ -448,6 +459,7 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
>  
>   dev->data->rx_queues[rx_queue_id] = pkt_q;
>   pkt_q->in_port = dev->data->port_id;
> + pkt_q->vlan_strip = internals->vlan_strip;
>  
>   return 0;
>  }
> 



Re: [dpdk-dev] [PATCH] eal: add telemetry callbacks for memory info

2021-09-20 Thread Bruce Richardson
On Wed, Sep 15, 2021 at 03:23:36PM +0530, Harman Kalra wrote:
> Registering new telemetry callbacks to dump named (memzones)
> and unnamed (malloc) memory information to a file provided as
> an argument.
> 
> Example:
> Connecting to /var/run/dpdk/rte/dpdk_telemetry.v2
> {"version": "DPDK 21.08.0", "pid": 34075, "max_output_len": 16384}
> Connected to application: "dpdk-testpmd"
> --> /eal/malloc_dump,/tmp/malloc_dump
> {"/eal/malloc_dump": {"Malloc elements file: ": "/tmp/malloc_dump"}}
> -->
> --> /eal/malloc_info,/tmp/info
> {"/eal/malloc_info": {"Malloc stats file: ": "/tmp/info"}}
> -->
> -->
> --> /eal/memzone_dump,/tmp/memzone_info
> {"/eal/memzone_dump": {"Memzones count: ": 11, \
> "Memzones info file: ": "/tmp/memzone_info"}}
> 
> Signed-off-by: Harman Kalra 
> ---

For this info, why not just send the data out as telemetry data rather than
writing files on the filesystem containing it? If the info is too large to
dump it all in a single go, a shortened form could be sent via some form of
list call, and additional calls could be used to provide more detail on
specific items in the list.

 Also, this seems more a debugging operation than a telemetry one, though I
don't have a strong objection to the info being exported as telemetry
directly (just not via filesystem).

Regards,
/Bruce
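
For illustration, a minimal sketch of the in-band approach suggested above,
i.e. returning data through the telemetry socket instead of writing files.
The command name and the use of rte_malloc_get_socket_stats() are just one
possible shape, not the patch under review:

    #include <rte_malloc.h>
    #include <rte_telemetry.h>

    static int
    handle_eal_malloc_stats(const char *cmd __rte_unused,
                            const char *params __rte_unused,
                            struct rte_tel_data *d)
    {
            struct rte_malloc_socket_stats stats;

            if (rte_malloc_get_socket_stats(0, &stats) < 0)
                    return -1;

            rte_tel_data_start_dict(d);
            rte_tel_data_add_dict_int(d, "heap_total_bytes",
                            (int)stats.heap_totalsz_bytes);
            rte_tel_data_add_dict_int(d, "heap_free_bytes",
                            (int)stats.heap_freesz_bytes);
            rte_tel_data_add_dict_int(d, "alloc_count",
                            (int)stats.alloc_count);
            return 0;
    }

    /* registered once at init time: */
    rte_telemetry_register_cmd("/eal/malloc_stats", handle_eal_malloc_stats,
                    "Returns malloc heap stats for socket 0. No parameters.");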


Re: [dpdk-dev] [PATCH 1/2] app/testpmd: add tunnel types

2021-09-20 Thread Ferruh Yigit
On 9/13/2021 3:25 PM, Eli Britstein wrote:
> Current testpmd implementation supports VXLAN only for tunnel offload.
> Add GRE, NVGRE and GENEVE for tunnel offload flow matches.
> 

Hi Eli,

I assume tunnel types are added, but forgot to add the flow tunnel support for
them, so this patch is fixing it. If so can you please add the fixes commits?

Also it may help to give a sample of the enabled commands in the commit log, to
record.

Thanks,
ferruh
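
For the record, the commands this enables are along these lines (tunnel
offload in testpmd; exact syntax as per the testpmd flow guide):

    testpmd> flow tunnel create 0 type gre
    testpmd> flow tunnel create 0 type nvgre
    testpmd> flow tunnel create 0 type geneve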

> Signed-off-by: Eli Britstein 
> ---
>  app/test-pmd/config.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index 31d8ba1b91..fba388da5c 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -1212,6 +1212,15 @@ port_flow_tunnel_type(struct rte_flow_tunnel *tunnel)
>   case RTE_FLOW_ITEM_TYPE_VXLAN:
>   type = "vxlan";
>   break;
> + case RTE_FLOW_ITEM_TYPE_GRE:
> + type = "gre";
> + break;
> + case RTE_FLOW_ITEM_TYPE_NVGRE:
> + type = "nvgre";
> + break;
> + case RTE_FLOW_ITEM_TYPE_GENEVE:
> + type = "geneve";
> + break;
>   }
>  
>   return type;
> @@ -1272,6 +1281,12 @@ void port_flow_tunnel_create(portid_t port_id, const 
> struct tunnel_ops *ops)
>  
>   if (!strcmp(ops->type, "vxlan"))
>   type = RTE_FLOW_ITEM_TYPE_VXLAN;
> + else if (!strcmp(ops->type, "gre"))
> + type = RTE_FLOW_ITEM_TYPE_GRE;
> + else if (!strcmp(ops->type, "nvgre"))
> + type = RTE_FLOW_ITEM_TYPE_NVGRE;
> + else if (!strcmp(ops->type, "geneve"))
> + type = RTE_FLOW_ITEM_TYPE_GENEVE;
>   else {
>   fprintf(stderr, "cannot offload \"%s\" tunnel type\n",
>   ops->type);
> 



Re: [dpdk-dev] [PATCH] net/af_packet: fix ignoring full ring on tx

2021-09-20 Thread Ferruh Yigit
On 9/6/2021 11:23 AM, Tudor Cornea wrote:
> Hi Ferruh,
> 
> Would you mind separate timestamp status fix to its own patch? I think
>> better to
>> fix 'ignoring full Tx ring' first, to not make it dependent to timestamp
>> patch.
> 
> 
> Agreed. There are two issues solved by this patch. We will break it in two
> different patches.
> 
> I can see 'TP_STATUS_TS_SYS_HARDWARE' is deprecated, and I assume in the
>> kernel
>> versions the bug exists, this flag is not set, but can you please confirm?
> 
> 
> And does it only seen with veth, if so I wonder if we can ignore it, not
>> sure
>> how common to use af_packet PMD over veth interface, do you have this
>> usecase?
> 
> 
> We've seen the timestamping issue only when running af_packet over
> veth interfaces. We have a particular use-case internally, in which we need
> to run inside a Kubernetes cluster.
> We've found the following resources [1] , [2] related to this behavior in
> the kernel.
> 
> We believe that issue #2 (the ring getting full), can theoretically occur
> on any type of NIC.
> We managed to reproduce the bursty behavior on af_packet PMD over vmxnet3
> interface, by Tx-ing packets at a low rate (e.g ~340 pps), and toggling the
> interface on / off
> ifconfig $iface_name down; sleep 10; ifconfig $iface_name up
> 
> We will attempt to give more context on the issue below, about what we
> think happens:
> - we have a 2048-entry queue shared between the kernel and the dpdk socket;
> there's an index into the queue in both the kernel and the dpdk driver
> - the dpdk driver writes a packet or a burst, advances its idx and tells
> the kernel to send the packets via a call to sendto() and the kernel sends
> the packets and advances its idx
> - once the interface is down the kernel can no longer send packets, but it
> doesn't drop them, it just doesn't advance its idx
> - for each packet there is header and in the header there is a status
> integer which, among others, indicates the owner of the packet: the
> userspace or the kernel - the userspace (dpdk driver) sets the status as
> owned by the kernel when it adds another packet ; the kernel sets the
> status back to owned by the userspace once it sends a packet
> - the dpdk driver was ignoring this status bit and,  even after the queue
> was full, it would continue to put packets in the queue - its idx would be
> "after" that of the kernel
> - once the interface is brought up, the kernel would send all the packets
> in the queue (since they have the status of being owned by the kernel) on
> the next call to sendto() and the idx would be back to where it was before
> the interface was brought up (let's call it k1)
> - the dpdk driver idx into the queue would point somewhere in the queue
> (let's call it d1) and would continue to add packets at that point, but the
> kernel wouldn't send any packet anymore since there is now a gap of packets
> owned by the userspace between the kernel index (k1) and the dpdk driver
> idx (d1)
> - the dpdk idx would eventually reach k1 and packets would be transferred
> at a normal rate until both the dpdk idx and the kernel idx would reach d1
> again
> - between d1 and k1 there are only packets with the status as owned by the
> kernel - which where added by the dpdk driver while its index was between
> d1 and k1 ; thus the kernel would burst all the packets till k1, while the
> dpdk idx is at d1
> - the cycle repeats
> 
> If a new traffic config comes (in our application) while this cycle is
> happening, it could be that some of the packets of the old config are still
> in queue (between d1 and k1) and will be bursted when the dpdk and kernel
> idx reach d1 ; this would explain seeing packets from an old config, but
> only in the first 2048 packets (which is the queue size)
> 
> 

Hi Tudor,

If there is an usage on of veth, OK to fix the timestamps issue.

What you described above looks like a ring buffer with a single producer and a
single consumer, where the producer overwrites items that have not yet been consumed.

I assume this happens because af_packet (consumer) can't send the packets
because of the timestamp defect. (Also producer (dpdk app) should have checks to
prevent overwrite, but that is a different issue.)

I will comment to the new versions of the patches.

Out of curiosity, are you using a modified af_packet implementation in the kernel
for the usage described above?

> [1] https://www.spinics.net/lists/kernel/msg3959391.html
> [2] https://www.spinics.net/lists/netdev/msg739372.html
> 
> On Wed, 1 Sept 2021 at 19:34, Ferruh Yigit  wrote:
> 
>> On 8/20/2021 2:39 PM, Tudor Cornea wrote:
>>> The poll call can return POLLERR which is ignored, or it can return
>>> POLLOUT, even if there are no free frames in the mmap-ed area.
>>>
>>> We can account for both of these cases by re-checking if the next
>>> frame is empty before writing into it.
>>>
>>> We also now eliminate the timestamp status from the frame status.
>>>
>>
>> Hi Tudor,
>>
>> Would you mind separate timestamp status fix to its own patch

Re: [dpdk-dev] [PATCH v2] net/af_packet: fix ignoring full ring on tx

2021-09-20 Thread Ferruh Yigit
On 9/13/2021 2:45 PM, Tudor Cornea wrote:
> The poll call can return POLLERR which is ignored, or it can return
> POLLOUT, even if there are no free frames in the mmap-ed area.
> 
> We can account for both of these cases by re-checking if the next
> frame is empty before writing into it.
> 
> Signed-off-by: Mihai Pogonaru 
> Signed-off-by: Tudor Cornea 
> ---
>  drivers/net/af_packet/rte_eth_af_packet.c | 19 +++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
> b/drivers/net/af_packet/rte_eth_af_packet.c
> index b73b211..087c196 100644
> --- a/drivers/net/af_packet/rte_eth_af_packet.c
> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
> @@ -216,6 +216,25 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, 
> uint16_t nb_pkts)
>   (poll(&pfd, 1, -1) < 0))
>   break;
>  
> + /*
> +  * Poll can return POLLERR if the interface is down
> +  *
> +  * It will almost always return POLLOUT, even if there
> +  * are no extra buffers available
> +  *
> +  * This happens, because packet_poll() calls datagram_poll()
> +  * which checks the space left in the socket buffer and,
> +  * in the case of packet_mmap, the default socket buffer length
> +  * doesn't match the requested size for the tx_ring.
> +  * As such, there is almost always space left in socket buffer,
> +  * which doesn't seem to be correlated to the requested size
> +  * for the tx_ring in packet_mmap.
> +  *
> +  * This results in poll() returning POLLOUT.
> +  */
> + if (ppd->tp_status != TP_STATUS_AVAILABLE)
> + break;
> +

If 'POLLOUT' doesn't indicate that there is space in the buffer, what is the
point of the 'poll()' at all?

What can we test/reproduce the mentioned behavior? Or is there a way to fix the
behavior of poll() or use an alternative of it?


OK to break on the 'POLLERR', I guess it can be detected in the 'pfd.revent'.
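
For reference, a minimal sketch of the ownership handshake the extra check
relies on (TPACKET_V2 frame statuses; the helper name is hypothetical):

    struct tpacket2_hdr *ppd = next_tx_frame(pkt_q);    /* current slot */

    if (ppd->tp_status != TP_STATUS_AVAILABLE)
            break;          /* kernel still owns the frame: ring is full */

    /* ... copy the frame data ... */
    ppd->tp_status = TP_STATUS_SEND_REQUEST;            /* hand to kernel */
    /* a later sendto() asks the kernel to transmit all SEND_REQUEST frames */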


>   /* copy the tx frame data */
>   pbuf = (uint8_t *) ppd + TPACKET2_HDRLEN -
>   sizeof(struct sockaddr_ll);
> 



Re: [dpdk-dev] [PATCH v2] net/af_packet: remove timestamp from packet status

2021-09-20 Thread Ferruh Yigit
On 9/13/2021 6:23 PM, Tudor Cornea wrote:
> We should eliminate the timestamp status from the packet
> status. This should only matter if timestamping is enabled
> on the socket, but we might hit a kernel bug, which is fixed
> in newer releases.
> 
> For interfaces of type 'veth', the sent skb is forwarded
> to the peer and back into the network stack which timestamps
> it on the RX path if timestamping is enabled globally
> (which happens if any socket enables timestamping).
> 
> When the skb is destructed, tpacket_destruct_skb() is called
> and it calls __packet_set_timestamp() which doesn't check
> the flags on the socket and returns the timestamp if it is
> set in the skb (and for veth it is, as mentioned above).
> 
> See the following kernel commit for reference [1]:
> 
> net: packetmmap: fix only tx timestamp on request
> 
> The packetmmap tx ring should only return timestamps if requested
> via setsockopt PACKET_TIMESTAMP, as documented. This allows
> compatibility with non-timestamp aware user-space code which checks
> tp_status == TP_STATUS_AVAILABLE; not expecting additional timestamp
> flags to be set in tp_status.
> 
> [1] https://www.spinics.net/lists/kernel/msg3959391.html
> 
> Signed-off-by: Mihai Pogonaru 
> Signed-off-by: Tudor Cornea 
> 
> ---
> v2:
> * Remove compile-time check for kernel version

OK, Stephen's comment makes sense.

> ---
>  drivers/net/af_packet/rte_eth_af_packet.c | 20 ++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
> b/drivers/net/af_packet/rte_eth_af_packet.c
> index b73b211..7ecea4e 100644
> --- a/drivers/net/af_packet/rte_eth_af_packet.c
> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
> @@ -167,6 +167,22 @@ eth_af_packet_rx(void *queue, struct rte_mbuf **bufs, 
> uint16_t nb_pkts)
>   return num_rx;
>  }
>  
> +static inline bool tx_ring_status_unavailable(uint32_t tp_status)
> +{

Minor syntax comment, can you have the 'static inline bool' part in separate
line. And a basic function comment can be good.

Thanks,
ferruh

> + /*
> +  * We eliminate the timestamp status from the packet status.
> +  * This should only matter if timestamping is enabled on the socket,
> +  * but there is a bug in the kernel which is fixed in newer releases.
> +  *
> +  * See the following kernel commit for reference:
> +  * commit 171c3b151118a2fe0fc1e2a9d1b5a1570cfe82d2
> +  * net: packetmmap: fix only tx timestamp on request
> +  */
> + tp_status &= ~(TP_STATUS_TS_SOFTWARE | TP_STATUS_TS_RAW_HARDWARE);
> +
> + return tp_status != TP_STATUS_AVAILABLE;
> +}
> +
>  /*
>   * Callback to handle sending packets through a real NIC.
>   */
> @@ -212,8 +228,8 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, 
> uint16_t nb_pkts)
>   }
>  
>   /* point at the next incoming frame */
> - if ((ppd->tp_status != TP_STATUS_AVAILABLE) &&
> - (poll(&pfd, 1, -1) < 0))
> + if (tx_ring_status_unavailable(ppd->tp_status) &&
> + poll(&pfd, 1, -1) < 0)
>   break;
>  
>   /* copy the tx frame data */
> 



Re: [dpdk-dev] [PATCH v2 02/15] crypto: add total raw buffer length

2021-09-20 Thread Akhil Goyal
> 
> > From: Gagandeep Singh 
> >
> > The current crypto raw data vectors is extended to support
> > rte_security usecases, where we need total data length to know
> > how much additional memory space is available in buffer other
> > than data length so that driver/HW can write expanded size
> > data after encryption.
> >
> > Signed-off-by: Gagandeep Singh 
> > Acked-by: Akhil Goyal 
> > ---
> >  lib/cryptodev/rte_crypto_sym.h | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/lib/cryptodev/rte_crypto_sym.h
> b/lib/cryptodev/rte_crypto_sym.h
> > index dcc0bd5933..e5cef1fb72 100644
> > --- a/lib/cryptodev/rte_crypto_sym.h
> > +++ b/lib/cryptodev/rte_crypto_sym.h
> > @@ -37,6 +37,8 @@ struct rte_crypto_vec {
> > rte_iova_t iova;
> > /** length of the data buffer */
> > uint32_t len;
> > +   /** total buffer length*/
> > +   uint32_t tot_len;
> >  };
> >
> >  /**
> > @@ -980,12 +982,14 @@ rte_crypto_mbuf_to_vec(const struct rte_mbuf
> *mb, uint32_t ofs, uint32_t len,
> > seglen = mb->data_len - ofs;
> > if (len <= seglen) {
> > vec[0].len = len;
> > +   vec[0].tot_len = mb->buf_len;
> 
> That doesn't look right.
> We should take into a count mbuf headroom and input offset.
> Something like:
> vec[0].tot_len = mb->buf_len - rte_pktmbuf_headroom(m) - ofs;
> Same in other places below.
> 
I believe the packet can expand into headroom based on the protocol support.
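
The two readings being weighed, as a short sketch ("ofs" is the input offset
into the mbuf data):

    /* (a) space usable past the offset, excluding the headroom: */
    vec[0].tot_len = mb->buf_len - rte_pktmbuf_headroom(mb) - ofs;

    /* (b) the whole buffer, so a protocol that expands into the headroom
     *     (e.g. prepending tunnel/security headers) can use it as well: */
    vec[0].tot_len = mb->buf_len;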


Re: [dpdk-dev] [PATCH v3] efd: change data type of parameter

2021-09-20 Thread David Christensen




On 9/17/21 5:56 AM, Pablo de Lara wrote:

rte_efd_create() function was using uint8_t for a socket bitmask,
for one of its parameters.
This limits the maximum of NUMA sockets to be 8.
Changing to uint64_t increases it to 64, which should be
more future-proof.

Coverity issue: 366390
Fixes: 56b6ef874f8 ("efd: new Elastic Flow Distributor library")

Signed-off-by: Pablo de Lara 
Acked-by: John McNamara 
---

v3: Fixed commit message

v2: Fixed EFD tests

---

  app/test/test_efd.c  | 4 ++--
  app/test/test_efd_perf.c | 4 ++--
  lib/efd/rte_efd.c| 2 +-
  lib/efd/rte_efd.h| 2 +-
  4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/app/test/test_efd.c b/app/test/test_efd.c
index 180dc4748e..581519c1e0 100644
--- a/app/test/test_efd.c
+++ b/app/test/test_efd.c
@@ -91,9 +91,9 @@ static struct flow_key keys[5] = {
  /* Array to store the data */
  static efd_value_t data[5];

-static inline uint8_t efd_get_all_sockets_bitmask(void)
+static inline uint64_t efd_get_all_sockets_bitmask(void)
  {
-   uint8_t all_cpu_sockets_bitmask = 0;
+   uint64_t all_cpu_sockets_bitmask = 0;
unsigned int i;
unsigned int next_lcore = rte_get_main_lcore();
const int val_true = 1, val_false = 0;
diff --git a/app/test/test_efd_perf.c b/app/test/test_efd_perf.c
index 1c47704475..f3fe3b1736 100644
--- a/app/test/test_efd_perf.c
+++ b/app/test/test_efd_perf.c
@@ -29,9 +29,9 @@
  #endif
  static unsigned int test_socket_id;

-static inline uint8_t efd_get_all_sockets_bitmask(void)
+static inline uint64_t efd_get_all_sockets_bitmask(void)
  {
-   uint8_t all_cpu_sockets_bitmask = 0;
+   uint64_t all_cpu_sockets_bitmask = 0;
unsigned int i;
unsigned int next_lcore = rte_get_main_lcore();
const int val_true = 1, val_false = 0;
diff --git a/lib/efd/rte_efd.c b/lib/efd/rte_efd.c
index 77f46809f8..68a2378e88 100644
--- a/lib/efd/rte_efd.c
+++ b/lib/efd/rte_efd.c
@@ -495,7 +495,7 @@ efd_search_hash(struct rte_efd_table * const table,

  struct rte_efd_table *
  rte_efd_create(const char *name, uint32_t max_num_rules, uint32_t key_len,
-   uint8_t online_cpu_socket_bitmask, uint8_t offline_cpu_socket)
+   uint64_t online_cpu_socket_bitmask, uint8_t offline_cpu_socket)
  {
struct rte_efd_table *table = NULL;
uint8_t *key_array = NULL;
diff --git a/lib/efd/rte_efd.h b/lib/efd/rte_efd.h
index c2be4c09ae..d3d7befd0c 100644
--- a/lib/efd/rte_efd.h
+++ b/lib/efd/rte_efd.h
@@ -139,7 +139,7 @@ typedef uint16_t efd_hashfunc_t;
   */
  struct rte_efd_table *
  rte_efd_create(const char *name, uint32_t max_num_rules, uint32_t key_len,
-   uint8_t online_cpu_socket_bitmask, uint8_t offline_cpu_socket);
+   uint64_t online_cpu_socket_bitmask, uint8_t offline_cpu_socket);

  /**
   * Releases the resources from an EFD table



After applying the patch I receive a segmentation fault when I use 
lcores on the second NUMA node (node 8).


$ lscpu
Architecture:ppc64le
..
NUMA node0 CPU(s):   0-63
NUMA node8 CPU(s):   64-127
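
For reference, node 8 corresponds to bit 8 of the socket bitmask, which the
widened parameter can hold but the previous uint8_t (bits 0-7) could not:

    uint64_t mask = UINT64_C(1) << 8;   /* 0x100: needs at least 9 bits */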

Working case:
---
$ sudo /home/drc/src/dpdk/build/app/test/dpdk-test -l 59-63 -n 4 --no-pci
EAL: Detected 128 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available 1048576 kB hugepages reported
EAL: VFIO support initialized
APP: HPET is not enabled, using TSC as default timer
RTE>>efd_autotest
Entering test_add_delete
Entering test_efd_find_existing
Entering test_add_update_delete
Entering test_five_keys
Entering test_efd_creation_with_bad_parameters, **Errors are expected **
EFD: Allocating key array on socket 0 failed
EFD: At least one CPU socket must be enabled in the bitmask
EFD: Allocating EFD table management structure on socket 255 failed
# Test successful. No more errors expected
Evaluating table utilization and correctness, please wait
Added 2097152    Succeeded 2097152    Lost 0
Added 2097152    Succeeded 2097152    Lost 0
Added 2097152    Succeeded 2097152    Lost 0

Average table utilization = 100.00% (2097152/2097152)
Test OK
RTE>>quit

Failing case:
---
sudo /home/drc/src/dpdk/build/app/test/dpdk-test -l 64-69 -n 4 --no-pci
EAL: Detected 128 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available 1048576 kB hugepages reported
EAL: VFIO support initialized
APP: HPET is not enabled, using TSC as default timer
RTE>>efd_autotest
Entering test_add_delete
Segmentation fault

What's the purpose of "test_socket_id" in the file app/test/test_efd.c?
I don't see it set during the test, so it defaults to 0, and it looks like it
should be 8 in this situation:


sudo gdb --args /home/drc/src/dpdk/build/app/test/dpdk-test -l 64-69 -n 
4 --no-pci

GNU gdb (GDB)

Re: [dpdk-dev] [PATCH] Enable AddressSanitizer feature on DPDK

2021-09-20 Thread David Christensen




On 9/18/21 12:21 AM, Peng, ZhihongX wrote:

-Original Message-
From: David Christensen 
Sent: Saturday, September 18, 2021 4:51 AM
To: Peng, ZhihongX ; Burakov, Anatoly
; Ananyev, Konstantin
; step...@networkplumber.org
Cc: dev@dpdk.org; Lin, Xueqin 
Subject: Re: [dpdk-dev] [PATCH] Enable AddressSanitizer feature on DPDK


If you want to use this feature,
you need to add below compilation options when compiling code:
-Dbuildtype=debug -Db_lundef=false -Db_sanitize=address
"-Dbuildtype=debug": Display code information when coredump occurs
in the program.
"-Db_lundef=false": It is enabled by default, and needs to be
disabled when using asan.


On initial inspection, it appears ASAN functionality doesn't work
with DPDK on PPC architecture.  I tested the patch with several
compiler versions (gcc
8.3.1 from RHEL 8.3 through gcc 11.2.1 from the IBM Advanced
Toolchain 15.0) and observed the following error when running testpmd

with ASAN enabled:


AddressSanitizer:DEADLYSIGNAL
=================================================================
==49246==ERROR: AddressSanitizer: SEGV on unknown address 0xa0077bd0 (pc 0x10b4eca4 bp 0x7fffe150 sp 0x7fffe150 T0)
==49246==The signal is caused by a UNKNOWN memory access.
   #0 0x10b4eca4 in asan_set_shadow ../lib/eal/common/malloc_elem.h:120
   #1 0x10b4ed68 in asan_set_zone ../lib/eal/common/malloc_elem.h:135
   #2 0x10b4ee90 in asan_clear_split_alloczone ../lib/eal/common/malloc_elem.h:162
   #3 0x10b51f84 in malloc_elem_alloc ../lib/eal/common/malloc_elem.c:477
...

Can you incorporate an exception for PPC architecture with this patch
while I look into the problem further?

Dave


We do not have a PPC platform, so there is no adaptation.
doc/guides/prog_guide/asan.rst states that we currently only
support Linux x86_64. You can adapt according to the following documents;
the main work is to modify the base address according to the platform.

Documents:
https://github.com/google/sanitizers/wiki/AddressSanitizer
https://github.com/llvm/llvm-project/tree/main/compiler-rt


Understand you don't have such a platform.  I looked into it and suggest the
following change in lib/eal/common/malloc_elem.h:

#define ASAN_SHADOW_GRAIN_SIZE  8
#define ASAN_SHADOW_SCALE   3
#ifdef RTE_ARCH_PPC_64
#define ASAN_SHADOW_OFFSET 0x0200
#else
#define ASAN_SHADOW_OFFSET 0x7fff8000
#endif
#define ASAN_MEM_FREE_FLAG  0xfd
#define ASAN_MEM_REDZONE_FLAG   0xfa
#define ASAN_MEM_TO_SHADOW(mem) (((mem) >> ASAN_SHADOW_SCALE) + ASAN_SHADOW_OFFSET)


This resolves the segmentation error I receive.

Dave



Great, good information for the DPDK ASAN tool. Because we can't do many tests,
when this patch is merged into the mainline, you can submit the PPC
architecture patch.


If your argument is that this is x86 only then please ensure it can't be 
enabled on non-x86 platforms such as ARM and PPC.  I can then easily 
submit a follow-on patch to enable for PPC.


As the patch currently stands, it enables ASAN on an untested platform
and produces an unexpected error for some users when it can easily be
avoided.  I'd advise not accepting the patch as currently presented.


Dave
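For reference, a stand-alone sketch of what the shadow translation discussed
above computes. The macro shapes follow the malloc_elem.h snippet quoted in
this thread; the offset value and the example address are illustrative
assumptions, not values taken from the crash report:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define ASAN_SHADOW_GRAIN_SIZE  8   /* each shadow byte covers 8 bytes */
#define ASAN_SHADOW_SCALE       3
#define ASAN_SHADOW_OFFSET      0x7fff8000ULL   /* assumed x86_64 value */
#define ASAN_MEM_TO_SHADOW(mem) (((mem) >> ASAN_SHADOW_SCALE) + ASAN_SHADOW_OFFSET)

int main(void)
{
	uint64_t addr = 0x140000000ULL; /* hypothetical heap address */

	/* The >> 3 divides by the 8-byte granule, then the platform-specific
	 * offset relocates the result into the shadow region; a wrong offset
	 * is what produces the SEGV reported earlier in this thread. */
	printf("mem 0x%" PRIx64 " -> shadow 0x%" PRIx64 "\n",
	       addr, (uint64_t)ASAN_MEM_TO_SHADOW(addr));
	return 0;
}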


Re: [dpdk-dev] [PATCH v9] eal: remove sys/queue.h from public headers

2021-09-20 Thread Narcisa Ana Maria Vasile
On Tue, Aug 24, 2021 at 04:21:03PM +, William Tu wrote:
> Currently there are some public headers that include 'sys/queue.h', which
> is not POSIX, but usually provided by the Linux/BSD system library.
> (Not in POSIX.1, POSIX.1-2001, or POSIX.1-2008. Present on the BSDs.)
> The file is missing on Windows. During the Windows build, DPDK uses a
> bundled copy, so building a DPDK library works fine.  But when OVS or other
> applications use DPDK as a library, because some DPDK public headers
> include 'sys/queue.h', on Windows, it triggers an error due to no such
> file.
> 
> One solution is to install the 'lib/eal/windows/include/sys/queue.h' into
> Windows environment, such as [1]. However, this means DPDK exports the
> functionalities of 'sys/queue.h' into the environment, which might cause
> symbols, macros, headers clashing with other applications.
> 
> The patch fixes it by removing the "#include " from
> DPDK public headers, so programs including DPDK headers don't depend
> on the system to provide 'sys/queue.h'. When these public headers use
> macros such as TAILQ_xxx, we replace it by the ones with RTE_ prefix.
> For Windows, we copy the definitions from  to rte_os.h
> in Windows EAL. Note that these RTE_ macros are compatible with
> , both at the level of API (to use with 
> macros in C files) and ABI (to avoid breaking it).
> 
> Additionally, the TAILQ_FOREACH_SAFE is not part of ,
> the patch replaces it with RTE_TAILQ_FOREACH_SAFE.
> 
> [1] http://mails.dpdk.org/archives/dev/2021-August/216304.html
> 
> Suggested-by: Nick Connolly 
> Suggested-by: Dmitry Kozliuk 
> Acked-by: Dmitry Kozliuk 
> Signed-off-by: William Tu 
> ---
Acked-by: Narcisa Vasile 
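For readers unfamiliar with the macro being introduced, a self-contained
sketch of the safe-iteration pattern that RTE_TAILQ_FOREACH_SAFE provides.
The macro below is defined locally and the system <sys/queue.h> is used only
to keep the example short; this is not DPDK code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Same shape as the RTE_ prefixed macro: remember the next element so the
 * current one can be removed and freed while walking the list. */
#define TAILQ_FOREACH_SAFE_LOCAL(var, head, field, tvar)        \
	for ((var) = TAILQ_FIRST((head));                       \
	     (var) && ((tvar) = TAILQ_NEXT((var), field), 1);   \
	     (var) = (tvar))

struct entry {
	int value;
	TAILQ_ENTRY(entry) next;
};

TAILQ_HEAD(entry_list, entry);

int main(void)
{
	struct entry_list list = TAILQ_HEAD_INITIALIZER(list);
	struct entry *e, *tmp;
	int i;

	for (i = 0; i < 3; i++) {
		e = malloc(sizeof(*e));
		if (e == NULL)
			return 1;
		e->value = i;
		TAILQ_INSERT_TAIL(&list, e, next);
	}

	/* Removal during traversal is safe because tmp was saved first. */
	TAILQ_FOREACH_SAFE_LOCAL(e, &list, next, tmp) {
		printf("removing %d\n", e->value);
		TAILQ_REMOVE(&list, e, next);
		free(e);
	}
	return 0;
}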


Re: [dpdk-dev] I40e-dpdk_18.11-Bonding issue

2021-09-20 Thread Nishant Verma
Thanks for the reply.

Actually, at this point the test has not even started. I have an Intel
processor with 2 onboard NICs.
b5:00.0 Ethernet controller: Intel Corporation Ethernet Connection X722 for
10GbE backplane (rev 04)
b5:00.1 Ethernet controller: Intel Corporation Ethernet Connection X722 for
10GbE backplane (rev 04)

I created 2 VFs on each of the PFs and whitelisted the VFs' PCI addresses to
execute my test app. I am using one VF from each PF to create a bond (ACTIVE
BACKUP) that will be used by my app.

EAL init is fine, and after that the port configure is fine as of now. But
after bond creation, when I go for init, it gives me this error:
mac_address_slaves_update(1495) - 1 Failed to update port Id 0 MAC address

and further on, the port start fails.

Thanks.

Regards,
Nishant Verma


On Fri, Sep 17, 2021 at 8:51 PM Min Hu (Connor)  wrote:

> Hi, Nishant,
> could you tell us your complete detailed steps and logs for test ?
> the more detailed the better, thanks.
>
> 在 2021/9/18 1:20, Nishant Verma 写道:
> > Hi,
> >
> > I am stuck with a bonding issue wrt. DPDK 18.11 with X722 ethernet
> > controller.
> >
> > My system basically creates VFs through SRIOV on top of two PFs. One
> > issue I found was that if I try to change the MAC address of a VF more
> > than once, it won't let me do that. Anyway, I can live with that, but
> > when I am creating a bond interface on top of 2 VFs, I am getting this
> > error.
> >
> > Changed Address:06:00:00:02:b7:21
> > FWK: Bond interface net_bonding1 created successfully, id = 2
> >
> > *mac_address_slaves_update(1495) - Failed to update port Id 0 MAC address*
> > FWK: Adding slave (0) to bond (2)
> > mac_address_slaves_update(1495) - Failed to update port Id 0 MAC address
> > mac_address_slaves_update(1503) - Failed to update port Id 1 MAC address
> > FWK: Adding slave (1) to bond (2)
> > mac_address_slaves_update(1495) - Failed to update port Id 0 MAC address
> > mac_address_slaves_update(1503) - Failed to update port Id 1 MAC address
> > EAL: Error - exiting with code: 1
> >Cause: rte_eth_dev_start: err=-1, port=2
> >
> > I checked a whole lot of patches but everything seems to be in place in
> > this release. What else can I try to get rid of these errors?
> >
> > Thanks.
> >
> > Regards,
> > Nishant Verma
> > .
> >
>
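For context, the bonded setup described in this thread is normally created
along the following lines with the bonding PMD API available in DPDK 18.11.
This is a sketch only: the port ids, the vdev name and the omitted
configure/queue setup are assumptions, not the poster's code:

#include <rte_ethdev.h>
#include <rte_eth_bond.h>
#include <rte_lcore.h>

/* Hypothetical setup: bond two already probed VF ports in active-backup
 * mode; the bonded port still needs to be configured and have queues set
 * up before rte_eth_dev_start(). */
static int
bond_two_vfs(uint16_t vf0, uint16_t vf1)
{
	int bond_port;

	bond_port = rte_eth_bond_create("net_bonding1",
					BONDING_MODE_ACTIVE_BACKUP,
					rte_socket_id());
	if (bond_port < 0)
		return bond_port;

	if (rte_eth_bond_slave_add(bond_port, vf0) != 0 ||
	    rte_eth_bond_slave_add(bond_port, vf1) != 0)
		return -1;

	/* mac_address_slaves_update() runs when the bonded port starts: it
	 * pushes the bond MAC down to each slave, which an i40e VF may
	 * refuse if the PF administratively set the VF MAC (untrusted VF). */
	return rte_eth_dev_start(bond_port);
}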


Re: [dpdk-dev] [PATCH v2 0/6] mlx5: some independent fixes

2021-09-20 Thread Thomas Monjalon
12/09/2021 12:36, Michael Baum:
> Some independent fixes in mlx5 net and common driver.
> 
> v2: improve commit logs.
> 
> Michael Baum (6):
>   net/mlx5: fix memory leak in the SH creation
>   net/mlx5: fix memory leak in PCI probe
>   net/mlx5: fix allow duplicate pattern devarg default
>   common/mlx5: fix class combination validation
>   common/mlx5: fix device list operation concurrency
>   common/mlx5: fix resource cleanliness in a device remove

Applied in next-net-mlx, thanks.




Re: [dpdk-dev] [PATCH v2 1/8] common/cnxk: use different macros for sdp and lbk max frames

2021-09-20 Thread Jerin Jacob
On Sat, Sep 18, 2021 at 8:02 PM  wrote:
>
> From: Satha Rao 
>
> For SDP interfaces, all platforms support frame sizes up to 65535 bytes.
> Updated the API with a new check for the SDP interface.

Please change the subject to
common/cnxk: set appropriate max frame size for SDP and LBK
or so



>
> Signed-off-by: Satha Rao 
> ---
>  drivers/common/cnxk/hw/nix.h  | 1 +
>  drivers/common/cnxk/roc_nix.c | 5 -
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/common/cnxk/hw/nix.h b/drivers/common/cnxk/hw/nix.h
> index 6b86002..a0ffd25 100644
> --- a/drivers/common/cnxk/hw/nix.h
> +++ b/drivers/common/cnxk/hw/nix.h
> @@ -2102,6 +2102,7 @@ struct nix_lso_format {
>
>  #define NIX_CN9K_MAX_HW_FRS 9212UL
>  #define NIX_LBK_MAX_HW_FRS  65535UL
> +#define NIX_SDP_MAX_HW_FRS  65535UL
>  #define NIX_RPM_MAX_HW_FRS  16380UL
>  #define NIX_MIN_HW_FRS 60UL
>
> diff --git a/drivers/common/cnxk/roc_nix.c b/drivers/common/cnxk/roc_nix.c
> index 23d508b..d1e8c2d 100644
> --- a/drivers/common/cnxk/roc_nix.c
> +++ b/drivers/common/cnxk/roc_nix.c
> @@ -113,10 +113,13 @@
>  {
> struct nix *nix = roc_nix_to_nix_priv(roc_nix);
>
> +   if (roc_nix_is_sdp(roc_nix))
> +   return NIX_SDP_MAX_HW_FRS;
> +
> if (roc_model_is_cn9k())
> return NIX_CN9K_MAX_HW_FRS;
>
> -   if (nix->lbk_link || roc_nix_is_sdp(roc_nix))
> +   if (nix->lbk_link)
> return NIX_LBK_MAX_HW_FRS;
>
> return NIX_RPM_MAX_HW_FRS;
> --
> 1.8.3.1
>
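As background, a per-interface maximum such as the NIX_SDP_MAX_HW_FRS value
added above is typically consumed when validating a requested MTU. A minimal
sketch, where the overhead figure and the helper are illustrative assumptions
rather than driver code:

#include <errno.h>
#include <stdint.h>

#define NIX_SDP_MAX_HW_FRS 65535UL  /* quoted from hw/nix.h above */
#define NIX_MIN_HW_FRS     60UL
#define L2_OVERHEAD        (14 + 4 + 8) /* assumed: Ethernet hdr + FCS + VLANs */

static int
check_frame_size(uint32_t mtu, uint32_t max_hw_frs)
{
	/* The HW limit counts the full frame; the MTU excludes L2 overhead. */
	uint32_t frs = mtu + L2_OVERHEAD;

	if (frs < NIX_MIN_HW_FRS || frs > max_hw_frs)
		return -EINVAL;
	return 0;
}

A caller would pass the value returned by the getter patched above, e.g.
NIX_SDP_MAX_HW_FRS for an SDP interface.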


Re: [dpdk-dev] [PATCH v2 2/8] common/cnxk: flush smq

2021-09-20 Thread Jerin Jacob
On Sat, Sep 18, 2021 at 8:02 PM  wrote:
>
> From: Satha Rao 
>
> Added new API to flush all SMQs related nix interface

nix-> NIX

Add more details in the commit log about what "flush" means here.

Change the subject to: common/cnxk: support SMQ flush

>
> Signed-off-by: Satha Rao 
> ---
>  drivers/common/cnxk/hw/nix.h |  6 +
>  drivers/common/cnxk/roc_nix.h|  1 +
>  drivers/common/cnxk/roc_nix_tm_ops.c | 50 
> 
>  drivers/common/cnxk/version.map  |  1 +
>  4 files changed, 58 insertions(+)
>
> diff --git a/drivers/common/cnxk/hw/nix.h b/drivers/common/cnxk/hw/nix.h
> index a0ffd25..bc908c2 100644
> --- a/drivers/common/cnxk/hw/nix.h
> +++ b/drivers/common/cnxk/hw/nix.h
> @@ -2189,4 +2189,10 @@ struct nix_lso_format {
>  #define NIX_LSO_FORMAT_IDX_TSOV4 0
>  #define NIX_LSO_FORMAT_IDX_TSOV6 1
>
> +/* [CN10K, .) */
> +#define NIX_SENDSTATALG_MASK 0x7
> +#define NIX_SENDSTATALG_SEL_MASK  0x8
> +#define NIX_SENDSTAT_IOFFSET_MASK 0xFFF
> +#define NIX_SENDSTAT_OOFFSET_MASK 0xFFF
> +
>  #endif /* __NIX_HW_H__ */
> diff --git a/drivers/common/cnxk/roc_nix.h b/drivers/common/cnxk/roc_nix.h
> index b0e6fab..ac7bd7e 100644
> --- a/drivers/common/cnxk/roc_nix.h
> +++ b/drivers/common/cnxk/roc_nix.h
> @@ -468,6 +468,7 @@ int __roc_api roc_nix_tm_rsrc_count(struct roc_nix 
> *roc_nix,
>  int __roc_api roc_nix_tm_node_name_get(struct roc_nix *roc_nix,
>uint32_t node_id, char *buf,
>size_t buflen);
> +int __roc_api roc_nix_smq_flush(struct roc_nix *roc_nix);
>
>  /* MAC */
>  int __roc_api roc_nix_mac_rxtx_start_stop(struct roc_nix *roc_nix, bool 
> start);
> diff --git a/drivers/common/cnxk/roc_nix_tm_ops.c 
> b/drivers/common/cnxk/roc_nix_tm_ops.c
> index ed244d4..d9741f5 100644
> --- a/drivers/common/cnxk/roc_nix_tm_ops.c
> +++ b/drivers/common/cnxk/roc_nix_tm_ops.c
> @@ -311,6 +311,56 @@
>  }
>
>  int
> +roc_nix_smq_flush(struct roc_nix *roc_nix)
> +{
> +   struct nix *nix = roc_nix_to_nix_priv(roc_nix);
> +   struct nix_tm_node_list *list;
> +   enum roc_nix_tm_tree tree;
> +   struct nix_tm_node *node;
> +   int rc = 0;
> +
> +   if (!(nix->tm_flags & NIX_TM_HIERARCHY_ENA))
> +   return 0;
> +
> +   tree = nix->tm_tree;
> +   list = nix_tm_node_list(nix, tree);
> +
> +   /* XOFF & Flush all SMQ's. HRM mandates
> +* all SQ's empty before SMQ flush is issued.
> +*/
> +   TAILQ_FOREACH(node, list, node) {
> +   if (node->hw_lvl != NIX_TXSCH_LVL_SMQ)
> +   continue;
> +   if (!(node->flags & NIX_TM_NODE_HWRES))
> +   continue;
> +
> +   rc = nix_tm_smq_xoff(nix, node, true);
> +   if (rc) {
> +   plt_err("Failed to enable smq %u, rc=%d", node->hw_id,
> +   rc);
> +   goto exit;
> +   }
> +   }
> +
> +   /* XON all SMQ's */
> +   TAILQ_FOREACH(node, list, node) {
> +   if (node->hw_lvl != NIX_TXSCH_LVL_SMQ)
> +   continue;
> +   if (!(node->flags & NIX_TM_NODE_HWRES))
> +   continue;
> +
> +   rc = nix_tm_smq_xoff(nix, node, false);
> +   if (rc) {
> +   plt_err("Failed to enable smq %u, rc=%d", node->hw_id,
> +   rc);
> +   goto exit;
> +   }
> +   }
> +exit:
> +   return rc;
> +}
> +
> +int
>  roc_nix_tm_hierarchy_disable(struct roc_nix *roc_nix)
>  {
> struct nix *nix = roc_nix_to_nix_priv(roc_nix);
> diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
> index 5df2e56..388f938 100644
> --- a/drivers/common/cnxk/version.map
> +++ b/drivers/common/cnxk/version.map
> @@ -170,6 +170,7 @@ INTERNAL {
> roc_nix_xstats_names_get;
> roc_nix_switch_hdr_set;
> roc_nix_eeprom_info_get;
> +   roc_nix_smq_flush;
> roc_nix_tm_dump;
> roc_nix_tm_fini;
> roc_nix_tm_free_resources;
> --
> 1.8.3.1
>
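To illustrate how the new API would be consumed, a hypothetical caller that
quiesces TX before the TM hierarchy is reprogrammed; the surrounding function
is an assumption, only roc_nix_smq_flush() itself comes from this patch:

#include "roc_api.h"

static int
nix_tx_quiesce(struct roc_nix *roc_nix)
{
	int rc;

	/* roc_nix_smq_flush() XOFFs and flushes every SMQ that holds HW
	 * resources and then XONs them again, as in the quoted patch. */
	rc = roc_nix_smq_flush(roc_nix);
	if (rc)
		plt_err("SMQ flush failed, rc=%d", rc);
	return rc;
}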


Re: [dpdk-dev] [PATCH v2 5/8] common/cnxk: handler to get rte tm error type

2021-09-20 Thread Jerin Jacob
On Sat, Sep 18, 2021 at 8:02 PM  wrote:
>
> From: Satha Rao 
>
> Different TM handlers return various platform-specific errors;
> this patch introduces a new API to convert these internal error
> types to RTE_TM* error types.
> Also updated the error message API with missing TM error types.

Subject change suggestion:
common/cnxk: support TM error type get

>
> Signed-off-by: Satha Rao 
> ---
>  drivers/common/cnxk/cnxk_utils.c | 68 
> 
>  drivers/common/cnxk/cnxk_utils.h | 11 +++
>  drivers/common/cnxk/meson.build  |  5 +++
>  drivers/common/cnxk/roc_utils.c  |  6 
>  drivers/common/cnxk/version.map  |  1 +
>  5 files changed, 91 insertions(+)
>  create mode 100644 drivers/common/cnxk/cnxk_utils.c
>  create mode 100644 drivers/common/cnxk/cnxk_utils.h
>
> diff --git a/drivers/common/cnxk/cnxk_utils.c 
> b/drivers/common/cnxk/cnxk_utils.c
> new file mode 100644
> index 000..4e56adc
> --- /dev/null
> +++ b/drivers/common/cnxk/cnxk_utils.c
> @@ -0,0 +1,68 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2021 Marvell.
> + */
> +#include 
> +#include 
> +
> +#include "roc_api.h"
> +#include "roc_priv.h"
> +
> +#include "cnxk_utils.h"
> +
> +int
> +roc_nix_tm_err_to_rte_err(int errorcode)
> +{
> +   int err_type;
> +
> +   switch (errorcode) {
> +   case NIX_ERR_TM_SHAPER_PKT_LEN_ADJUST:
> +   err_type = RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PKT_ADJUST_LEN;
> +   break;
> +   case NIX_ERR_TM_INVALID_COMMIT_SZ:
> +   err_type = RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_SIZE;
> +   break;
> +   case NIX_ERR_TM_INVALID_COMMIT_RATE:
> +   err_type = RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_RATE;
> +   break;
> +   case NIX_ERR_TM_INVALID_PEAK_SZ:
> +   err_type = RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PEAK_SIZE;
> +   break;
> +   case NIX_ERR_TM_INVALID_PEAK_RATE:
> +   err_type = RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PEAK_RATE;
> +   break;
> +   case NIX_ERR_TM_INVALID_SHAPER_PROFILE:
> +   err_type = RTE_TM_ERROR_TYPE_SHAPER_PROFILE_ID;
> +   break;
> +   case NIX_ERR_TM_SHAPER_PROFILE_IN_USE:
> +   err_type = RTE_TM_ERROR_TYPE_SHAPER_PROFILE;
> +   break;
> +   case NIX_ERR_TM_INVALID_NODE:
> +   err_type = RTE_TM_ERROR_TYPE_NODE_ID;
> +   break;
> +   case NIX_ERR_TM_PKT_MODE_MISMATCH:
> +   err_type = RTE_TM_ERROR_TYPE_SHAPER_PROFILE_ID;
> +   break;
> +   case NIX_ERR_TM_INVALID_PARENT:
> +   case NIX_ERR_TM_PARENT_PRIO_UPDATE:
> +   err_type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
> +   break;
> +   case NIX_ERR_TM_PRIO_ORDER:
> +   case NIX_ERR_TM_MULTIPLE_RR_GROUPS:
> +   err_type = RTE_TM_ERROR_TYPE_NODE_PRIORITY;
> +   break;
> +   case NIX_ERR_TM_PRIO_EXCEEDED:
> +   err_type = RTE_TM_ERROR_TYPE_CAPABILITIES;
> +   break;
> +   default:
> +   /**
> +* Handle general error (as defined in linux errno.h)
> +*/
> +   if (abs(errorcode) < 300)
> +   err_type = errorcode;
> +   else
> +   err_type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
> +   break;
> +   }
> +
> +   return err_type;
> +}
> diff --git a/drivers/common/cnxk/cnxk_utils.h 
> b/drivers/common/cnxk/cnxk_utils.h
> new file mode 100644
> index 000..5463cd4
> --- /dev/null
> +++ b/drivers/common/cnxk/cnxk_utils.h
> @@ -0,0 +1,11 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2021 Marvell.
> + */
> +#ifndef _CNXK_UTILS_H_
> +#define _CNXK_UTILS_H_
> +
> +#include "roc_platform.h"
> +
> +int __roc_api roc_nix_tm_err_to_rte_err(int errorcode);
> +
> +#endif /* _CNXK_UTILS_H_ */
> diff --git a/drivers/common/cnxk/meson.build b/drivers/common/cnxk/meson.build
> index 8a551d1..258429d 100644
> --- a/drivers/common/cnxk/meson.build
> +++ b/drivers/common/cnxk/meson.build
> @@ -61,5 +61,10 @@ sources = files(
>  # Security common code
>  sources += files('cnxk_security.c')
>
> +# common DPDK utilities code
> +sources += files('cnxk_utils.c')
> +
>  includes += include_directories('../../bus/pci')
>  includes += include_directories('../../../lib/net')
> +includes += include_directories('../../../lib/ethdev')
> +includes += include_directories('../../../lib/meter')
> diff --git a/drivers/common/cnxk/roc_utils.c b/drivers/common/cnxk/roc_utils.c
> index 9cb8708..751486f 100644
> --- a/drivers/common/cnxk/roc_utils.c
> +++ b/drivers/common/cnxk/roc_utils.c
> @@ -64,6 +64,9 @@
> case NIX_ERR_TM_INVALID_SHAPER_PROFILE:
> err_msg = "TM shaper profile invalid";
> break;
> +   case NIX_ERR_TM_PKT_MODE_MISMATCH:
> +   err_msg = "shaper profile pkt mode mismatch";
> +
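To show where the conversion helper fits, a hypothetical wrapper that an
ethdev TM callback could use to fill the rte_tm_error structure; the wrapper
itself and the use of the existing roc_error_msg_get() helper are
assumptions, only roc_nix_tm_err_to_rte_err() comes from this patch:

#include <rte_tm.h>

#include "roc_api.h"
#include "cnxk_utils.h"

static int
tm_fill_error(int rc, struct rte_tm_error *error)
{
	if (rc == 0)
		return 0;

	/* Map the ROC error code to an RTE_TM_ERROR_TYPE_* value and keep
	 * the human-readable message from the common error table. */
	error->type = roc_nix_tm_err_to_rte_err(rc);
	error->message = roc_error_msg_get(rc);
	return rc;
}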

Re: [dpdk-dev] [PATCH v2 7/8] net/cnxk: tm capabilities and queue rate limit handlers

2021-09-20 Thread Jerin Jacob
On Sat, Sep 18, 2021 at 8:03 PM  wrote:
>
> From: Satha Rao 
>
> Initial version of the TM implementation adds basic infrastructure,
> the TM node_get and capabilities operations, and the queue rate limit operation.
>
> Signed-off-by: Satha Rao 


tm-> TM in subject.

# Could you rebase on top of dpdk-next-net-mrvl.git? It has the following [1]
build issue with the "common/cnxk: update ROC models" commit.

# Please add Nithin's Acked-by in the next version.


[1]
FAILED: drivers/libtmp_rte_common_cnxk.a.p/common_cnxk_roc_nix_irq.c.o
ccache gcc -Idrivers/libtmp_rte_common_cnxk.a.p -Idrivers -I../drivers
-Idrivers/common/cnxk -I../drivers/common/cnxk -Idrivers/bus/pci
-I../drivers/bus/pci -Ilib/net -I../lib/net -Ilib/ethdev -I../lib/ethdev
-Ilib/meter -I../lib/meter -I. -I.. -Iconfig -I../config
-Ilib/eal/include -I../lib/eal/include -Ilib/eal/linux/include
-I../lib/eal/linux/include -Ilib/eal/x86/include -I../lib/eal/x86/include
-Ilib/eal/common -I../lib/eal/common -Ilib/eal -I../lib/eal
-Ilib/kvargs -I../lib/kvargs -Ilib/metrics -I../lib/metrics
-Ilib/telemetry -I../lib/telemetry -Ilib/pci -I../lib/pci
-I../drivers/bus/pci/linux -Ilib/mbuf -I../lib/mbuf -Ilib/mempool
-I../lib/mempool -Ilib/ring -I../lib/ring -Ilib/security -I../lib/security
-Ilib/cryptodev -I../lib/cryptodev -Ilib/rcu -I../lib/rcu
-fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch
-Werror -O2 -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated
-Wformat -Wformat-nonliteral -Wformat-security -Wmissing-declarations
-Wmissing-prototypes -Wnested-externs -Wold-style-definition
-Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef -Wwrite-strings
-Wno-address-of-packed-member -Wno-packed-not-aligned
-Wno-missing-field-initializers -Wno-zero-length-bounds -D_GNU_SOURCE
-fPIC -march=native -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API
-Wno-format-truncation -DRTE_LOG_DEFAULT_LOGTYPE=pmd.common.cnxk -MD
-MQ drivers/libtmp_rte_common_cnxk.a.p/common_cnxk_roc_nix_irq.c.o
-MF drivers/libtmp_rte_common_cnxk.a.p/common_cnxk_roc_nix_irq.c.o.d
-o drivers/libtmp_rte_common_cnxk.a.p/common_cnxk_roc_nix_irq.c.o
-c ../drivers/common/cnxk/roc_nix_irq.c
In file included from ../drivers/common/cnxk/roc_api.h:86,
 from ../drivers/common/cnxk/roc_nix_irq.c:5:
../drivers/common/cnxk/roc_model.h:120:1: error: redefinition of ‘roc_model_is_cn96_cx’
  120 | roc_model_is_cn96_cx(void)
      | ^~~~
../drivers/common/cnxk/roc_model.h:114:1: note: previous definition of ‘roc_model_is_cn96_cx’ with type ‘uint64_t(void)’ {aka ‘long unsigned int(void)’}
  114 | roc_model_is_cn96_cx(void)
      | ^~~~

> ---
>  drivers/net/cnxk/cnxk_ethdev.c |   2 +
>  drivers/net/cnxk/cnxk_ethdev.h |   3 +
>  drivers/net/cnxk/cnxk_tm.c | 322 
> +
>  drivers/net/cnxk/cnxk_tm.h |  18 +++
>  drivers/net/cnxk/meson.build   |   1 +
>  5 files changed, 346 insertions(+)
>  create mode 100644 drivers/net/cnxk/cnxk_tm.c
>  create mode 100644 drivers/net/cnxk/cnxk_tm.h
>
> diff --git a/drivers/net/cnxk/cnxk_ethdev.c b/drivers/net/cnxk/cnxk_ethdev.c
> index 7152dcd..8629193 100644
> --- a/drivers/net/cnxk/cnxk_ethdev.c
> +++ b/drivers/net/cnxk/cnxk_ethdev.c
> @@ -1276,6 +1276,8 @@ struct eth_dev_ops cnxk_eth_dev_ops = {
> .rss_hash_update = cnxk_nix_rss_hash_update,
> .rss_hash_conf_get = cnxk_nix_rss_hash_conf_get,
> .set_mc_addr_list = cnxk_nix_mc_addr_list_configure,
> +   .set_queue_rate_limit = cnxk_nix_tm_set_queue_rate_limit,
> +   .tm_ops_get = cnxk_nix_tm_ops_get,
>  };
>
>  static int
> diff --git a/drivers/net/cnxk/cnxk_ethdev.h b/drivers/net/cnxk/cnxk_ethdev.h
> index 27920c8..10e05e6 100644
> --- a/drivers/net/cnxk/cnxk_ethdev.h
> +++ b/drivers/net/cnxk/cnxk_ethdev.h
> @@ -330,6 +330,9 @@ int cnxk_nix_timesync_write_time(struct rte_eth_dev 
> *eth_dev,
>  int cnxk_nix_read_clock(struct rte_eth_dev *eth_dev, uint64_t *clock);
>
>  uint64_t cnxk_nix_rxq_mbuf_setup(struct cnxk_eth_dev *dev);
> +int cnxk_nix_tm_ops_get(struct rte_eth_dev *eth_dev, void *ops);
> +int cnxk_nix_tm_set_queue_rate_limit(struct rte_eth_dev *eth_dev,
> +uint16_t queue_idx, uint16_t tx_rate);
>
>  /* RSS */
>  uint32_t cnxk_rss_ethdev_to_nix(struct cnxk_eth_dev *dev, uint64_t 
> ethdev_rss,
> diff --git a/drivers/net/cnxk/cnxk_tm.c b/drivers/net/cnxk/cnxk_tm.c
> new file mode 100644
> index 000..87fd8be
> --- /dev/null
> +++ b/drivers/net/cnxk/cnxk_tm.c
> @@ -0,0 +1,322 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2021 Marvell.
> + */
> +#include 
> +#include 
> +#include 
> +
> +static int
> +cnxk_nix_tm_node_type_get(struct rte_eth_dev *eth_dev, uint32_t node_id,
> + int *is_leaf, struct rte_tm_error *error)
> +{
> +   struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
> +   struct roc_nix *nix = &dev->nix;
> +   struct roc_nix_tm_node *node;
> +
> +   if (is_leaf == NULL) {
> +