[dpdk-dev] [PATCH] maintainers: remove maintainer for hns3

2021-01-16 Thread Lijun Ou
Wei Hu has moved to a new job and his email
address (xavier.hu...@huawei.com) is no longer
valid, so remove him from the hns3 maintainer
list.

Patches carrying Wei Hu's Signed-off-by will be
copied to Lijun Ou instead.

Signed-off-by: Lijun Ou 
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 76ed473..7a16af3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -648,7 +648,6 @@ F: doc/guides/nics/enic.rst
 F: doc/guides/nics/features/enic.ini
 
 Hisilicon hns3
-M: Wei Hu (Xavier) 
 M: Min Hu (Connor) 
 M: Yisen Zhuang 
 M: Lijun Ou 
-- 
2.7.4



[dpdk-dev] [PATCH] app/testpmd: tx pkt clones parameter in flowgen

2021-01-16 Thread Igor Russkikh
When measuring peak throughput, CPU performance often limits the maximum
rate a device can reach (in both pps and Gbps).

Instead of building each packet separately, use a clone counter to send
the same mbuf to the wire multiple times.

PMDs handle that transparently due to reference counting inside of mbuf.
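
A minimal sketch of the reference-counting idea, not the testpmd code itself. The names sketch_buf, sketch_buf_ref, sketch_buf_free and sketch_fill_burst are hypothetical stand-ins for an mbuf and its helpers; the only assumption taken from this patch is that each extra pointer copy takes one extra reference, and that the buffer returns to its pool when the last reference is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf: only the reference counter matters. */
struct sketch_buf {
	uint16_t refcnt;	/* number of outstanding owners of this buffer */
	int in_pool;		/* 1 once the buffer is back in its pool */
};

/* Take one extra reference, once per additional pointer copy. */
static void sketch_buf_ref(struct sketch_buf *b)
{
	b->refcnt++;
}

/* Drop one reference; the buffer goes back to the pool only at zero. */
static void sketch_buf_free(struct sketch_buf *b)
{
	assert(b->refcnt > 0);
	if (--b->refcnt == 0)
		b->in_pool = 1;
}

/*
 * Fill burst[0..n) with pointers to the same buffer. The first slot uses
 * the reference the caller already holds; each further copy takes a new one.
 */
static size_t sketch_fill_burst(struct sketch_buf *b,
				struct sketch_buf **burst, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (i > 0)
			sketch_buf_ref(b);
		burst[i] = b;
	}
	return n;
}
```

In this sketch the transmit side frees every burst slot as usual, so a buffer placed in four slots is released after the fourth free and not before.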

This helps reach the maximum PPS with small packets. Some data from our
2-port x 50G device, using 2*6 Tx queues and 64-byte packets on a
PowerEdge R7525 (AMD EPYC 7452):

./build/app/dpdk-testpmd -l 32-63  -- --forward-mode=flowgen \
  --rxq=6 --txq=6  --disable-crc-strip --burst=512 \
  --flowgen-clones=0 --txd=4096 --stats-period=1 --txpkts=64

Gives ~46MPPS TX output:

  Tx-pps: 22926849  Tx-bps:  11738590176
  Tx-pps: 23642629  Tx-bps:  12105024112

Setting flowgen-clones to 512 pushes TX almost to our device's
physical limit (68 MPPS) using the same 2*6 queues (cores):

  Tx-pps: 34357556  Tx-bps:  17591073696
  Tx-pps: 34353211  Tx-bps:  17588802640

Doing similar measurements per core, I see that one core can do
6.9 MPPS (without clones) vs 11 MPPS (with clones).
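
For reference, the arithmetic behind the totals quoted in this message can be sketched as below. The helper names total_pps and gain_percent are hypothetical and exist only for this illustration; the inputs are the first Tx-pps sample of each run and the per-core figures given above.

```c
#include <stdint.h>

/* Sum of the two per-port Tx-pps readings, in packets per second. */
static uint64_t total_pps(uint64_t port0, uint64_t port1)
{
	return port0 + port1;
}

/*
 * Gain of with_clones over without_clones in percent, rounded down.
 * Assumes with_clones >= without_clones and without_clones != 0.
 */
static uint64_t gain_percent(uint64_t without_clones, uint64_t with_clones)
{
	return (with_clones - without_clones) * 100 / without_clones;
}
```

With the samples above, total_pps(22926849, 23642629) is 46569478 and total_pps(34357556, 34353211) is 68710767, a gain of 47 percent for the whole device; the per-core figures (6.9 MPPS to 11 MPPS) give 59 percent.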

Verified on Marvell qede and atlantic PMDs.

v1:
  - address Ferruh's comments

rfc v2: http://patchwork.dpdk.org/patch/78800/
  - increment ref counter for each mbuf pointer copy
rfc v1: http://patchwork.dpdk.org/patch/78674/

Signed-off-by: Igor Russkikh 
---
 app/test-pmd/flowgen.c                | 105 ++
 app/test-pmd/parameters.c             |  10 +++
 app/test-pmd/testpmd.c                |   1 +
 app/test-pmd/testpmd.h                |   1 +
 doc/guides/testpmd_app_ug/run_app.rst |   7 ++
 5 files changed, 77 insertions(+), 47 deletions(-)

diff --git a/app/test-pmd/flowgen.c b/app/test-pmd/flowgen.c
index acf3e2460..53a2e5a63 100644
--- a/app/test-pmd/flowgen.c
+++ b/app/test-pmd/flowgen.c
@@ -94,6 +94,7 @@ pkt_burst_flow_gen(struct fwd_stream *fs)
uint16_t nb_rx;
uint16_t nb_tx;
uint16_t nb_pkt;
+   uint16_t nb_clones = nb_pkt_flowgen_clones;
uint16_t i;
uint32_t retry;
uint64_t tx_offloads;
@@ -123,53 +124,63 @@ pkt_burst_flow_gen(struct fwd_stream *fs)
ol_flags |= PKT_TX_MACSEC;
 
for (nb_pkt = 0; nb_pkt < nb_pkt_per_burst; nb_pkt++) {
-   pkt = rte_mbuf_raw_alloc(mbp);
-   if (!pkt)
-   break;
-
-   pkt->data_len = pkt_size;
-   pkt->next = NULL;
-
-   /* Initialize Ethernet header. */
-   eth_hdr = rte_pktmbuf_mtod(pkt, struct rte_ether_hdr *);
-   rte_ether_addr_copy(&cfg_ether_dst, &eth_hdr->d_addr);
-   rte_ether_addr_copy(&cfg_ether_src, &eth_hdr->s_addr);
-   eth_hdr->ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4);
-
-   /* Initialize IP header. */
-   ip_hdr = (struct rte_ipv4_hdr *)(eth_hdr + 1);
-   memset(ip_hdr, 0, sizeof(*ip_hdr));
-   ip_hdr->version_ihl = RTE_IPV4_VHL_DEF;
-   ip_hdr->type_of_service = 0;
-   ip_hdr->fragment_offset = 0;
-   ip_hdr->time_to_live= IP_DEFTTL;
-   ip_hdr->next_proto_id   = IPPROTO_UDP;
-   ip_hdr->packet_id   = 0;
-   ip_hdr->src_addr= rte_cpu_to_be_32(cfg_ip_src);
-   ip_hdr->dst_addr= rte_cpu_to_be_32(cfg_ip_dst +
-  next_flow);
-   ip_hdr->total_length= RTE_CPU_TO_BE_16(pkt_size -
-  sizeof(*eth_hdr));
-   ip_hdr->hdr_checksum= ip_sum((unaligned_uint16_t *)ip_hdr,
-sizeof(*ip_hdr));
-
-   /* Initialize UDP header. */
-   udp_hdr = (struct rte_udp_hdr *)(ip_hdr + 1);
-   udp_hdr->src_port   = rte_cpu_to_be_16(cfg_udp_src);
-   udp_hdr->dst_port   = rte_cpu_to_be_16(cfg_udp_dst);
-   udp_hdr->dgram_cksum= 0; /* No UDP checksum. */
-   udp_hdr->dgram_len  = RTE_CPU_TO_BE_16(pkt_size -
-  sizeof(*eth_hdr) -
-  sizeof(*ip_hdr));
-   pkt->nb_segs= 1;
-   pkt->pkt_len= pkt_size;
-   pkt->ol_flags   &= EXT_ATTACHED_MBUF;
-   pkt->ol_flags   |= ol_flags;
-   pkt->vlan_tci   = vlan_tci;
-   pkt->vlan_tci_outer = vlan_tci_outer;
-   pkt->l2_len = sizeof(struct rte_ether_hdr);
-   pkt->l3_len = sizeof(struct rte_ipv4_hdr);
-   pkts_burst[nb_pkt]  = pkt;
+   if (!nb_pkt || !nb_clones) {
+   nb_clones = nb_pkt_flowgen_clones;
+   /* Logic l

Re: [dpdk-dev] [PATCH v2 04/19] net: fix missing header include

2021-01-16 Thread Ophir Munk
Acked-by: Ophir Munk 

> -Original Message-
> From: Bruce Richardson 
> Sent: Friday, January 15, 2021 1:11 PM
> To: dev@dpdk.org
> Cc: david.march...@redhat.com; Bruce Richardson
> ; sta...@dpdk.org; Olivier Matz
> ; Ophir Munk ; Ferruh
> Yigit 
> Subject: [PATCH v2 04/19] net: fix missing header include
> 
> The Geneve protocol header file is missing the rte_byteorder.h header.
> 
> Fixes: ea0e711b8ae0 ("app/testpmd: add GENEVE parsing")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Bruce Richardson 
> ---
>  lib/librte_net/rte_geneve.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/librte_net/rte_geneve.h b/lib/librte_net/rte_geneve.h
> index bb67724c31..7c3d477dcb 100644
> --- a/lib/librte_net/rte_geneve.h
> +++ b/lib/librte_net/rte_geneve.h
> @@ -11,6 +11,7 @@
>   * GENEVE-related definitions
>   */
>  #include 
> +#include <rte_byteorder.h>
> 
>  #ifdef __cplusplus
>  extern "C" {
> --
> 2.27.0



Re: [dpdk-dev] [PATCH v15 05/12] build: organize Arm config into dict

2021-01-16 Thread Ruifeng Wang

> -Original Message-
> From: Juraj Linkeš 
> Sent: Friday, January 15, 2021 9:26 PM
> To: bruce.richard...@intel.com; Ruifeng Wang ;
> Honnappa Nagarahalli ; Phil Yang
> ; vcchu...@amazon.com; Dharmik Thakkar
> ; jerinjac...@gmail.com;
> hemant.agra...@nxp.com; Ajit Khaparde (ajit.khapa...@broadcom.com)
> ; ferruh.yi...@intel.com;
> abo...@pensando.io
> Cc: dev@dpdk.org; Juraj Linkeš 
> Subject: [PATCH v15 05/12] build: organize Arm config into dict
> 
> Use dictionary lookup instead of checking for existing variables, iterating
> over all elements in the list or checking lists for optional configuration.
> Move variable contents into the dictionary for variables that would be
> referenced only once.
> Fall back to the generic part number if the discovered part number is unknown.
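
The lookup-with-fallback described in this commit message can be illustrated outside Meson. The following is a sketch in C, not the build code: struct part_entry, arm_parts, part_find and part_machine_arg are hypothetical names, the table holds only three of the entries from the patch, and the generic entry keeps only the first of its two machine arguments.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a part number and one compiler flag for it. */
struct part_entry {
	const char *part_number;
	const char *machine_arg;
};

/* A few entries copied from the patch; "generic" is the fallback. */
static const struct part_entry arm_parts[] = {
	{ "generic", "-march=armv8-a+crc" },
	{ "0xd03",   "-mcpu=cortex-a53" },
	{ "0xd08",   "-mcpu=cortex-a72" },
};

/* Return the entry for part, or NULL when it is not in the table. */
static const struct part_entry *part_find(const char *part)
{
	size_t i;

	for (i = 0; i < sizeof(arm_parts) / sizeof(arm_parts[0]); i++)
		if (strcmp(arm_parts[i].part_number, part) == 0)
			return &arm_parts[i];
	return NULL;
}

/* Look up part; use the "generic" entry for an unknown part number. */
static const char *part_machine_arg(const char *part)
{
	const struct part_entry *e = part_find(part);

	if (e == NULL)
		e = part_find("generic");
	return e->machine_arg;
}
```

A known part number such as 0xd08 returns its own flag, while an unknown one returns the generic flag instead of failing the lookup.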
> 
> Signed-off-by: Juraj Linkeš 
> Reviewed-by: Honnappa Nagarahalli 
> ---
>  config/arm/meson.build | 311 -
>  1 file changed, 183 insertions(+), 128 deletions(-)
> 
> diff --git a/config/arm/meson.build b/config/arm/meson.build
> index 7a74938bd..39cf98c67 100644
> --- a/config/arm/meson.build
> +++ b/config/arm/meson.build
> @@ -27,124 +27,172 @@ flags_common = [
>   ['RTE_CACHE_LINE_SIZE', 128]
>  ]
> 
> -# implementer specific aarch64 flags, with middle priority
> -# (will overwrite common flags)
> -flags_implementer_generic = [
> - ['RTE_MACHINE', '"armv8a"'],
> - ['RTE_USE_C11_MEM_MODEL', true],
> - ['RTE_MAX_LCORE', 256]
> -]
> -flags_implementer_arm = [
> - ['RTE_MACHINE', '"armv8a"'],
> - ['RTE_USE_C11_MEM_MODEL', true],
> - ['RTE_CACHE_LINE_SIZE', 64],
> - ['RTE_MAX_LCORE', 16]
> -]
> -flags_implementer_cavium = [
> - ['RTE_MAX_VFIO_GROUPS', 128],
> - ['RTE_MAX_LCORE', 96],
> - ['RTE_MAX_NUMA_NODES', 2]
> -]
> -flags_implementer_dpaa = [
> - ['RTE_MACHINE', '"dpaa"'],
> - ['RTE_LIBRTE_DPAA2_USE_PHYS_IOVA', false],
> - ['RTE_USE_C11_MEM_MODEL', true],
> - ['RTE_CACHE_LINE_SIZE', 64],
> - ['RTE_MAX_LCORE', 16],
> - ['RTE_MAX_NUMA_NODES', 1]
> -]
> -flags_implementer_emag = [
> - ['RTE_MACHINE', '"emag"'],
> - ['RTE_CACHE_LINE_SIZE', 64],
> - ['RTE_MAX_LCORE', 32],
> - ['RTE_MAX_NUMA_NODES', 1]
> -]
> -flags_implementer_armada = [
> - ['RTE_MACHINE', '"armv8a"'],
> - ['RTE_CACHE_LINE_SIZE', 64],
> - ['RTE_MAX_LCORE', 16],
> - ['RTE_MAX_NUMA_NODES', 1]
> -]
> +## Part numbers are specific to Arm implementers
> +# implementer specific aarch64 flags have middle priority
> +# (will overwrite common flags)
> +# part number specific aarch64 flags have the highest priority
> +# (will overwrite both common and implementer specific flags)
> +implementer_generic = {
> + 'description': 'Generic armv8',
> + 'flags': [
> + ['RTE_MACHINE', '"armv8a"'],
> + ['RTE_USE_C11_MEM_MODEL', true],
> + ['RTE_MAX_LCORE', 256]
> + ],
> + 'part_number_config': {
> + 'generic': {'machine_args': ['-march=armv8-a+crc',
> +  '-moutline-atomics']}
> + }
> +}
> +
> +part_number_config_arm = {
> + 'generic': {'machine_args':  ['-march=armv8-a+crc',
> +   '-moutline-atomics']},
> + 'native': {'machine_args':  ['-march=native']},
> + '0xd03': {'machine_args':  ['-mcpu=cortex-a53']},
> + '0xd04': {'machine_args':  ['-mcpu=cortex-a35']},
> + '0xd07': {'machine_args':  ['-mcpu=cortex-a57']},
> + '0xd08': {'machine_args':  ['-mcpu=cortex-a72']},
> + '0xd09': {'machine_args':  ['-mcpu=cortex-a73']},
> + '0xd0a': {'machine_args':  ['-mcpu=cortex-a75']},
> + '0xd0b': {'machine_args':  ['-mcpu=cortex-a76']},
> + '0xd0c': {
> + 'machine_args':  ['-march=armv8.2-a+crypto',
> +   '-mcpu=neoverse-n1'],
> + 'flags': [
> + ['RTE_MACHINE', '"neoverse-n1"'],
> + ['RTE_ARM_FEATURE_ATOMICS', true],
> + ['RTE_EAL_NUMA_AWARE_HUGEPAGES', false],
> + ['RTE_LIBRTE_VHOST_NUMA', false],
> + ['RTE_MAX_MEM_MB', 1048576],
> + ['RTE_MAX_LCORE', 80],
> + ['RTE_MAX_NUMA_NODES', 1]
> + ]
> + },
> + '0xd49': {
> + 'machine_args':  ['-march=armv8.5-a+crypto+sve2'],
> + 'flags': [
> + ['RTE_MACHINE', '"neoverse-n2"'],
> + ['RTE_ARM_FEATURE_ATOMICS', true],
> + ['RTE_EAL_NUMA_AWARE_HUGEPAGES', false],
> + ['RTE_LIBRTE_VHOST_NUMA', false],
> + ['RTE_MAX_LCORE', 64]
> + ]
> + }
> +}
> +implementer_arm = {
> + 'description': 'Arm',
> + 'flags': [
> + ['RTE_MACHINE', '"armv8a"'],
> + ['RTE_USE_C11_MEM_MODEL', true],
> + ['RTE_CACHE_LINE_SIZE', 64],
> + ['RTE_MAX_LCORE'

Re: [dpdk-dev] [PATCH 1/3] net/hinic: restore vectorised code

2021-01-16 Thread Ruifeng Wang


> -Original Message-
> From: David Marchand 
> Sent: Friday, January 15, 2021 9:40 PM
> To: dev@dpdk.org
> Cc: ferruh.yi...@intel.com; sta...@dpdk.org; Ziyang Xuan
> ; Xiaoyun Wang
> ; Guoyang Zhou
> ; Ciara Power ;
> Ruifeng Wang ; tho...@monjalon.net
> Subject: [PATCH 1/3] net/hinic: restore vectorised code
> 
> Following make support removal, the vectorised code is not built anymore,
> fix the build flag check.
> 
> Fixes: 3cc6ecfdfe85 ("build: remove makefiles")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: David Marchand 
> ---
>  drivers/net/hinic/hinic_pmd_rx.c |  6 +++---
>  drivers/net/hinic/hinic_pmd_tx.c | 10 +-
>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/hinic/hinic_pmd_rx.c b/drivers/net/hinic/hinic_pmd_rx.c
> index a49769a863..842399cc4c 100644
> --- a/drivers/net/hinic/hinic_pmd_rx.c
> +++ b/drivers/net/hinic/hinic_pmd_rx.c
> @@ -4,7 +4,7 @@
> 
>  #include 
>  #include 
> -#ifdef __ARM64_NEON__
> +#ifdef RTE_ARCH_ARM64

We can test '__ARM_NEON' which will be defined by compilers.
https://developer.arm.com/documentation/ihi0053/latest/
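
A sketch of what such a guard could look like, assuming only that the compiler defines __ARM_NEON when NEON is available. The function name swap32x4 is hypothetical; it performs the same per-word byte reversal as the shuffle masks in this patch and falls back to plain C when the macro is not defined.

```c
#include <stdint.h>
#ifdef __ARM_NEON
#include <arm_neon.h>
#endif

/* Reverse the bytes inside each of the four 32-bit words of a 16-byte block. */
static void swap32x4(uint8_t *dst, const uint8_t *src)
{
#ifdef __ARM_NEON
	/* NEON path: vrev32q_u8 reverses the bytes within each 32-bit lane. */
	vst1q_u8(dst, vrev32q_u8(vld1q_u8(src)));
#else
	/* Portable fallback producing the same result. */
	int i;

	for (i = 0; i < 16; i += 4) {
		uint8_t b0 = src[i], b1 = src[i + 1];
		uint8_t b2 = src[i + 2], b3 = src[i + 3];

		dst[i] = b3;
		dst[i + 1] = b2;
		dst[i + 2] = b1;
		dst[i + 3] = b0;
	}
#endif
}
```

Keying the guard on the compiler-defined macro ties the vector path to what the compiler can actually emit, and the scalar branch keeps the function usable on any other target.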

>  #include <arm_neon.h>
>  #endif
> 
> @@ -762,7 +762,7 @@ void hinic_free_all_rx_mbufs(struct hinic_rxq *rxq)
> static inline void hinic_rq_cqe_be_to_cpu32(void *dst_le32,
>   volatile void *src_be32)
>  {
> -#if defined(__X86_64_SSE__)
> +#if defined(RTE_ARCH_X86_64)
>   volatile __m128i *wqe_be = (volatile __m128i *)src_be32;
>   __m128i *wqe_le = (__m128i *)dst_le32;
>   __m128i shuf_mask =  _mm_set_epi8(12, 13, 14, 15, 8, 9, 10,
> @@ -770,7 +770,7 @@ static inline void hinic_rq_cqe_be_to_cpu32(void *dst_le32,
> 
>   /* l2nic just use first 128 bits */
>   wqe_le[0] = _mm_shuffle_epi8(wqe_be[0], shuf_mask);
> -#elif defined(__ARM64_NEON__)
> +#elif defined(RTE_ARCH_ARM64)
>   volatile uint8x16_t *wqe_be = (volatile uint8x16_t *)src_be32;
>   uint8x16_t *wqe_le = (uint8x16_t *)dst_le32;
>   const uint8x16_t shuf_mask = {3, 2, 1, 0, 7, 6, 5, 4, 11, 10,
> diff --git a/drivers/net/hinic/hinic_pmd_tx.c b/drivers/net/hinic/hinic_pmd_tx.c
> index 9d0264e67a..669f82389c 100644
> --- a/drivers/net/hinic/hinic_pmd_tx.c
> +++ b/drivers/net/hinic/hinic_pmd_tx.c
> @@ -7,7 +7,7 @@
>  #include 
>  #include 
>  #include 
> -#ifdef __ARM64_NEON__
> +#ifdef RTE_ARCH_ARM64
>  #include <arm_neon.h>
>  #endif
> 
> @@ -203,7 +203,7 @@
> 
>  static inline void hinic_sq_wqe_cpu_to_be32(void *data, int nr_wqebb)
>  {
> -#if defined(__X86_64_SSE__)
> +#if defined(RTE_ARCH_X86_64)
>   int i;
>   __m128i *wqe_line = (__m128i *)data;
>   __m128i shuf_mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10,
> @@ -217,7 +217,7 @@ static inline void hinic_sq_wqe_cpu_to_be32(void *data, int nr_wqebb)
>   wqe_line[3] = _mm_shuffle_epi8(wqe_line[3], shuf_mask);
>   wqe_line += 4;
>   }
> -#elif defined(__ARM64_NEON__)
> +#elif defined(RTE_ARCH_ARM64)
>   int i;
>   uint8x16_t *wqe_line = (uint8x16_t *)data;
>   const uint8x16_t shuf_mask = {3, 2, 1, 0, 7, 6, 5, 4, 11, 10,
> @@ -237,7 +237,7 @@ static inline void hinic_sq_wqe_cpu_to_be32(void *data, int nr_wqebb)
> 
>  static inline void hinic_sge_cpu_to_be32(void *data, int nr_sge)
>  {
> -#if defined(__X86_64_SSE__)
> +#if defined(RTE_ARCH_X86_64)
>   int i;
>   __m128i *sge_line = (__m128i *)data;
>   __m128i shuf_mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10,
> @@ -248,7 +248,7 @@ static inline void hinic_sge_cpu_to_be32(void *data, int nr_sge)
>   *sge_line = _mm_shuffle_epi8(*sge_line, shuf_mask);
>   sge_line++;
>   }
> -#elif defined(__ARM64_NEON__)
> +#elif defined(RTE_ARCH_ARM64)
>   int i;
>   uint8x16_t *sge_line = (uint8x16_t *)data;
>   const uint8x16_t shuf_mask = {3, 2, 1, 0, 7, 6, 5, 4, 11, 10,
> --
> 2.23.0



Re: [dpdk-dev] [EXT] Re: input port in mbuf

2021-01-16 Thread Liron Himi
Hi,

Sorry for raising this issue again; please see my comments inline.

Liron Himi

-Original Message-
From: Stephen Hemminger  
Sent: Wednesday, 6 May 2020 23:24
To: Liron Himi 
Cc: dpdk-dev 
Subject: Re: [EXT] Re: [dpdk-dev] input port in mbuf

On Wed, 6 May 2020 20:17:20 +
Liron Himi  wrote:

> For performance optimization, we need to know the input DPDK port: after
> the buffer has been transmitted via our ethdev driver, instead of releasing
> it back to the memory pool we can release it to the originating HW pool of
> the input port.

But you can't be sure where the mbuf came from.
It could have been received by any vendor's driver, or it could come from a
private pool that is used for transmit, or from anywhere else.

[L.H.] I'm only referring to PP2->PP2 flow on an Armada platform. For any other 
flow the transmitted buffer will be returned to its 'mb'.

Please reconsider the real nature here; the world is not testpmd, l2fwd, l3fwd 
etc.
These are the kind of optimizations that break real applications and cause more 
trouble than the benefit for one silly benchmark.

[L.H.] I don't want to influence application usage, which is why I asked
whether there is a location in the mbuf where a driver can put its own info:
like the private area for the application, but for the input driver only.
If there is no such location right now, would it be acceptable to introduce
one?
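
To make the request concrete, here is a rough sketch of the idea in plain C. It is not DPDK code: struct sketch_mbuf, its drv_in_port field, SKETCH_PORT_NONE and both helpers are hypothetical, and the sketch assumes that every buffer not received by this driver carries SKETCH_PORT_NONE in that field.

```c
#include <stdint.h>

#define SKETCH_PORT_NONE UINT16_MAX	/* buffer was not received by this driver */

/* Hypothetical buffer: drv_in_port is the driver-owned slot being asked for. */
struct sketch_mbuf {
	uint16_t drv_in_port;	/* set by the RX driver, else SKETCH_PORT_NONE */
};

enum sketch_release { RELEASE_TO_MEMPOOL, RELEASE_TO_HW_POOL };

/* RX side: the driver records which of its ports received the buffer. */
static void sketch_rx_tag(struct sketch_mbuf *m, uint16_t port)
{
	m->drv_in_port = port;
}

/*
 * TX completion: return the buffer to the HW pool only when this driver
 * tagged it on receive; anything else goes back to its mempool.
 */
static enum sketch_release sketch_tx_done(const struct sketch_mbuf *m)
{
	if (m->drv_in_port != SKETCH_PORT_NONE)
		return RELEASE_TO_HW_POOL;
	return RELEASE_TO_MEMPOOL;
}
```

The driver would only take the HW-pool path for buffers it tagged itself, so buffers from other drivers or from private transmit pools would still be freed the normal way.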