[dpdk-dev] [PATCH] maintainers: remove maintainer for hns3
Wei Hu has moved to a new job and his email address (xavier.hu...@huawei.com) has expired, so remove him from the hns3 maintainer list. All patches signed off by Wei Hu will be copied to Lijun Ou.

Signed-off-by: Lijun Ou
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 76ed473..7a16af3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -648,7 +648,6 @@ F: doc/guides/nics/enic.rst
 F: doc/guides/nics/features/enic.ini
 
 Hisilicon hns3
-M: Wei Hu (Xavier)
 M: Min Hu (Connor)
 M: Yisen Zhuang
 M: Lijun Ou
--
2.7.4
[dpdk-dev] [PATCH] app/testpmd: tx pkt clones parameter in flowgen
When measuring peak throughput, CPU performance often limits the maximum rate a device can reach, both in pps and in Gbps.

Instead of building each packet separately, use a clones counter to send the same mbuf to the line multiple times. PMDs handle this transparently thanks to the reference counting inside the mbuf.

This helps to reach maximum PPS with small packet sizes. Some data from our 2 port x 50G device, using 2*6 Tx queues, 64-byte packets, PowerEdge R7525, AMD EPYC 7452:

./build/app/dpdk-testpmd -l 32-63 -- --forward-mode=flowgen \
    --rxq=6 --txq=6 --disable-crc-strip --burst=512 \
    --flowgen-clones=0 --txd=4096 --stats-period=1 --txpkts=64

This gives ~46MPPS total Tx output:

Tx-pps: 22926849 Tx-bps: 11738590176
Tx-pps: 23642629 Tx-bps: 12105024112

Setting flowgen-clones to 512 pushes Tx almost to the physical limit of our device (68MPPS) using the same 2*6 queues (cores):

Tx-pps: 34357556 Tx-bps: 17591073696
Tx-pps: 34353211 Tx-bps: 17588802640

In similar per-core measurements, one core reaches 6.9MPPS without clones vs 11MPPS with clones.

Verified on Marvell qede and atlantic PMDs.
this v1:
- fixes on Ferruh's comments
rfc v2: http://patchwork.dpdk.org/patch/78800/
- increment ref counter for each mbuf pointer copy
rfc v1: http://patchwork.dpdk.org/patch/78674/

Signed-off-by: Igor Russkikh
---
 app/test-pmd/flowgen.c                | 105 ++
 app/test-pmd/parameters.c             |  10 +++
 app/test-pmd/testpmd.c                |   1 +
 app/test-pmd/testpmd.h                |   1 +
 doc/guides/testpmd_app_ug/run_app.rst |   7 ++
 5 files changed, 77 insertions(+), 47 deletions(-)

diff --git a/app/test-pmd/flowgen.c b/app/test-pmd/flowgen.c
index acf3e2460..53a2e5a63 100644
--- a/app/test-pmd/flowgen.c
+++ b/app/test-pmd/flowgen.c
@@ -94,6 +94,7 @@ pkt_burst_flow_gen(struct fwd_stream *fs)
 	uint16_t nb_rx;
 	uint16_t nb_tx;
 	uint16_t nb_pkt;
+	uint16_t nb_clones = nb_pkt_flowgen_clones;
 	uint16_t i;
 	uint32_t retry;
 	uint64_t tx_offloads;
@@ -123,53 +124,63 @@ pkt_burst_flow_gen(struct fwd_stream *fs)
 		ol_flags |= PKT_TX_MACSEC;
 	for (nb_pkt = 0; nb_pkt < nb_pkt_per_burst; nb_pkt++) {
-		pkt = rte_mbuf_raw_alloc(mbp);
-		if (!pkt)
-			break;
-
-		pkt->data_len = pkt_size;
-		pkt->next = NULL;
-
-		/* Initialize Ethernet header. */
-		eth_hdr = rte_pktmbuf_mtod(pkt, struct rte_ether_hdr *);
-		rte_ether_addr_copy(&cfg_ether_dst, &eth_hdr->d_addr);
-		rte_ether_addr_copy(&cfg_ether_src, &eth_hdr->s_addr);
-		eth_hdr->ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4);
-
-		/* Initialize IP header. */
-		ip_hdr = (struct rte_ipv4_hdr *)(eth_hdr + 1);
-		memset(ip_hdr, 0, sizeof(*ip_hdr));
-		ip_hdr->version_ihl = RTE_IPV4_VHL_DEF;
-		ip_hdr->type_of_service = 0;
-		ip_hdr->fragment_offset = 0;
-		ip_hdr->time_to_live = IP_DEFTTL;
-		ip_hdr->next_proto_id = IPPROTO_UDP;
-		ip_hdr->packet_id = 0;
-		ip_hdr->src_addr = rte_cpu_to_be_32(cfg_ip_src);
-		ip_hdr->dst_addr = rte_cpu_to_be_32(cfg_ip_dst +
-						   next_flow);
-		ip_hdr->total_length = RTE_CPU_TO_BE_16(pkt_size -
-						       sizeof(*eth_hdr));
-		ip_hdr->hdr_checksum = ip_sum((unaligned_uint16_t *)ip_hdr,
-					     sizeof(*ip_hdr));
-
-		/* Initialize UDP header. */
-		udp_hdr = (struct rte_udp_hdr *)(ip_hdr + 1);
-		udp_hdr->src_port = rte_cpu_to_be_16(cfg_udp_src);
-		udp_hdr->dst_port = rte_cpu_to_be_16(cfg_udp_dst);
-		udp_hdr->dgram_cksum = 0; /* No UDP checksum. */
-		udp_hdr->dgram_len = RTE_CPU_TO_BE_16(pkt_size -
-						     sizeof(*eth_hdr) -
-						     sizeof(*ip_hdr));
-		pkt->nb_segs = 1;
-		pkt->pkt_len = pkt_size;
-		pkt->ol_flags &= EXT_ATTACHED_MBUF;
-		pkt->ol_flags |= ol_flags;
-		pkt->vlan_tci = vlan_tci;
-		pkt->vlan_tci_outer = vlan_tci_outer;
-		pkt->l2_len = sizeof(struct rte_ether_hdr);
-		pkt->l3_len = sizeof(struct rte_ipv4_hdr);
-		pkts_burst[nb_pkt] = pkt;
+		if (!nb_pkt || !nb_clones) {
+			nb_clones = nb_pkt_flowgen_clones;
+			/* Logic l
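To illustrate the clone mechanism described in the commit message of the flowgen patch, here is a minimal standalone sketch in plain C. It is not the patch code and does not use the DPDK API: `struct sim_buf`, `sim_alloc`, `sim_free` and `sim_fill_burst` are hypothetical names made up for this example. The assumption is that a clone count of N means one freshly built buffer followed by N reuses of the same pointer, each reuse taking one extra reference, which matches the changelog note about incrementing the ref counter for each mbuf pointer copy.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf: only the reference count matters here. */
struct sim_buf {
	uint16_t refcnt; /* number of burst slots that still point at this buffer */
};

static unsigned int sim_live_bufs;   /* allocated and not yet released */
static unsigned int sim_built_total; /* buffers built so far (the costly step) */

/* Allocate one buffer; in a real generator the headers would be built here. */
static struct sim_buf *sim_alloc(void)
{
	struct sim_buf *b = calloc(1, sizeof(*b));

	if (b == NULL)
		return NULL;
	b->refcnt = 1;
	sim_live_bufs++;
	sim_built_total++;
	return b;
}

/* Drop one reference; the memory is released only with the last one. */
static void sim_free(struct sim_buf *b)
{
	assert(b != NULL && b->refcnt > 0);
	if (--b->refcnt == 0) {
		free(b);
		sim_live_bufs--;
	}
}

/*
 * Fill burst[] with up to nb_pkt pointers. A new buffer is built for the
 * first slot and whenever the clone budget is used up; every other slot
 * reuses the previous pointer and takes one more reference.
 * Returns the number of slots filled.
 */
static uint16_t sim_fill_burst(struct sim_buf **burst, uint16_t nb_pkt,
			       uint16_t clones)
{
	struct sim_buf *cur = NULL;
	uint16_t left = 0;
	uint16_t i;

	for (i = 0; i < nb_pkt; i++) {
		if (cur == NULL || left == 0) {
			cur = sim_alloc();
			if (cur == NULL)
				break;
			left = clones;
		} else {
			cur->refcnt++;
			left--;
		}
		burst[i] = cur;
	}
	return i;
}
```

With a burst of 8 and 3 clones, only two buffers are built and each ends up with a reference count of 4. Releasing every slot once, as a transmit path would, frees both buffers exactly once.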
Re: [dpdk-dev] [PATCH v2 04/19] net: fix missing header include
Acked-by: Ophir Munk

> -----Original Message-----
> From: Bruce Richardson
> Sent: Friday, January 15, 2021 1:11 PM
> To: dev@dpdk.org
> Cc: david.march...@redhat.com; Bruce Richardson; sta...@dpdk.org; Olivier Matz; Ophir Munk; Ferruh Yigit
> Subject: [PATCH v2 04/19] net: fix missing header include
>
> The Geneve protocol header file is missing the rte_byteorder.h header.
>
> Fixes: ea0e711b8ae0 ("app/testpmd: add GENEVE parsing")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Bruce Richardson
> ---
>  lib/librte_net/rte_geneve.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/lib/librte_net/rte_geneve.h b/lib/librte_net/rte_geneve.h
> index bb67724c31..7c3d477dcb 100644
> --- a/lib/librte_net/rte_geneve.h
> +++ b/lib/librte_net/rte_geneve.h
> @@ -11,6 +11,7 @@
>   * GENEVE-related definitions
>   */
>  #include
> +#include <rte_byteorder.h>
>
>  #ifdef __cplusplus
>  extern "C" {
> --
> 2.27.0
Re: [dpdk-dev] [PATCH v15 05/12] build: organize Arm config into dict
> -----Original Message-----
> From: Juraj Linkeš
> Sent: Friday, January 15, 2021 9:26 PM
> To: bruce.richard...@intel.com; Ruifeng Wang; Honnappa Nagarahalli; Phil Yang; vcchu...@amazon.com; Dharmik Thakkar; jerinjac...@gmail.com; hemant.agra...@nxp.com; Ajit Khaparde (ajit.khapa...@broadcom.com); ferruh.yi...@intel.com; abo...@pensando.io
> Cc: dev@dpdk.org; Juraj Linkeš
> Subject: [PATCH v15 05/12] build: organize Arm config into dict
>
> Use dictionary lookup instead of checking for existing variables, iterating
> over all elements in the list or checking lists for optional configuration.
> Move variable contents into the dictionary for variables that would be
> referenced only once.
> Fallback to generic part number if the discovered part number is unknown.
>
> Signed-off-by: Juraj Linkeš
> Reviewed-by: Honnappa Nagarahalli
> ---
>  config/arm/meson.build | 311 -
>  1 file changed, 183 insertions(+), 128 deletions(-)
>
> diff --git a/config/arm/meson.build b/config/arm/meson.build
> index 7a74938bd..39cf98c67 100644
> --- a/config/arm/meson.build
> +++ b/config/arm/meson.build
> @@ -27,124 +27,172 @@ flags_common = [
>      ['RTE_CACHE_LINE_SIZE', 128]
>  ]
>
> -# implementer specific aarch64 flags, with middle priority
> -# (will overwrite common flags)
> -flags_implementer_generic = [
> -    ['RTE_MACHINE', '"armv8a"'],
> -    ['RTE_USE_C11_MEM_MODEL', true],
> -    ['RTE_MAX_LCORE', 256]
> -]
> -flags_implementer_arm = [
> -    ['RTE_MACHINE', '"armv8a"'],
> -    ['RTE_USE_C11_MEM_MODEL', true],
> -    ['RTE_CACHE_LINE_SIZE', 64],
> -    ['RTE_MAX_LCORE', 16]
> -]
> -flags_implementer_cavium = [
> -    ['RTE_MAX_VFIO_GROUPS', 128],
> -    ['RTE_MAX_LCORE', 96],
> -    ['RTE_MAX_NUMA_NODES', 2]
> -]
> -flags_implementer_dpaa = [
> -    ['RTE_MACHINE', '"dpaa"'],
> -    ['RTE_LIBRTE_DPAA2_USE_PHYS_IOVA', false],
> -    ['RTE_USE_C11_MEM_MODEL', true],
> -    ['RTE_CACHE_LINE_SIZE', 64],
> -    ['RTE_MAX_LCORE', 16],
> -    ['RTE_MAX_NUMA_NODES', 1]
> -]
> -flags_implementer_emag = [
> -    ['RTE_MACHINE', '"emag"'],
> -    ['RTE_CACHE_LINE_SIZE', 64],
> -    ['RTE_MAX_LCORE', 32],
> -    ['RTE_MAX_NUMA_NODES', 1]
> -]
> -flags_implementer_armada = [
> -    ['RTE_MACHINE', '"armv8a"'],
> -    ['RTE_CACHE_LINE_SIZE', 64],
> -    ['RTE_MAX_LCORE', 16],
> -    ['RTE_MAX_NUMA_NODES', 1]
> -]
> +## Part numbers are specific to Arm implementers
> +# implementer specific aarch64 flags have middle priority
> +# (will overwrite common flags)
> +# part number specific aarch64 flags have the highest priority
> +# (will overwrite both common and implementer specific flags)
> +implementer_generic = {
> +    'description': 'Generic armv8',
> +    'flags': [
> +        ['RTE_MACHINE', '"armv8a"'],
> +        ['RTE_USE_C11_MEM_MODEL', true],
> +        ['RTE_MAX_LCORE', 256]
> +    ],
> +    'part_number_config': {
> +        'generic': {'machine_args': ['-march=armv8-a+crc',
> +                                     '-moutline-atomics']}
> +    }
> +}
> +
> +part_number_config_arm = {
> +    'generic': {'machine_args': ['-march=armv8-a+crc',
> +                                 '-moutline-atomics']},
> +    'native': {'machine_args': ['-march=native']},
> +    '0xd03': {'machine_args': ['-mcpu=cortex-a53']},
> +    '0xd04': {'machine_args': ['-mcpu=cortex-a35']},
> +    '0xd07': {'machine_args': ['-mcpu=cortex-a57']},
> +    '0xd08': {'machine_args': ['-mcpu=cortex-a72']},
> +    '0xd09': {'machine_args': ['-mcpu=cortex-a73']},
> +    '0xd0a': {'machine_args': ['-mcpu=cortex-a75']},
> +    '0xd0b': {'machine_args': ['-mcpu=cortex-a76']},
> +    '0xd0c': {
> +        'machine_args': ['-march=armv8.2-a+crypto',
> +                         '-mcpu=neoverse-n1'],
> +        'flags': [
> +            ['RTE_MACHINE', '"neoverse-n1"'],
> +            ['RTE_ARM_FEATURE_ATOMICS', true],
> +            ['RTE_EAL_NUMA_AWARE_HUGEPAGES', false],
> +            ['RTE_LIBRTE_VHOST_NUMA', false],
> +            ['RTE_MAX_MEM_MB', 1048576],
> +            ['RTE_MAX_LCORE', 80],
> +            ['RTE_MAX_NUMA_NODES', 1]
> +        ]
> +    },
> +    '0xd49': {
> +        'machine_args': ['-march=armv8.5-a+crypto+sve2'],
> +        'flags': [
> +            ['RTE_MACHINE', '"neoverse-n2"'],
> +            ['RTE_ARM_FEATURE_ATOMICS', true],
> +            ['RTE_EAL_NUMA_AWARE_HUGEPAGES', false],
> +            ['RTE_LIBRTE_VHOST_NUMA', false],
> +            ['RTE_MAX_LCORE', 64]
> +        ]
> +    }
> +}
> +implementer_arm = {
> +    'description': 'Arm',
> +    'flags': [
> +        ['RTE_MACHINE', '"armv8a"'],
> +        ['RTE_USE_C11_MEM_MODEL', true],
> +        ['RTE_CACHE_LINE_SIZE', 64],
> +        ['RTE_MAX_LCORE'
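The lookup-with-fallback behaviour described in the commit message of the Arm config patch ("Fallback to generic part number if the discovered part number is unknown") can be sketched outside meson as well. The following is a rough C analogue, not the build code: `struct part_cfg`, `find_part` and `machine_args_for` are hypothetical names, only a subset of the part numbers from the quoted dictionary is listed, and each argument list is flattened into a single space-separated string for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One table entry: part number key and its compiler arguments. */
struct part_cfg {
	const char *part_number;
	const char *machine_args; /* flattened to one string in this sketch */
};

/* Subset of the part numbers quoted in the patch. */
static const struct part_cfg arm_parts[] = {
	{ "generic", "-march=armv8-a+crc -moutline-atomics" },
	{ "native",  "-march=native" },
	{ "0xd03",   "-mcpu=cortex-a53" },
	{ "0xd07",   "-mcpu=cortex-a57" },
	{ "0xd08",   "-mcpu=cortex-a72" },
};

/* Exact-match lookup; returns NULL when the key is not in the table. */
static const struct part_cfg *find_part(const char *part_number)
{
	size_t i;

	assert(part_number != NULL);
	for (i = 0; i < sizeof(arm_parts) / sizeof(arm_parts[0]); i++) {
		if (strcmp(arm_parts[i].part_number, part_number) == 0)
			return &arm_parts[i];
	}
	return NULL;
}

/* Look up a discovered part number, falling back to the generic entry. */
static const char *machine_args_for(const char *part_number)
{
	const struct part_cfg *cfg = find_part(part_number);

	if (cfg == NULL)
		cfg = find_part("generic");
	return cfg != NULL ? cfg->machine_args : NULL;
}
```

A known key such as "0xd08" returns its own arguments, while an unknown one such as "0xffff" resolves to the generic entry instead of failing.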
Re: [dpdk-dev] [PATCH 1/3] net/hinic: restore vectorised code
> -----Original Message-----
> From: David Marchand
> Sent: Friday, January 15, 2021 9:40 PM
> To: dev@dpdk.org
> Cc: ferruh.yi...@intel.com; sta...@dpdk.org; Ziyang Xuan; Xiaoyun Wang; Guoyang Zhou; Ciara Power; Ruifeng Wang; tho...@monjalon.net
> Subject: [PATCH 1/3] net/hinic: restore vectorised code
>
> Following make support removal, the vectorised code is not built anymore,
> fix the build flag check.
>
> Fixes: 3cc6ecfdfe85 ("build: remove makefiles")
> Cc: sta...@dpdk.org
>
> Signed-off-by: David Marchand
> ---
>  drivers/net/hinic/hinic_pmd_rx.c |  6 +++---
>  drivers/net/hinic/hinic_pmd_tx.c | 10 +-
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/hinic/hinic_pmd_rx.c b/drivers/net/hinic/hinic_pmd_rx.c
> index a49769a863..842399cc4c 100644
> --- a/drivers/net/hinic/hinic_pmd_rx.c
> +++ b/drivers/net/hinic/hinic_pmd_rx.c
> @@ -4,7 +4,7 @@
>
>  #include
>  #include
> -#ifdef __ARM64_NEON__
> +#ifdef RTE_ARCH_ARM64

We can test '__ARM_NEON' which will be defined by compilers.
https://developer.arm.com/documentation/ihi0053/latest/

>  #include
>  #endif
>
> @@ -762,7 +762,7 @@ void hinic_free_all_rx_mbufs(struct hinic_rxq *rxq)
>  static inline void hinic_rq_cqe_be_to_cpu32(void *dst_le32,
>  					    volatile void *src_be32)
>  {
> -#if defined(__X86_64_SSE__)
> +#if defined(RTE_ARCH_X86_64)
>  	volatile __m128i *wqe_be = (volatile __m128i *)src_be32;
>  	__m128i *wqe_le = (__m128i *)dst_le32;
>  	__m128i shuf_mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10,
> @@ -770,7 +770,7 @@ static inline void hinic_rq_cqe_be_to_cpu32(void *dst_le32,
>
>  	/* l2nic just use first 128 bits */
>  	wqe_le[0] = _mm_shuffle_epi8(wqe_be[0], shuf_mask);
> -#elif defined(__ARM64_NEON__)
> +#elif defined(RTE_ARCH_ARM64)
>  	volatile uint8x16_t *wqe_be = (volatile uint8x16_t *)src_be32;
>  	uint8x16_t *wqe_le = (uint8x16_t *)dst_le32;
>  	const uint8x16_t shuf_mask = {3, 2, 1, 0, 7, 6, 5, 4, 11, 10,
> diff --git a/drivers/net/hinic/hinic_pmd_tx.c b/drivers/net/hinic/hinic_pmd_tx.c
> index 9d0264e67a..669f82389c 100644
> --- a/drivers/net/hinic/hinic_pmd_tx.c
> +++ b/drivers/net/hinic/hinic_pmd_tx.c
> @@ -7,7 +7,7 @@
>  #include
>  #include
>  #include
> -#ifdef __ARM64_NEON__
> +#ifdef RTE_ARCH_ARM64
>  #include
>  #endif
>
> @@ -203,7 +203,7 @@
>
>  static inline void hinic_sq_wqe_cpu_to_be32(void *data, int nr_wqebb)
>  {
> -#if defined(__X86_64_SSE__)
> +#if defined(RTE_ARCH_X86_64)
>  	int i;
>  	__m128i *wqe_line = (__m128i *)data;
>  	__m128i shuf_mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10,
> @@ -217,7 +217,7 @@ static inline void hinic_sq_wqe_cpu_to_be32(void *data, int nr_wqebb)
>  		wqe_line[3] = _mm_shuffle_epi8(wqe_line[3], shuf_mask);
>  		wqe_line += 4;
>  	}
> -#elif defined(__ARM64_NEON__)
> +#elif defined(RTE_ARCH_ARM64)
>  	int i;
>  	uint8x16_t *wqe_line = (uint8x16_t *)data;
>  	const uint8x16_t shuf_mask = {3, 2, 1, 0, 7, 6, 5, 4, 11, 10,
> @@ -237,7 +237,7 @@ static inline void hinic_sq_wqe_cpu_to_be32(void *data, int nr_wqebb)
>
>  static inline void hinic_sge_cpu_to_be32(void *data, int nr_sge)
>  {
> -#if defined(__X86_64_SSE__)
> +#if defined(RTE_ARCH_X86_64)
>  	int i;
>  	__m128i *sge_line = (__m128i *)data;
>  	__m128i shuf_mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10,
> @@ -248,7 +248,7 @@ static inline void hinic_sge_cpu_to_be32(void *data, int nr_sge)
>  		*sge_line = _mm_shuffle_epi8(*sge_line, shuf_mask);
>  		sge_line++;
>  	}
> -#elif defined(__ARM64_NEON__)
> +#elif defined(RTE_ARCH_ARM64)
>  	int i;
>  	uint8x16_t *sge_line = (uint8x16_t *)data;
>  	const uint8x16_t shuf_mask = {3, 2, 1, 0, 7, 6, 5, 4, 11, 10,
> --
> 2.23.0
Re: [dpdk-dev] [EXT] Re: input port in mbuf
Hi,

Sorry for raising this issue again. Please see my comments inline.

Liron Himi

-----Original Message-----
From: Stephen Hemminger
Sent: Wednesday, 6 May 2020 23:24
To: Liron Himi
Cc: dpdk-dev
Subject: Re: [EXT] Re: [dpdk-dev] input port in mbuf

On Wed, 6 May 2020 20:17:20 +
Liron Himi wrote:

> For performance optimizations, we need to know the input DPDK port as after
> the buffer was transmitted via our ethdev driver instead of release it back
> to the memory-pool we can release it to the originated HW pool of the input
> port.

But you can't be sure where the mbuf came from. It could be a receive on any vendor's driver, or it could be from a private pool that is used for transmit, or anywhere.

[L.H.] I'm only referring to the PP2->PP2 flow on an Armada platform. For any other flow the transmitted buffer will be returned to its 'mb'.

Please reconsider the real nature here; the world is not testpmd, l2fwd, l3fwd etc. These are the kind of optimizations that break real applications and cause more trouble than the benefit for one silly benchmark.

[L.H.] I don't want to influence application usage. This is why I asked whether there is a location in the mbuf where a driver can put its own info, like the private area for the application, but just for the input driver. If there is no such location right now, would it be acceptable to introduce one?