[dpdk-dev] A (possible) problem with `--no-huge` option
Hi guys,

I have a problem while running DPDK with the `--no-huge` option. The problem seems to have been introduced by commit cdc242f260e766bd95a658b5e0686a62ec04f5b0, and this is the change that affects me:

+ if ((page & 0x7fULL) == 0)
+         return RTE_BAD_PHYS_ADDR;
+

What I did was try to create a memory pool using rte_pktmbuf_pool_create(). I dug into the issue and found that in my case the "page" value is 0x0080, which means that the page is not present and is "soft-dirty" (according to the kernel's documentation):

 * Bits 0-54  page frame number (PFN) if present
 * Bits 0-4   swap type if swapped
 * Bits 5-54  swap offset if swapped
 * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
 * Bit  56    page exclusively mapped (since 4.2)
 * Bits 57-60 zero
 * Bit  61    page is file-page or shared-anon (since 3.5)
 * Bit  62    page swapped
 * Bit  63    page present

So, before the change mentioned above, everything "worked" fine and such pages were not handled. But now the check causes rte_mempool_populate_default to fail with -EINVAL...

Can anyone familiar with memory pool allocation help with this issue?

Thanks in advance,
Ilya Matveychikov.
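To make the bit layout quoted above concrete, here is a minimal sketch of decoding one 64-bit pagemap entry. The macro and helper names are hypothetical (they are not DPDK or kernel APIs), and the sketch uses the 55-bit PFN mask implied by "Bits 0-54". It assumes the entry described in the report has only bit 55 set, which is what "not present and soft-dirty" suggests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout of one pagemap entry, following the kernel documentation
 * quoted in the message. All names below are illustrative only. */
#define PAGEMAP_PFN_MASK   ((UINT64_C(1) << 55) - 1) /* bits 0-54 */
#define PAGEMAP_SOFT_DIRTY (UINT64_C(1) << 55)       /* bit 55 */
#define PAGEMAP_PRESENT    (UINT64_C(1) << 63)       /* bit 63 */

/* True when the page is resident in RAM. */
static inline bool pagemap_present(uint64_t entry)
{
	return (entry & PAGEMAP_PRESENT) != 0;
}

/* True when the PTE carries the soft-dirty flag. */
static inline bool pagemap_soft_dirty(uint64_t entry)
{
	return (entry & PAGEMAP_SOFT_DIRTY) != 0;
}

/* PFN is only meaningful for present pages; report 0 otherwise. */
static inline uint64_t pagemap_pfn(uint64_t entry)
{
	return pagemap_present(entry) ? (entry & PAGEMAP_PFN_MASK) : 0;
}
```

With such helpers, an entry that is soft-dirty but not present yields no usable PFN, which is consistent with the physical address lookup reporting a bad address for pages that have not been faulted in yet.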
Re: [dpdk-dev] [PATCH] driver/net: remove unnecessary macro for unused variables
Hi,

> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Friday, May 12, 2017 6:33 PM
> To: John W. Linville; Legacy, Allain (Wind River); Peters, Matt (Wind River);
> Harish Patil; Rasesh Mody; Stephen Hurd; Ajit Khaparde; Doherty, Declan; Lu,
> Wenzhuo; Marcin Wojtas; Michal Krawczyk; Guy Tzalik; Evgeny Schemeilin;
> John Daley; Nelson Escobar; Chen, Jing D; Zhang, Helin; Wu, Jingjing; Ananyev,
> Konstantin; Andrew Rybchenko; Pascal Mazon; Yuanhan Liu; Maxime
> Coquelin; Shrikrishna Khare
> Cc: dev@dpdk.org; Yigit, Ferruh
> Subject: [PATCH] driver/net: remove unnecessary macro for unused variables
>
> remove __rte_unused instances that are not required.
>
> Signed-off-by: Ferruh Yigit

Acked-by: Wenzhuo Lu
Re: [dpdk-dev] [PATCH v5 2/4] eal: move gcc version definition to common header
On 12 May 2017 at 18:15, Ashwin Sekhar T K wrote:
> Moved the definition of GCC_VERSION from lib/librte_table/rte_lru.h
> to lib/librte_eal/common/include/rte_common.h.
>
> Tested compilation on:
> * arm64 with gcc
> * x86 with gcc and clang
>
> Signed-off-by: Ashwin Sekhar T K
> Reviewed-by: Jan Viktorin
> ---
> lib/librte_eal/common/include/rte_common.h | 6 ++
> lib/librte_table/rte_lru.h | 10 ++
> 2 files changed, 8 insertions(+), 8 deletions(-)
>

Acked-by: Jianbo Liu
Re: [dpdk-dev] [PATCH v5 3/4] net: add arm64 neon version of CRC compute APIs
On 12 May 2017 at 18:15, Ashwin Sekhar T K wrote:
> Added CRC compute APIs for arm64 utilizing the pmull
> capability.
>
> Added new file net_crc_neon.h to hold the arm64 pmull
> CRC implementation.
>
> Added wrappers in rte_vect.h for those neon intrinsics
> which are not supported in GCC version < 7.
>
> Verified the changes with crc_autotest unit test case
>
> Signed-off-by: Ashwin Sekhar T K
> ---
> MAINTAINERS | 1 +
> lib/librte_eal/common/include/arch/arm/rte_vect.h | 88 +++
> lib/librte_net/net_crc_neon.h | 297 ++
> lib/librte_net/rte_net_crc.c | 34 ++-
> lib/librte_net/rte_net_crc.h | 2 +
> 5 files changed, 416 insertions(+), 6 deletions(-)
> create mode 100644 lib/librte_net/net_crc_neon.h
>

Acked-by: Jianbo Liu
Re: [dpdk-dev] [PATCH] driver/net: remove unnecessary macro for unused variables
On Fri, May 12, 2017 at 11:33:03AM +0100, Ferruh Yigit wrote:
> remove __rte_unused instances that are not required.

I'm wondering whether this was done by a script?

--yliu
[dpdk-dev] [RFC 2/2] doc/guides/prog_guide: add new flow attribute
Update the programming guide for the new attribute of rte_flow.

Signed-off-by: Qi Zhang
---
 doc/guides/prog_guide/rte_flow.rst | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index b587ba9..5207eec 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -181,6 +181,18 @@ directions. At least one direction must be specified.
 Specifying both directions at once for a given rule is not recommended but
 may be valid in a few cases (e.g. shared counters).
 
+Attribute: Match hint
+^^^^^^^^^^^^^^^^^^^^^
+
+This is an attribute to hint different pattern match accuracy.
+
+Perfect match:
+- Actions will be taken if input packet's pattern matches flow's pattern.
+
+Signature match:
+- Actions will be taken if the signature of input packet's pattern matches
+  the signature of flow's pattern.
+
 Pattern item
-- 
2.7.4
[dpdk-dev] [RFC 1/2] rte_flow: add attribute for signature match
Add a new attribute "sig_match" to rte_flow_attr. This attribute indicates whether the current flow uses "perfect match" or "signature match".

With perfect match (the default), if a packet does not match the pattern, actions will not be taken (this is identical to the current behavior).

With signature match, if a packet does not match the pattern, it may still trigger the actions; this happens when the device considers the signature of the pattern to be matched.

Signature match is expected to have better performance than perfect match, at the cost of accuracy. When a flow rule has this attribute set, identical behavior can ONLY be guaranteed if the packet matches the pattern, since different devices may implement the signature calculation algorithm differently.

Signed-off-by: Qi Zhang
---
 app/test-pmd/cmdline_flow.c | 11 +++++++++++
 lib/librte_ether/rte_flow.h |  3 ++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 0fd69f9..512f817 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -95,6 +95,7 @@ enum index {
 	PRIORITY,
 	INGRESS,
 	EGRESS,
+	SIG_MATCH,
 
 	/* Validate/create pattern. */
 	PATTERN,
@@ -397,6 +398,7 @@ static const enum index next_vc_attr[] = {
 	PRIORITY,
 	INGRESS,
 	EGRESS,
+	SIG_MATCH,
 	PATTERN,
 	ZERO,
 };
@@ -896,6 +898,12 @@ static const struct token token_list[] = {
 		.next = NEXT(next_vc_attr),
 		.call = parse_vc,
 	},
+	[SIG_MATCH] = {
+		.name = "sig_match",
+		.help = "affect rule to match",
+		.next = NEXT(next_vc_attr),
+		.call = parse_vc,
+	},
 	/* Validate/create pattern. */
 	[PATTERN] = {
 		.name = "pattern",
@@ -1728,6 +1736,9 @@ parse_vc(struct context *ctx, const struct token *token,
 	case EGRESS:
 		out->args.vc.attr.egress = 1;
 		return len;
+	case SIG_MATCH:
+		out->args.vc.attr.sig_match = 1;
+		return len;
 	case PATTERN:
 		out->args.vc.pattern =
 			(void *)RTE_ALIGN_CEIL((uintptr_t)(out + 1),
diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
index c47edbc..8ba3c36 100644
--- a/lib/librte_ether/rte_flow.h
+++ b/lib/librte_ether/rte_flow.h
@@ -95,7 +95,8 @@ struct rte_flow_attr {
 	uint32_t priority; /**< Priority level within group. */
 	uint32_t ingress:1; /**< Rule applies to ingress traffic. */
 	uint32_t egress:1; /**< Rule applies to egress traffic. */
-	uint32_t reserved:30; /**< Reserved, must be zero. */
+	uint32_t sig_match:1; /**< only use hash signature to match. */
+	uint32_t reserved:29; /**< Reserved, must be zero. */
 };
 
 /**
-- 
2.7.4
[dpdk-dev] [RFC 0/2] ethdev: add new attribute for signature match
We tried to enable ixgbe's signature match with rte_flow, but didn't find a way to do so with the current APIs, so this RFC proposes adding a new flow attribute "sig_match" to indicate whether the current flow is "perfect match" or "signature match".

With perfect match (the default), if a packet does not match the pattern, actions will not be taken (this is identical to the current behavior).

With signature match, if a packet does not match the pattern, it may still trigger the actions; this happens when the device considers the signature of the pattern to be matched.

Signature match is expected to have better performance than perfect match, at the cost of accuracy. When a flow rule has this attribute set, identical behavior can ONLY be guaranteed if the packet matches the pattern, since different devices may implement the signature calculation algorithm differently.

The driver of a device that does not support signature match is not required to return an error; it can simply ignore this attribute, because the default "perfect match" can still be regarded as a special case of "signature match".

Qi Zhang (2):
  rte_flow: add attribute for signature match
  doc/guides/prog_guide: add new rte_flow attribute

 app/test-pmd/cmdline_flow.c        | 11 +++++++++++
 doc/guides/prog_guide/rte_flow.rst | 12 ++++++++++++
 lib/librte_ether/rte_flow.h        |  3 ++-
 3 files changed, 25 insertions(+), 1 deletion(-)

-- 
2.7.4
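The difference between the two modes can be illustrated with a toy model. This is only a sketch under stated assumptions: the key layout, the function names and the XOR-fold signature are all hypothetical and are not the ixgbe algorithm or any rte_flow API. It shows why a packet that does not match the pattern can still trigger the actions under signature match: two different keys may share one signature.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flow key; real devices match on hardware-specific fields. */
struct flow_key {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
};

/* Toy 16-bit signature: XOR-fold of all fields (illustrative only). */
static uint16_t toy_signature(const struct flow_key *k)
{
	uint32_t x = k->src_ip ^ k->dst_ip;

	x ^= ((uint32_t)k->src_port << 16) | k->dst_port;
	return (uint16_t)(x ^ (x >> 16));
}

/* Perfect match: every field of the packet must equal the rule. */
static bool perfect_match(const struct flow_key *rule,
			  const struct flow_key *pkt)
{
	return rule->src_ip == pkt->src_ip && rule->dst_ip == pkt->dst_ip &&
	       rule->src_port == pkt->src_port &&
	       rule->dst_port == pkt->dst_port;
}

/* Signature match: only the signatures are compared, so collisions hit. */
static bool signature_match(const struct flow_key *rule,
			    const struct flow_key *pkt)
{
	return toy_signature(rule) == toy_signature(pkt);
}
```

In this toy model, a packet with source and destination addresses swapped has the same signature as the rule, so signature_match() accepts it while perfect_match() rejects it. A packet that truly matches the pattern is accepted by both, which is the only case where behavior is identical across implementations.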
[dpdk-dev] [PATCH v4 0/8] accelerate examples/l3fwd with NEON on ARM64 platform
v4:
- add vcopyq_laneq_u32 for older version of gcc

v3:
- remove unnecessary prefetch for rte_mbuf
- fix typo in git log
- Ashwin's suggestions for performance on ThunderX

v2:
- change name of l3fwd_em_sse.h to l3fwd_em_sequential.h
- add the times of hash multi-lookup for different Archs
- performance tuning on ThunderX: prefetching, set NO_HASH_LOOKUP_MULTI ...

Jianbo Liu (8):
  examples/l3fwd: extract arch independent code from multi hash lookup
  examples/l3fwd: rename l3fwd_em_sse.h to l3fwd_em_sequential.h
  examples/l3fwd: extract common code from multi packet send
  examples/l3fwd: rearrange the code for lpm_l3fwd
  arch/arm: add vcopyq_laneq_u32 for old version of gcc
  examples/l3fwd: add neon support for l3fwd
  examples/l3fwd: add the times of hash multi-lookup for different Archs
  examples/l3fwd: change the guard macro name for header file

 examples/l3fwd/l3fwd_common.h | 293 +
 examples/l3fwd/l3fwd_em.c | 8 +-
 examples/l3fwd/l3fwd_em_hlm.h | 218 +++
 examples/l3fwd/l3fwd_em_hlm_neon.h | 74 ++
 examples/l3fwd/l3fwd_em_hlm_sse.h | 280 +---
 .../{l3fwd_em_sse.h => l3fwd_em_sequential.h} | 24 +-
 examples/l3fwd/l3fwd_lpm.c | 87 +-
 examples/l3fwd/l3fwd_lpm.h | 26 +-
 examples/l3fwd/l3fwd_lpm_neon.h | 193 ++
 examples/l3fwd/l3fwd_lpm_sse.h | 66 -
 examples/l3fwd/l3fwd_neon.h | 259 ++
 examples/l3fwd/l3fwd_sse.h | 255 +-
 lib/librte_eal/common/include/arch/arm/rte_vect.h | 9 +
 13 files changed, 1166 insertions(+), 626 deletions(-)
 create mode 100644 examples/l3fwd/l3fwd_common.h
 create mode 100644 examples/l3fwd/l3fwd_em_hlm.h
 create mode 100644 examples/l3fwd/l3fwd_em_hlm_neon.h
 rename examples/l3fwd/{l3fwd_em_sse.h => l3fwd_em_sequential.h} (88%)
 create mode 100644 examples/l3fwd/l3fwd_lpm_neon.h
 create mode 100644 examples/l3fwd/l3fwd_neon.h

-- 
1.8.3.1
[dpdk-dev] [PATCH v4 1/8] examples/l3fwd: extract arch independent code from multi hash lookup
Extract common code from l3fwd_em_hlm_sse.h, and add to the new file l3fwd_em_hlm.h. Signed-off-by: Jianbo Liu --- examples/l3fwd/l3fwd_em.c | 2 +- examples/l3fwd/l3fwd_em_hlm.h | 302 ++ examples/l3fwd/l3fwd_em_hlm_sse.h | 280 +-- 3 files changed, 309 insertions(+), 275 deletions(-) create mode 100644 examples/l3fwd/l3fwd_em_hlm.h diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c index 9cc4460..939a16d 100644 --- a/examples/l3fwd/l3fwd_em.c +++ b/examples/l3fwd/l3fwd_em.c @@ -332,7 +332,7 @@ struct ipv6_l3fwd_em_route { #if defined(NO_HASH_MULTI_LOOKUP) #include "l3fwd_em_sse.h" #else -#include "l3fwd_em_hlm_sse.h" +#include "l3fwd_em_hlm.h" #endif #else #include "l3fwd_em.h" diff --git a/examples/l3fwd/l3fwd_em_hlm.h b/examples/l3fwd/l3fwd_em_hlm.h new file mode 100644 index 000..636dea4 --- /dev/null +++ b/examples/l3fwd/l3fwd_em_hlm.h @@ -0,0 +1,302 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * Copyright(c) 2017, Linaro Limited + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef __L3FWD_EM_HLM_H__ +#define __L3FWD_EM_HLM_H__ + +#include "l3fwd_sse.h" +#include "l3fwd_em_hlm_sse.h" + +static inline __attribute__((always_inline)) void +em_get_dst_port_ipv4x8(struct lcore_conf *qconf, struct rte_mbuf *m[8], + uint8_t portid, uint16_t dst_port[8]) +{ + int32_t ret[8]; + union ipv4_5tuple_host key[8]; + + get_ipv4_5tuple(m[0], mask0.x, &key[0]); + get_ipv4_5tuple(m[1], mask0.x, &key[1]); + get_ipv4_5tuple(m[2], mask0.x, &key[2]); + get_ipv4_5tuple(m[3], mask0.x, &key[3]); + get_ipv4_5tuple(m[4], mask0.x, &key[4]); + get_ipv4_5tuple(m[5], mask0.x, &key[5]); + get_ipv4_5tuple(m[6], mask0.x, &key[6]); + get_ipv4_5tuple(m[7], mask0.x, &key[7]); + + const void *key_array[8] = {&key[0], &key[1], &key[2], &key[3], + &key[4], &key[5], &key[6], &key[7]}; + + rte_hash_lookup_bulk(qconf->ipv4_lookup_struct, &key_array[0], 8, ret); + + dst_port[0] = (uint8_t) ((ret[0] < 0) ? + portid : ipv4_l3fwd_out_if[ret[0]]); + dst_port[1] = (uint8_t) ((ret[1] < 0) ? + portid : ipv4_l3fwd_out_if[ret[1]]); + dst_port[2] = (uint8_t) ((ret[2] < 0) ? + portid : ipv4_l3fwd_out_if[ret[2]]); + dst_port[3] = (uint8_t) ((ret[3] < 0) ? 
+ portid : ipv4_l3fwd_out_if[ret[3]]); + dst_port[4] = (uint8_t) ((ret[4] < 0) ? + portid : ipv4_l3fwd_out_if[ret[4]]); + dst_port[5] = (uint8_t) ((ret[5] < 0) ? + portid : ipv4_l3fwd_out_if[ret[5]]); + dst_port[6] = (uint8_t) ((ret[6] < 0) ? + portid : ipv4_l3fwd_out_if[ret[6]]); + dst_port[7] = (uint8_t) ((ret[7] < 0) ? + portid : ipv4_l3fwd_out_if[ret[7]]); + + if (dst_port[0] >= RTE_MAX_ETHPORTS || + (enabled_port_mask & 1 << dst_port[0]) == 0) + dst_port[0] = portid; + + if (dst_port[1] >= RTE_MAX_ETHPORTS || + (enabled_port_mask & 1 << dst_port[1]) == 0) + dst_port[1] = portid; + + if (dst_port[2] >= RTE_MAX_ETHPORTS || + (enabled_port_m
[dpdk-dev] [PATCH v4 2/8] examples/l3fwd: rename l3fwd_em_sse.h to l3fwd_em_sequential.h
The l3fwd_em_sse.h is enabled by NO_HASH_MULTI_LOOKUP. Rename it because it's only for sequential hash lookup, and doesn't include any x86 SSE instructions.

Signed-off-by: Jianbo Liu
---
 examples/l3fwd/l3fwd_em.c                                | 2 +-
 examples/l3fwd/{l3fwd_em_sse.h => l3fwd_em_sequential.h} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename examples/l3fwd/{l3fwd_em_sse.h => l3fwd_em_sequential.h} (100%)

diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
index 939a16d..ba844b2 100644
--- a/examples/l3fwd/l3fwd_em.c
+++ b/examples/l3fwd/l3fwd_em.c
@@ -330,7 +330,7 @@ struct ipv6_l3fwd_em_route {
 
 #if defined(__SSE4_1__)
 #if defined(NO_HASH_MULTI_LOOKUP)
-#include "l3fwd_em_sse.h"
+#include "l3fwd_em_sequential.h"
 #else
 #include "l3fwd_em_hlm.h"
 #endif
diff --git a/examples/l3fwd/l3fwd_em_sse.h b/examples/l3fwd/l3fwd_em_sequential.h
similarity index 100%
rename from examples/l3fwd/l3fwd_em_sse.h
rename to examples/l3fwd/l3fwd_em_sequential.h
-- 
1.8.3.1
[dpdk-dev] [PATCH v4 3/8] examples/l3fwd: extract common code from multi packet send
Keep x86 related code in l3fwd_sse.h, and move common code to l3fwd_common.h, which will be used by other Archs. Signed-off-by: Jianbo Liu --- examples/l3fwd/l3fwd_common.h | 293 ++ examples/l3fwd/l3fwd_sse.h| 255 +--- 2 files changed, 297 insertions(+), 251 deletions(-) create mode 100644 examples/l3fwd/l3fwd_common.h diff --git a/examples/l3fwd/l3fwd_common.h b/examples/l3fwd/l3fwd_common.h new file mode 100644 index 000..d7a1fdf --- /dev/null +++ b/examples/l3fwd/l3fwd_common.h @@ -0,0 +1,293 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * Copyright(c) 2017, Linaro Limited + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + + +#ifndef _L3FWD_COMMON_H_ +#define _L3FWD_COMMON_H_ + +#ifdef DO_RFC_1812_CHECKS + +#defineIPV4_MIN_VER_IHL0x45 +#defineIPV4_MAX_VER_IHL0x4f +#defineIPV4_MAX_VER_IHL_DIFF (IPV4_MAX_VER_IHL - IPV4_MIN_VER_IHL) + +/* Minimum value of IPV4 total length (20B) in network byte order. */ +#defineIPV4_MIN_LEN_BE (sizeof(struct ipv4_hdr) << 8) + +/* + * From http://www.rfc-editor.org/rfc/rfc1812.txt section 5.2.2: + * - The IP version number must be 4. + * - The IP header length field must be large enough to hold the + *minimum length legal IP datagram (20 bytes = 5 words). + * - The IP total length field must be large enough to hold the IP + * datagram header, whose length is specified in the IP header length + * field. + * If we encounter invalid IPV4 packet, then set destination port for it + * to BAD_PORT value. 
+ */ +static inline __attribute__((always_inline)) void +rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t ptype) +{ + uint8_t ihl; + + if (RTE_ETH_IS_IPV4_HDR(ptype)) { + ihl = ipv4_hdr->version_ihl - IPV4_MIN_VER_IHL; + + ipv4_hdr->time_to_live--; + ipv4_hdr->hdr_checksum++; + + if (ihl > IPV4_MAX_VER_IHL_DIFF || + ((uint8_t)ipv4_hdr->total_length == 0 && + ipv4_hdr->total_length < IPV4_MIN_LEN_BE)) + dp[0] = BAD_PORT; + + } +} + +#else +#definerfc1812_process(mb, dp, ptype) do { } while (0) +#endif /* DO_RFC_1812_CHECKS */ + +/* + * We group consecutive packets with the same destionation port into one burst. + * To avoid extra latency this is done together with some other packet + * processing, but after we made a final decision about packet's destination. + * To do this we maintain: + * pnum - array of number of consecutive packets with the same dest port for + * each packet in the input burst. + * lp - pointer to the last updated element in the pnum. + * dlp - dest port value lp corresponds to. + */ + +#defineGRPSZ (1 << FWDSTEP) +#defineGRPMSK (GRPSZ - 1) + +#define GROUP_PORT_STEP(dlp, dcp, lp, pn, idx) do { \ + if (likely((dlp) == (dcp)[(idx)])) { \ + (lp)[0]++; \ + } else { \ + (dlp) = (dcp)[idx]; \ + (lp) = (pn) + (idx); \ + (lp)[0] = 1; \ + }\ +}
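The grouping scheme described in the patch comment (pnum, lp, dlp) can be sketched in plain scalar C. This is a simplified illustration only: the function name group_ports is hypothetical, and the patch itself uses the GROUP_PORT_STEP macro together with vectorized comparisons rather than a simple loop.

```c
#include <stdint.h>

/*
 * For each run of consecutive packets with the same destination port,
 * store the run length in pnum[] at the index of the run's first packet.
 * Other pnum[] entries are left untouched.
 */
static void
group_ports(const uint16_t *dst_port, uint16_t *pnum, int n)
{
	uint16_t *lp;  /* last updated element in pnum */
	uint16_t dlp;  /* dest port value that lp corresponds to */
	int i;

	if (n <= 0)
		return;

	lp = &pnum[0];
	lp[0] = 1;
	dlp = dst_port[0];

	for (i = 1; i < n; i++) {
		if (dst_port[i] == dlp) {
			/* Same port as the current run: extend it. */
			lp[0]++;
		} else {
			/* New port: start a new run at this index. */
			dlp = dst_port[i];
			lp = &pnum[i];
			lp[0] = 1;
		}
	}
}
```

For an input burst with destination ports 1, 1, 2, 2, 2, 3, the sketch records a run of 2 at index 0, a run of 3 at index 2 and a run of 1 at index 5, so each run can then be sent as one burst.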
[dpdk-dev] [PATCH v4 5/8] arch/arm: add vcopyq_laneq_u32 for old version of gcc
Implement vcopyq_laneq_u32 if gcc version is lower than 7.

Signed-off-by: Jianbo Liu
---
 lib/librte_eal/common/include/arch/arm/rte_vect.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h
index 4107c99..d9fb4d0 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
@@ -78,6 +78,15 @@
 }
 #endif
 
+#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION < 7)
+static inline uint32x4_t
+vcopyq_laneq_u32(uint32x4_t a, const int lane_a,
+		 uint32x4_t b, const int lane_b)
+{
+	return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a);
+}
+#endif
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.3.1
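For readers without an ARM toolchain at hand, the effect of the fallback can be modelled in scalar C. This is a sketch only: struct u32x4 and copy_lane_u32 are hypothetical stand-ins for uint32x4_t and vcopyq_laneq_u32, and the lane masking is an addition to keep the model memory-safe (the real intrinsics expect in-range constant lane numbers).

```c
#include <stdint.h>

/* Stand-in for NEON's uint32x4_t so the sketch builds on any target. */
struct u32x4 {
	uint32_t lane[4];
};

/*
 * Scalar model of the fallback: read lane_b of b (vgetq_lane_u32) and
 * write it into lane_a of a copy of a (vsetq_lane_u32). The caller's
 * vector a is not modified because it is passed by value.
 */
static struct u32x4
copy_lane_u32(struct u32x4 a, int lane_a, struct u32x4 b, int lane_b)
{
	/* Mask lane numbers to 0..3; an assumption of this model only. */
	a.lane[lane_a & 3] = b.lane[lane_b & 3];
	return a;
}
```

Copying lane 3 of {10, 20, 30, 40} into lane 1 of {1, 2, 3, 4} gives {1, 40, 3, 4} in this model, which is the get-then-set composition the patch expresses with the two existing intrinsics.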
[dpdk-dev] [PATCH v4 6/8] examples/l3fwd: add neon support for l3fwd
Use ARM NEON intrinsics to accelerate l3 fowarding. Signed-off-by: Jianbo Liu --- examples/l3fwd/l3fwd_em.c| 4 +- examples/l3fwd/l3fwd_em_hlm.h| 17 ++- examples/l3fwd/l3fwd_em_hlm_neon.h | 74 ++ examples/l3fwd/l3fwd_em_sequential.h | 18 ++- examples/l3fwd/l3fwd_lpm.c | 4 +- examples/l3fwd/l3fwd_lpm_neon.h | 193 ++ examples/l3fwd/l3fwd_neon.h | 259 +++ 7 files changed, 563 insertions(+), 6 deletions(-) create mode 100644 examples/l3fwd/l3fwd_em_hlm_neon.h create mode 100644 examples/l3fwd/l3fwd_lpm_neon.h create mode 100644 examples/l3fwd/l3fwd_neon.h diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c index ba844b2..da96cfd 100644 --- a/examples/l3fwd/l3fwd_em.c +++ b/examples/l3fwd/l3fwd_em.c @@ -328,7 +328,7 @@ struct ipv6_l3fwd_em_route { return (uint8_t)((ret < 0) ? portid : ipv6_l3fwd_out_if[ret]); } -#if defined(__SSE4_1__) +#if defined(__SSE4_1__) || defined(RTE_MACHINE_CPUFLAG_NEON) #if defined(NO_HASH_MULTI_LOOKUP) #include "l3fwd_em_sequential.h" #else @@ -709,7 +709,7 @@ struct ipv6_l3fwd_em_route { if (nb_rx == 0) continue; -#if defined(__SSE4_1__) +#if defined(__SSE4_1__) || defined(RTE_MACHINE_CPUFLAG_NEON) l3fwd_em_send_packets(nb_rx, pkts_burst, portid, qconf); #else diff --git a/examples/l3fwd/l3fwd_em_hlm.h b/examples/l3fwd/l3fwd_em_hlm.h index 636dea4..b9163e3 100644 --- a/examples/l3fwd/l3fwd_em_hlm.h +++ b/examples/l3fwd/l3fwd_em_hlm.h @@ -35,8 +35,13 @@ #ifndef __L3FWD_EM_HLM_H__ #define __L3FWD_EM_HLM_H__ +#if defined(__SSE4_1__) #include "l3fwd_sse.h" #include "l3fwd_em_hlm_sse.h" +#elif defined(RTE_MACHINE_CPUFLAG_NEON) +#include "l3fwd_neon.h" +#include "l3fwd_em_hlm_neon.h" +#endif static inline __attribute__((always_inline)) void em_get_dst_port_ipv4x8(struct lcore_conf *qconf, struct rte_mbuf *m[8], @@ -238,7 +243,7 @@ static inline __attribute__((always_inline)) uint16_t l3fwd_em_send_packets(int nb_rx, struct rte_mbuf **pkts_burst, uint8_t portid, struct lcore_conf *qconf) { - int32_t j; + int32_t i, j, pos; uint16_t 
dst_port[MAX_PKT_BURST]; /* @@ -247,6 +252,11 @@ static inline __attribute__((always_inline)) uint16_t */ int32_t n = RTE_ALIGN_FLOOR(nb_rx, 8); + for (j = 0; j < 8 && j < nb_rx; j++) { + rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j], + struct ether_hdr *) + 1); + } + for (j = 0; j < n; j += 8) { uint32_t pkt_type = @@ -263,6 +273,11 @@ static inline __attribute__((always_inline)) uint16_t uint32_t tcp_or_udp = pkt_type & (RTE_PTYPE_L4_TCP | RTE_PTYPE_L4_UDP); + for (i = 0, pos = j + 8; i < 8 && pos < nb_rx; i++, pos++) { + rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[pos], + struct ether_hdr *) + 1); + } + if (tcp_or_udp && (l3_type == RTE_PTYPE_L3_IPV4)) { em_get_dst_port_ipv4x8(qconf, &pkts_burst[j], portid, diff --git a/examples/l3fwd/l3fwd_em_hlm_neon.h b/examples/l3fwd/l3fwd_em_hlm_neon.h new file mode 100644 index 000..dae1acf --- /dev/null +++ b/examples/l3fwd/l3fwd_em_hlm_neon.h @@ -0,0 +1,74 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * Copyright(c) 2017, Linaro Limited + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICE
[dpdk-dev] [PATCH v4 4/8] examples/l3fwd: rearrange the code for lpm_l3fwd
Signed-off-by: Jianbo Liu Some common code can be used by other ARCHs, move to l3fwd_lpm.c --- examples/l3fwd/l3fwd_lpm.c | 83 ++ examples/l3fwd/l3fwd_lpm.h | 26 + examples/l3fwd/l3fwd_lpm_sse.h | 66 - 3 files changed, 84 insertions(+), 91 deletions(-) diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c index f621269..fc554fc 100644 --- a/examples/l3fwd/l3fwd_lpm.c +++ b/examples/l3fwd/l3fwd_lpm.c @@ -104,6 +104,89 @@ struct ipv6_l3fwd_lpm_route { struct rte_lpm *ipv4_l3fwd_lpm_lookup_struct[NB_SOCKETS]; struct rte_lpm6 *ipv6_l3fwd_lpm_lookup_struct[NB_SOCKETS]; +static inline uint16_t +lpm_get_ipv4_dst_port(void *ipv4_hdr, uint8_t portid, void *lookup_struct) +{ + uint32_t next_hop; + struct rte_lpm *ipv4_l3fwd_lookup_struct = + (struct rte_lpm *)lookup_struct; + + return (uint16_t) ((rte_lpm_lookup(ipv4_l3fwd_lookup_struct, + rte_be_to_cpu_32(((struct ipv4_hdr *)ipv4_hdr)->dst_addr), + &next_hop) == 0) ? next_hop : portid); +} + +static inline uint16_t +lpm_get_ipv6_dst_port(void *ipv6_hdr, uint8_t portid, void *lookup_struct) +{ + uint32_t next_hop; + struct rte_lpm6 *ipv6_l3fwd_lookup_struct = + (struct rte_lpm6 *)lookup_struct; + + return (uint16_t) ((rte_lpm6_lookup(ipv6_l3fwd_lookup_struct, + ((struct ipv6_hdr *)ipv6_hdr)->dst_addr, + &next_hop) == 0) ? 
next_hop : portid); +} + +static inline __attribute__((always_inline)) uint16_t +lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt, + uint8_t portid) +{ + struct ipv6_hdr *ipv6_hdr; + struct ipv4_hdr *ipv4_hdr; + struct ether_hdr *eth_hdr; + + if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) { + + eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *); + ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1); + + return lpm_get_ipv4_dst_port(ipv4_hdr, portid, +qconf->ipv4_lookup_struct); + } else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) { + + eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *); + ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1); + + return lpm_get_ipv6_dst_port(ipv6_hdr, portid, +qconf->ipv6_lookup_struct); + } + + return portid; +} + +/* + * lpm_get_dst_port optimized routine for packets where dst_ipv4 is already + * precalculated. If packet is ipv6 dst_addr is taken directly from packet + * header and dst_ipv4 value is not used. + */ +static inline __attribute__((always_inline)) uint16_t +lpm_get_dst_port_with_ipv4(const struct lcore_conf *qconf, struct rte_mbuf *pkt, + uint32_t dst_ipv4, uint8_t portid) +{ + uint32_t next_hop; + struct ipv6_hdr *ipv6_hdr; + struct ether_hdr *eth_hdr; + + if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) { + return (uint16_t) ((rte_lpm_lookup(qconf->ipv4_lookup_struct, + dst_ipv4, &next_hop) == 0) + ? next_hop : portid); + + } else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) { + + eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *); + ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1); + + return (uint16_t) ((rte_lpm6_lookup(qconf->ipv6_lookup_struct, + ipv6_hdr->dst_addr, &next_hop) == 0) + ? 
next_hop : portid); + + } + + return portid; +} + #if defined(__SSE4_1__) #include "l3fwd_lpm_sse.h" #else diff --git a/examples/l3fwd/l3fwd_lpm.h b/examples/l3fwd/l3fwd_lpm.h index 258a82f..4865d90 100644 --- a/examples/l3fwd/l3fwd_lpm.h +++ b/examples/l3fwd/l3fwd_lpm.h @@ -34,37 +34,13 @@ #ifndef __L3FWD_LPM_H__ #define __L3FWD_LPM_H__ -static inline uint8_t -lpm_get_ipv4_dst_port(void *ipv4_hdr, uint8_t portid, void *lookup_struct) -{ - uint32_t next_hop; - struct rte_lpm *ipv4_l3fwd_lookup_struct = - (struct rte_lpm *)lookup_struct; - - return (uint8_t) ((rte_lpm_lookup(ipv4_l3fwd_lookup_struct, - rte_be_to_cpu_32(((struct ipv4_hdr *)ipv4_hdr)->dst_addr), - &next_hop) == 0) ? next_hop : portid); -} - -static inline uint8_t -lpm_get_ipv6_dst_port(void *ipv6_hdr, uint8_t portid, void *lookup_struct) -{ - uint32_t next_hop; - struct rte_lpm6 *ipv6_l3fwd_lookup_struct = - (struct rte_lpm6 *)lookup_struct; - - return (uint8_t) ((rte_lpm6_lookup(ipv6_l3fwd_lookup_struct, - ((struct ipv6_hdr *)ipv6_hdr)->dst_addr, - &next_hop) == 0) ? next_hop : portid); -} - static inline __attribute__((always_inline)) void l3fwd_lpm_simple_forward(struct rte_mbuf *m, uint8_t port
[dpdk-dev] [PATCH v4 7/8] examples/l3fwd: add the times of hash multi-lookup for different Archs
New macro to define how many times of hash lookup in one time, and this makes the code more concise. Signed-off-by: Jianbo Liu --- examples/l3fwd/l3fwd_em_hlm.h | 241 +- 1 file changed, 71 insertions(+), 170 deletions(-) diff --git a/examples/l3fwd/l3fwd_em_hlm.h b/examples/l3fwd/l3fwd_em_hlm.h index b9163e3..098b396 100644 --- a/examples/l3fwd/l3fwd_em_hlm.h +++ b/examples/l3fwd/l3fwd_em_hlm.h @@ -43,148 +43,65 @@ #include "l3fwd_em_hlm_neon.h" #endif +#ifdef RTE_ARCH_ARM64 +#define EM_HASH_LOOKUP_COUNT 16 +#else +#define EM_HASH_LOOKUP_COUNT 8 +#endif + + static inline __attribute__((always_inline)) void -em_get_dst_port_ipv4x8(struct lcore_conf *qconf, struct rte_mbuf *m[8], - uint8_t portid, uint16_t dst_port[8]) +em_get_dst_port_ipv4xN(struct lcore_conf *qconf, struct rte_mbuf *m[], + uint8_t portid, uint16_t dst_port[]) { - int32_t ret[8]; - union ipv4_5tuple_host key[8]; - - get_ipv4_5tuple(m[0], mask0.x, &key[0]); - get_ipv4_5tuple(m[1], mask0.x, &key[1]); - get_ipv4_5tuple(m[2], mask0.x, &key[2]); - get_ipv4_5tuple(m[3], mask0.x, &key[3]); - get_ipv4_5tuple(m[4], mask0.x, &key[4]); - get_ipv4_5tuple(m[5], mask0.x, &key[5]); - get_ipv4_5tuple(m[6], mask0.x, &key[6]); - get_ipv4_5tuple(m[7], mask0.x, &key[7]); - - const void *key_array[8] = {&key[0], &key[1], &key[2], &key[3], - &key[4], &key[5], &key[6], &key[7]}; - - rte_hash_lookup_bulk(qconf->ipv4_lookup_struct, &key_array[0], 8, ret); - - dst_port[0] = (uint8_t) ((ret[0] < 0) ? - portid : ipv4_l3fwd_out_if[ret[0]]); - dst_port[1] = (uint8_t) ((ret[1] < 0) ? - portid : ipv4_l3fwd_out_if[ret[1]]); - dst_port[2] = (uint8_t) ((ret[2] < 0) ? - portid : ipv4_l3fwd_out_if[ret[2]]); - dst_port[3] = (uint8_t) ((ret[3] < 0) ? - portid : ipv4_l3fwd_out_if[ret[3]]); - dst_port[4] = (uint8_t) ((ret[4] < 0) ? - portid : ipv4_l3fwd_out_if[ret[4]]); - dst_port[5] = (uint8_t) ((ret[5] < 0) ? - portid : ipv4_l3fwd_out_if[ret[5]]); - dst_port[6] = (uint8_t) ((ret[6] < 0) ? 
- portid : ipv4_l3fwd_out_if[ret[6]]); - dst_port[7] = (uint8_t) ((ret[7] < 0) ? - portid : ipv4_l3fwd_out_if[ret[7]]); - - if (dst_port[0] >= RTE_MAX_ETHPORTS || - (enabled_port_mask & 1 << dst_port[0]) == 0) - dst_port[0] = portid; - - if (dst_port[1] >= RTE_MAX_ETHPORTS || - (enabled_port_mask & 1 << dst_port[1]) == 0) - dst_port[1] = portid; - - if (dst_port[2] >= RTE_MAX_ETHPORTS || - (enabled_port_mask & 1 << dst_port[2]) == 0) - dst_port[2] = portid; - - if (dst_port[3] >= RTE_MAX_ETHPORTS || - (enabled_port_mask & 1 << dst_port[3]) == 0) - dst_port[3] = portid; - - if (dst_port[4] >= RTE_MAX_ETHPORTS || - (enabled_port_mask & 1 << dst_port[4]) == 0) - dst_port[4] = portid; - - if (dst_port[5] >= RTE_MAX_ETHPORTS || - (enabled_port_mask & 1 << dst_port[5]) == 0) - dst_port[5] = portid; - - if (dst_port[6] >= RTE_MAX_ETHPORTS || - (enabled_port_mask & 1 << dst_port[6]) == 0) - dst_port[6] = portid; - - if (dst_port[7] >= RTE_MAX_ETHPORTS || - (enabled_port_mask & 1 << dst_port[7]) == 0) - dst_port[7] = portid; + int i; + int32_t ret[EM_HASH_LOOKUP_COUNT]; + union ipv4_5tuple_host key[EM_HASH_LOOKUP_COUNT]; + const void *key_array[EM_HASH_LOOKUP_COUNT]; + + for (i = 0; i < EM_HASH_LOOKUP_COUNT; i++) { + get_ipv4_5tuple(m[i], mask0.x, &key[i]); + key_array[i] = &key[i]; + } + + rte_hash_lookup_bulk(qconf->ipv4_lookup_struct, &key_array[0], +EM_HASH_LOOKUP_COUNT, ret); + + for (i = 0; i < EM_HASH_LOOKUP_COUNT; i++) { + dst_port[i] = (uint8_t) ((ret[i] < 0) ? 
+ portid : ipv4_l3fwd_out_if[ret[i]]); + if (dst_port[i] >= RTE_MAX_ETHPORTS || + (enabled_port_mask & 1 << dst_port[i]) == 0) + dst_port[i] = portid; + } } static inline __attribute__((always_inline)) void -em_get_dst_port_ipv6x8(struct lcore_conf *qconf, struct rte_mbuf *m[8], - uint8_t portid, uint16_t dst_port[8]) +em_get_dst_port_ipv6xN(struct lcore_conf *qconf, struct rte_mbuf *m[], + uint8_t portid, uint16_t dst_port[]) { - int32_t ret[8]; - union ipv6_5tuple_host key[8]; - - get_ipv6_5tuple(m[0], mask1.x, mask2.x, &key[0]); - get
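The per-packet fallback that the new loop applies after the bulk lookup can be sketched in isolation as below. This is only an illustrative model, not code from the patch: `sketch_resolve_dst_ports`, `SKETCH_LOOKUP_COUNT` and `SKETCH_MAX_PORTS` are hypothetical stand-ins for the loop body, EM_HASH_LOOKUP_COUNT and RTE_MAX_ETHPORTS, and the lookup results are passed in rather than obtained from rte_hash_lookup_bulk().

```c
#include <stdint.h>

#define SKETCH_LOOKUP_COUNT 8	/* hypothetical stand-in for EM_HASH_LOOKUP_COUNT */
#define SKETCH_MAX_PORTS 32	/* hypothetical stand-in for RTE_MAX_ETHPORTS */

/*
 * Hypothetical helper: turn bulk-lookup results into destination ports.
 * ret[i] is the lookup result for packet i (negative means "not found"),
 * out_if[] maps a hit index to an output port.
 */
static void
sketch_resolve_dst_ports(const int32_t ret[], const uint8_t out_if[],
		uint32_t enabled_port_mask, uint8_t portid,
		uint16_t dst_port[])
{
	int i;

	for (i = 0; i < SKETCH_LOOKUP_COUNT; i++) {
		/* On a miss, send the packet back out of the input port. */
		dst_port[i] = (ret[i] < 0) ? portid : out_if[ret[i]];

		/* Also fall back if the port is out of range or not enabled. */
		if (dst_port[i] >= SKETCH_MAX_PORTS ||
		    (enabled_port_mask & (1u << dst_port[i])) == 0)
			dst_port[i] = portid;
	}
}
```

With the loop bound taken from a single macro, changing the batch size per architecture only touches the macro definition, which is the point of the refactoring in this patch.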
[dpdk-dev] [PATCH v4 8/8] examples/l3fwd: change the guard macro name for header file
As l3fwd_em_sse.h is renamed to l3fwd_em_sequential.h, change the macro to __L3FWD_EM_SEQUENTIAL_H__ to maintain consistency.

Signed-off-by: Jianbo Liu
---
 examples/l3fwd/l3fwd_em_sequential.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/examples/l3fwd/l3fwd_em_sequential.h b/examples/l3fwd/l3fwd_em_sequential.h
index 2b3ec16..c7d477d 100644
--- a/examples/l3fwd/l3fwd_em_sequential.h
+++ b/examples/l3fwd/l3fwd_em_sequential.h
@@ -31,8 +31,8 @@
  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

-#ifndef __L3FWD_EM_SSE_H__
-#define __L3FWD_EM_SSE_H__
+#ifndef __L3FWD_EM_SEQUENTIAL_H__
+#define __L3FWD_EM_SEQUENTIAL_H__

 /**
  * @file
@@ -123,4 +123,4 @@ static inline __attribute__((always_inline)) uint16_t
 	send_packets_multi(qconf, pkts_burst, dst_port, nb_rx);
 }

-#endif /* __L3FWD_EM_SSE_H__ */
+#endif /* __L3FWD_EM_SEQUENTIAL_H__ */
--
1.8.3.1
Re: [dpdk-dev] [PATCH] event/sw: add queue-to-port stats
-Original Message-
> Date: Thu, 11 May 2017 10:56:26 +0100
> From: Harry van Haaren
> To: dev@dpdk.org
> CC: jerin.ja...@caviumnetworks.com, Harry van Haaren
> Subject: [PATCH] event/sw: add queue-to-port stats
> X-Mailer: git-send-email 2.7.4
>
> This patch targets the next-eventdev tree.
>
> This commit adds a new statistic to the SW eventdev PMD.
> The statistic shows how many packets were sent from a
> queue to a port. This provides information on how traffic
> from a specific queue is being load-balanced to worker cores.
>
> Note that these numbers should be compared across all queue
> stages - the load-balancing does not try to perfectly share
> each queue's traffic, rather it balances the overall traffic
> from all queues to the ports.
>
> The statistic is printed from the rte_eventdev_dump() function,
> as well as being made available via the xstats API.
>
> Unit tests have been updated to expect more per-queue statistics,
> and the correctness of counts and counts after reset is verified.
>
> Signed-off-by: Harry van Haaren

Applied to dpdk-next-eventdev/master after removing "This patch targets the next-eventdev tree." in the git commit log. Thanks.
Re: [dpdk-dev] [PATCH] eventdev: clarify atomic and ordered queue config
-Original Message- > Date: Fri, 12 May 2017 14:25:37 -0500 > From: Gage Eads > To: dev@dpdk.org > CC: jerin.ja...@caviumnetworks.org > Subject: [dpdk-dev] [PATCH] eventdev: clarify atomic and ordered queue > config > X-Mailer: git-send-email 2.7.4 > > The nb_atomic_flows and nb_atomic_order_sequences fields are only inspected > if the queue is configured for atomic or ordered scheduling, respectively. > This commit updates the documentation to reflect that. > > Signed-off-by: Gage Eads > --- > lib/librte_eventdev/rte_eventdev.h | 15 ++- > 1 file changed, 10 insertions(+), 5 deletions(-) > > diff --git a/lib/librte_eventdev/rte_eventdev.h > b/lib/librte_eventdev/rte_eventdev.h > index 20e7293..32ffcd1 100644 > --- a/lib/librte_eventdev/rte_eventdev.h > +++ b/lib/librte_eventdev/rte_eventdev.h > @@ -521,9 +521,11 @@ rte_event_dev_configure(uint8_t dev_id, > struct rte_event_queue_conf { > uint32_t nb_atomic_flows; > /**< The maximum number of active flows this queue can track at any > - * given time. The value must be in the range of > - * [1 - nb_event_queue_flows)] which previously provided in > - * rte_event_dev_info_get(). > + * given time. If the queue is configured for atomic scheduling (by > + * applying the RTE_EVENT_QUEUE_CFG_ALL_TYPES or > + * RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY flags to event_queue_cfg), then the > + * value must be in the range of [1 - nb_event_queue_flows)], which was > + * previously provided in rte_event_dev_configure(). >*/ > uint32_t nb_atomic_order_sequences; > /**< The maximum number of outstanding events waiting to be > @@ -533,8 +535,11 @@ struct rte_event_queue_conf { >* scheduler cannot schedule the events from this queue and invalid >* event will be returned from dequeue until one or more entries are >* freed up/released. > - * The value must be in the range of [1 - nb_event_queue_flows)] > - * which previously supplied to rte_event_dev_configure(). 
> + * If the queue is configured for ordered scheduling (by applying the
> + * RTE_EVENT_QUEUE_CFG_ALL_TYPES or RTE_EVENT_QUEUE_CFG_ORDERED_ONLY
> + * flags to event_queue_cfg), then the value must be in the range of [1
> + * - nb_event_queue_flows)], which was previously supplied to

At this line, HTML document rendering is not showing up correctly.
Please check the generated HTML output with "make doc-api-html"

Other than that, content looks OK.

> + * rte_event_dev_configure().
>   */
>  uint32_t event_queue_cfg; /**< Queue cfg flags(EVENT_QUEUE_CFG_) */
>  uint8_t priority;
> --
> 2.7.4
>
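The rule described in the updated comments, that each field is only checked when the queue uses the matching scheduling type, could be modelled roughly as follows. This is a sketch under stated assumptions and not the eventdev implementation: the `sketch_*` names and the enum are hypothetical, the range is read as 1 to the configured maximum inclusive, and `max_flows` stands for the nb_event_queue_flows value supplied at device configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the RTE_EVENT_QUEUE_CFG_* scheduling types. */
enum sketch_queue_type {
	SKETCH_ALL_TYPES,
	SKETCH_ATOMIC_ONLY,
	SKETCH_ORDERED_ONLY,
	SKETCH_PARALLEL_ONLY
};

/* Hypothetical, reduced model of struct rte_event_queue_conf. */
struct sketch_queue_conf {
	uint32_t nb_atomic_flows;
	uint32_t nb_atomic_order_sequences;
	enum sketch_queue_type type;
};

/* Check each field only if the queue type actually uses it. */
static bool
sketch_queue_conf_valid(const struct sketch_queue_conf *c, uint32_t max_flows)
{
	bool atomic = c->type == SKETCH_ALL_TYPES ||
		      c->type == SKETCH_ATOMIC_ONLY;
	bool ordered = c->type == SKETCH_ALL_TYPES ||
		       c->type == SKETCH_ORDERED_ONLY;

	/* Assumed range: 1..max_flows inclusive. */
	if (atomic && (c->nb_atomic_flows < 1 ||
		       c->nb_atomic_flows > max_flows))
		return false;
	if (ordered && (c->nb_atomic_order_sequences < 1 ||
			c->nb_atomic_order_sequences > max_flows))
		return false;
	return true;
}
```

Under this model an atomic-only queue may leave nb_atomic_order_sequences at zero, and a queue of neither type may leave both fields at zero.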
Re: [dpdk-dev] [PATCH v4 5/8] arch/arm: add vcopyq_laneq_u32 for old version of gcc
-Original Message-
> Date: Mon, 15 May 2017 11:34:53 +0800
> From: Jianbo Liu
> To: dev@dpdk.org, tomasz.kante...@intel.com,
>  jerin.ja...@caviumnetworks.com, ashwin.sek...@caviumnetworks.com
> CC: Jianbo Liu
> Subject: [PATCH v4 5/8] arch/arm: add vcopyq_laneq_u32 for old version of
>  gcc
> X-Mailer: git-send-email 1.8.3.1
>
> Implement vcopyq_laneq_u32 if gcc version is lower than 7.
>
> Signed-off-by: Jianbo Liu

Acked-by: Jerin Jacob

> ---
>  lib/librte_eal/common/include/arch/arm/rte_vect.h | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h
> b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> index 4107c99..d9fb4d0 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> @@ -78,6 +78,15 @@
>  }
>  #endif
>
> +#if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION < 7)
> +static inline uint32x4_t
> +vcopyq_laneq_u32(uint32x4_t a, const int lane_a,
> +		 uint32x4_t b, const int lane_b)
> +{
> +	return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a);
> +}
> +#endif
> +
>  #ifdef __cplusplus
>  }
>  #endif
> --
> 1.8.3.1
>
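For readers without a NEON toolchain at hand, the effect of the fallback above (read one lane of b, write it into one lane of a, leave the rest of a alone) can be illustrated with a plain scalar model. This is only a sketch: `struct sketch_u32x4` and `sketch_copy_lane` are hypothetical names standing in for uint32x4_t and vcopyq_laneq_u32, and the lane indices are masked here to stay in bounds, whereas the intrinsics require compile-time constants in the range 0 to 3.

```c
#include <stdint.h>

/* Hypothetical scalar stand-in for the NEON uint32x4_t vector type. */
struct sketch_u32x4 {
	uint32_t lane[4];
};

/*
 * Copy lane lane_b of b into lane lane_a of a and return the result.
 * The other three lanes of a are unchanged. This mirrors the
 * get-lane-then-set-lane composition used in the patch.
 */
static struct sketch_u32x4
sketch_copy_lane(struct sketch_u32x4 a, int lane_a,
		struct sketch_u32x4 b, int lane_b)
{
	/* Mask the indices so the model never reads or writes out of bounds. */
	a.lane[lane_a & 3] = b.lane[lane_b & 3];
	return a;
}
```

Because a is passed by value, the caller's vector is not modified, which matches the value semantics of the intrinsic.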
Re: [dpdk-dev] [PATCH v4 6/8] examples/l3fwd: add neon support for l3fwd
On Mon, 2017-05-15 at 11:34 +0800, Jianbo Liu wrote: > Use ARM NEON intrinsics to accelerate l3 fowarding. > > Signed-off-by: Jianbo Liu Acked-by: Ashwin Sekhar T K > --- > examples/l3fwd/l3fwd_em.c| 4 +- > examples/l3fwd/l3fwd_em_hlm.h| 17 ++- > examples/l3fwd/l3fwd_em_hlm_neon.h | 74 ++ > examples/l3fwd/l3fwd_em_sequential.h | 18 ++- > examples/l3fwd/l3fwd_lpm.c | 4 +- > examples/l3fwd/l3fwd_lpm_neon.h | 193 > ++ > examples/l3fwd/l3fwd_neon.h | 259 > +++ > 7 files changed, 563 insertions(+), 6 deletions(-) > create mode 100644 examples/l3fwd/l3fwd_em_hlm_neon.h > create mode 100644 examples/l3fwd/l3fwd_lpm_neon.h > create mode 100644 examples/l3fwd/l3fwd_neon.h > > diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c > index ba844b2..da96cfd 100644 > --- a/examples/l3fwd/l3fwd_em.c > +++ b/examples/l3fwd/l3fwd_em.c > @@ -328,7 +328,7 @@ struct ipv6_l3fwd_em_route { > return (uint8_t)((ret < 0) ? portid : > ipv6_l3fwd_out_if[ret]); > } > > -#if defined(__SSE4_1__) > +#if defined(__SSE4_1__) || defined(RTE_MACHINE_CPUFLAG_NEON) > #if defined(NO_HASH_MULTI_LOOKUP) > #include "l3fwd_em_sequential.h" > #else > @@ -709,7 +709,7 @@ struct ipv6_l3fwd_em_route { > if (nb_rx == 0) > continue; > > -#if defined(__SSE4_1__) > +#if defined(__SSE4_1__) || defined(RTE_MACHINE_CPUFLAG_NEON) > l3fwd_em_send_packets(nb_rx, pkts_burst, > portid, > qconf); > #else > diff --git a/examples/l3fwd/l3fwd_em_hlm.h > b/examples/l3fwd/l3fwd_em_hlm.h > index 636dea4..b9163e3 100644 > --- a/examples/l3fwd/l3fwd_em_hlm.h > +++ b/examples/l3fwd/l3fwd_em_hlm.h > @@ -35,8 +35,13 @@ > #ifndef __L3FWD_EM_HLM_H__ > #define __L3FWD_EM_HLM_H__ > > +#if defined(__SSE4_1__) > #include "l3fwd_sse.h" > #include "l3fwd_em_hlm_sse.h" > +#elif defined(RTE_MACHINE_CPUFLAG_NEON) > +#include "l3fwd_neon.h" > +#include "l3fwd_em_hlm_neon.h" > +#endif > > static inline __attribute__((always_inline)) void > em_get_dst_port_ipv4x8(struct lcore_conf *qconf, struct rte_mbuf > *m[8], > @@ -238,7 +243,7 @@ 
static inline __attribute__((always_inline)) > uint16_t > l3fwd_em_send_packets(int nb_rx, struct rte_mbuf **pkts_burst, > uint8_t portid, struct lcore_conf *qconf) > { > - int32_t j; > + int32_t i, j, pos; > uint16_t dst_port[MAX_PKT_BURST]; > > /* > @@ -247,6 +252,11 @@ static inline __attribute__((always_inline)) > uint16_t > */ > int32_t n = RTE_ALIGN_FLOOR(nb_rx, 8); > > + for (j = 0; j < 8 && j < nb_rx; j++) { > + rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j], > + struct ether_hdr *) + > 1); > + } > + > for (j = 0; j < n; j += 8) { > > uint32_t pkt_type = > @@ -263,6 +273,11 @@ static inline __attribute__((always_inline)) > uint16_t > uint32_t tcp_or_udp = pkt_type & > (RTE_PTYPE_L4_TCP | RTE_PTYPE_L4_UDP); > > + for (i = 0, pos = j + 8; i < 8 && pos < nb_rx; i++, > pos++) { > + rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[po > s], > + struct > ether_hdr *) + 1); > + } > + > if (tcp_or_udp && (l3_type == RTE_PTYPE_L3_IPV4)) { > > em_get_dst_port_ipv4x8(qconf, > &pkts_burst[j], portid, > diff --git a/examples/l3fwd/l3fwd_em_hlm_neon.h > b/examples/l3fwd/l3fwd_em_hlm_neon.h > new file mode 100644 > index 000..dae1acf > --- /dev/null > +++ b/examples/l3fwd/l3fwd_em_hlm_neon.h > @@ -0,0 +1,74 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2016 Intel Corporation. All rights reserved. > + * Copyright(c) 2017, Linaro Limited > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or > without > + * modification, are permitted provided that the following > conditions > + * are met: > + * > + * * Redistributions of source code must retain the above > copyright > + * notice, this list of conditions and the following > disclaimer. > + * * Redistributions in binary form must reproduce the above > copyright > + * notice, this list of conditions and the following > disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. 
> + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products > derived > + * from this software without specific prior written > permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND > CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT > NOT > + * LIMITED TO, THE IMPLIED WARRANTIES