Re: [dpdk-dev] [PATCH v3 2/2] examples/performance-thread: add arm64 support
This license is already there in many files in examples/performance- thread directory. There are two cases in my patch. 1. I moved some code from examples/performance-thread/common/lthread.c to examples/performance-thread/common/arch/x86/stack.h. lthread.c already has the below kind of license. So I think there is no issue retaining the same in stack.h also. 2. I added the following files. examples/performance-thread/common/arch/arm64/ctx.c examples/performance-thread/common/arch/arm64/ctx.h examples/performance-thread/common/arch/arm64/stack.h These are actually written by me and not taken from the github link. By mistake I copied the license entirely :) For these files, I shall remove it and re-post a v3. Thanks Ashwin On Mon, 2017-07-03 at 21:21 +, O'Driscoll, Tim wrote: > > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas > > Monjalon > > > > There can be a licensing issue here. > > We may need advice from the Governing Board and the Technical > > Board. > That's correct. This uses a 2-clause BSD license, but the > Intellectual Property Policy section in the Project Charter (http://d > pdk.org/about/charter#ip) specifies 3-clause BSD. If you really need > to use a new license, then you'll need to make a request to the > Governing Board as specified in clause 6.c in the charter. > > > > > > > 18/05/2017 12:21, Ashwin Sekhar T K: > > > > > > +/* > > > + * https://github.com/halayli/lthread which carries the > > > following > > license. > > > > > > + * > > > + * Copyright (C) 2012, Hasan Alayli > > > + * > > > + * Redistribution and use in source and binary forms, with or > > > without > > > + * modification, are permitted provided that the following > > > conditions > > > + * are met: > > > + * 1. Redistributions of source code must retain the above > > > copyright > > > + *notice, this list of conditions and the following > > > disclaimer. > > > + * 2. Redistributions in binary form must reproduce the above > > copyright > > > > > > + *notice, this list of conditions and the following > > > disclaimer in > > the > > > > > > + *documentation and/or other materials provided with the > > distribution. > > > > > > + * > > > + * THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS > > > IS'' AND > > > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED > > > TO, > > THE > > > > > > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A > > > PARTICULAR > > PURPOSE > > > > > > + * ARE DISCLAIMED. IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE > > LIABLE > > > > > > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR > > CONSEQUENTIAL > > > > > > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF > > > SUBSTITUTE > > GOODS > > > > > > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS > > INTERRUPTION) > > > > > > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN > > CONTRACT, STRICT > > > > > > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) > > > ARISING IN > > ANY WAY > > > > > > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE > > POSSIBILITY OF > > > > > > + * SUCH DAMAGE. > > > + */
Re: [dpdk-dev] [PATCH v5 2/4] eal: move gcc version definition to common header
On Mon, 2017-07-03 at 22:51 +0200, Thomas Monjalon wrote: > 12/05/2017 12:15, Ashwin Sekhar T K: > > > > Moved the definition of GCC_VERSION from lib/librte_table/rte_lru.h > > to lib/librte_eal/common/include/rte_common.h. > > > > Tested compilation on: > > * arm64 with gcc > > * x86 with gcc and clang > > > > Signed-off-by: Ashwin Sekhar T K > > Reviewed-by: Jan Viktorin > > --- > > --- a/lib/librte_eal/common/include/rte_common.h > > +++ b/lib/librte_eal/common/include/rte_common.h > > +/** Define GCC_VERSION **/ > > +#ifdef RTE_TOOLCHAIN_GCC > > +#define GCC_VERSION (__GNUC__ * 1 + __GNUC_MINOR__ * 100 + > > \ > > + __GNUC_PATCHLEVEL__) > > +#endif > [...] > > > > --- a/lib/librte_table/rte_lru.h > > +++ b/lib/librte_table/rte_lru.h > > -#ifdef __INTEL_COMPILER > > -#define GCC_VERSION (0) > > -#else > > -#define GCC_VERSION (__GNUC__ * 1+__GNUC_MINOR__*100 + > > __GNUC_PATCHLEVEL__) > > -#endif > The ICC check is lost when moving in rte_common.h. All usage of GCC_VERSION is kept under #ifdef RTE_TOOLCHAIN_GCC. So the ICC check is not required. Ashwin
Re: [dpdk-dev] [PATCH] sched: enable neon optimizations
On Friday 28 April 2017 09:20 AM, Jianbo Liu wrote: > On 27 April 2017 at 21:00, Ashwin Sekhar T K > wrote: >> * Enabled CONFIG_RTE_SCHED_VECTOR for arm64 >> * Verified the changes with sched_autotest unit test case >> >> Signed-off-by: Ashwin Sekhar T K >> --- >> config/defconfig_arm64-armv8a-linuxapp-gcc | 2 +- >> lib/librte_sched/rte_sched.c | 22 ++ >> 2 files changed, 23 insertions(+), 1 deletion(-) >> >> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc >> b/config/defconfig_arm64-armv8a-linuxapp-gcc >> index 65888ce..021044a 100644 >> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc >> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc >> @@ -48,4 +48,4 @@ CONFIG_RTE_LIBRTE_FM10K_PMD=n >> CONFIG_RTE_LIBRTE_SFC_EFX_PMD=n >> CONFIG_RTE_LIBRTE_AVP_PMD=n >> >> -CONFIG_RTE_SCHED_VECTOR=n >> +CONFIG_RTE_SCHED_VECTOR=y > > It's enough to remove this line only, I don't think you must enable it > explicitly in the armv8a common config. > Tried removing this line from armv8a config. But in that case RTE_SCHED_VECTOR doesn't get defined. ./config/common_base has "CONFIG_RTE_SCHED_VECTOR=n" as the default setting. So enabling explicitly is required. - Ashwin >> diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c >> index 614705d..4ba476a 100644 >> --- a/lib/librte_sched/rte_sched.c >> +++ b/lib/librte_sched/rte_sched.c >> @@ -58,6 +58,8 @@ >> >> #if defined(__SSE4__) >> #define SCHED_VECTOR_SSE4 >> +#elif defined(RTE_MACHINE_CPUFLAG_NEON) >> +#define SCHED_VECTOR_NEON >> #endif >> >> #endif >> @@ -1732,6 +1734,26 @@ grinder_pipe_exists(struct rte_sched_port *port, >> uint32_t base_pipe) >> return 1; >> } >> >> +#elif defined(SCHED_VECTOR_NEON) >> + >> +static inline int >> +grinder_pipe_exists(struct rte_sched_port *port, uint32_t base_pipe) >> +{ >> + uint32x4_t index, pipes; >> + uint32_t *pos = (uint32_t *)port->grinder_base_bmp_pos; >> + >> + index = vmovq_n_u32(base_pipe); >> + pipes = vld1q_u32(pos); >> + if (!vminvq_u32(veorq_u32(pipes, index))) >> + return 1; >> + >> + pipes = vld1q_u32(pos + 4); >> + if (!vminvq_u32(veorq_u32(pipes, index))) >> + return 1; >> + >> + return 0; >> +} >> + >> #else >> >> static inline int >> -- >> 2.7.4 >> >
Re: [dpdk-dev] [PATCH] sched: enable neon optimizations
On Friday 28 April 2017 11:07 AM, Jianbo Liu wrote: > On 28 April 2017 at 13:27, Sekhar, Ashwin wrote: >> On Friday 28 April 2017 09:20 AM, Jianbo Liu wrote: >>> On 27 April 2017 at 21:00, Ashwin Sekhar T K >>> wrote: >>>> * Enabled CONFIG_RTE_SCHED_VECTOR for arm64 >>>> * Verified the changes with sched_autotest unit test case >>>> >>>> Signed-off-by: Ashwin Sekhar T K >>>> --- >>>> config/defconfig_arm64-armv8a-linuxapp-gcc | 2 +- >>>> lib/librte_sched/rte_sched.c | 22 ++ >>>> 2 files changed, 23 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc >>>> b/config/defconfig_arm64-armv8a-linuxapp-gcc >>>> index 65888ce..021044a 100644 >>>> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc >>>> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc >>>> @@ -48,4 +48,4 @@ CONFIG_RTE_LIBRTE_FM10K_PMD=n >>>> CONFIG_RTE_LIBRTE_SFC_EFX_PMD=n >>>> CONFIG_RTE_LIBRTE_AVP_PMD=n >>>> >>>> -CONFIG_RTE_SCHED_VECTOR=n >>>> +CONFIG_RTE_SCHED_VECTOR=y >>> >>> It's enough to remove this line only, I don't think you must enable it >>> explicitly in the armv8a common config. >>> >> Tried removing this line from armv8a config. But in that case >> RTE_SCHED_VECTOR doesn't get defined. >> ./config/common_base has "CONFIG_RTE_SCHED_VECTOR=n" as the default >> setting. So enabling explicitly is required. >> > > I know it must be enabled to use your enhancement. But I meant to keep > the same as common_base (or other default configs) if there is no > other strange reason to enable it. > > Thanks! > Jianbo > Got it. Will update the patch removing CONFIG_RTE_SCHED_VECTOR=n from defconfig_arm64-armv8a-linuxapp-gcc and resend. Thanks Ashwin
Re: [dpdk-dev] [PATCH 1/2] net: add arm64 neon version of CRC compute APIs
Hi Jan, Thanks for the comments. Please see my responses inline. On Friday 28 April 2017 03:25 PM, Jan Viktorin wrote: > Hello Ashwin Sekhar, > > some comments below... > > On Thu, 27 Apr 2017 07:10:20 -0700 > Ashwin Sekhar T K wrote: > >> * Added CRC compute APIs for arm64 utilizing the pmull capability >> * Added new file net_crc_neon.h to hold the arm64 pmull CRC >> implementation >> * Added crypto capability in compilation of generic armv8 and >> thunderx targets >> * pmull CRC version is used only after checking the pmull capability >> at runtime >> * Verified the changes with crc_autotest unit test case >> >> Signed-off-by: Ashwin Sekhar T K >> --- >> MAINTAINERS | 1 + >> lib/librte_eal/common/include/arch/arm/rte_vect.h | 45 +++ >> lib/librte_net/net_crc_neon.h | 357 >> ++ >> lib/librte_net/rte_net_crc.c | 32 +- >> lib/librte_net/rte_net_crc.h | 2 + >> mk/machine/armv8a/rte.vars.mk | 2 +- >> mk/machine/thunderx/rte.vars.mk | 2 +- >> mk/rte.cpuflags.mk| 3 + >> mk/toolchain/gcc/rte.toolchain-compat.mk | 1 + >> 9 files changed, 438 insertions(+), 7 deletions(-) >> create mode 100644 lib/librte_net/net_crc_neon.h >> >> diff --git a/MAINTAINERS b/MAINTAINERS >> index 576d60a..283743e 100644 >> --- a/MAINTAINERS >> +++ b/MAINTAINERS >> @@ -149,6 +149,7 @@ F: lib/librte_lpm/rte_lpm_neon.h >> F: lib/librte_hash/rte*_arm64.h >> F: lib/librte_efd/rte*_arm64.h >> F: lib/librte_table/rte*_arm64.h >> +F: lib/librte_net/net_crc_neon.h >> F: drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c >> F: drivers/net/i40e/i40e_rxtx_vec_neon.c >> F: drivers/net/virtio/virtio_rxtx_simple_neon.c >> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h >> b/lib/librte_eal/common/include/arch/arm/rte_vect.h >> index 4107c99..9a3dfdf 100644 >> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h >> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h >> @@ -34,9 +34,18 @@ >> #define _RTE_VECT_ARM_H_ >> >> #include >> +#include >> + >> #include "generic/rte_vect.h" >> #include "arm_neon.h" >> >> +#ifdef GCC_VERSION >> +#undef GCC_VERSION >> +#endif > > Why are you doing this? What is wrong with GCC_VERSION? > This is just to avoid multiple definitions of GCC_VERSION. Not required really. Can be removed. >> + >> +#define GCC_VERSION (__GNUC__ * 1 + __GNUC_MINOR__ * 100 \ >> ++ __GNUC_PATCHLEVEL__) >> + > > If you have any specific requirements for testing GCC version then it > should be done in a more elegant way. However, I do not understand your > intention. > GCC version is checked so as to define wrappers for some neon intrinsics which are not available in GCC versions < 7. Similar checks of GCC_VERSION done in ./lib/librte_table/rte_lru.h. Followed the same template here. Also, this is the suggested approach by GCC. Please see below link. https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html Please advise on more elegant ways of gcc version detection. >> #ifdef __cplusplus >> extern "C" { >> #endif >> @@ -78,6 +87,42 @@ vqtbl1q_u8(uint8x16_t a, uint8x16_t b) >> } >> #endif >> >> +#if (GCC_VERSION < 7) > > Is this code is gcc-specific? In such case there should be check for > GCC compiler. We can also build e.g. by clang. > Yes, the code is GCC specific. Currently there are only GCC targets for arm and arm64. So no checks are done for other types of compilers. >> +/* >> + * NEON intrinsic vreinterpretq_u64_p128() is not supported >> + * in GCC versions < 7 >> + */ > > I'd be positive about those comments, like: > > NEON intrinsic vreinterpretq_u64_p128() is supported since GCC 7. > Thanks. Will make the comments positive. >> +static inline uint64x2_t >> +vreinterpretq_u64_p128(poly128_t x) >> +{ >> +return (uint64x2_t)x; >> +} >> + >> +/* >> + * NEON intrinsic vreinterpretq_p64_u64() is not supported >> + * in GCC versions < 7 >> + */ >> +static inline poly64x2_t >> +vreinterpretq_p64_u64(uint64x2_t x) >> +{ >> +return (poly64x2_t)x; >> +} >> + >> +/* >> + * NEON intrinsic vgetq_lane_p64() is not supported >> + * in GCC versions < 7 >> + */ >> +static inline poly64_t >> +vgetq_lane_p64(poly64x2_t x, const int lane) >> +{ >> +assert(lane >= 0 && lane <= 1); >> + >> +poly64_t *p = (poly64_t *)&x; >> + >> +return p[lane]; >> +} >> +#endif >> + >> #ifdef __cplusplus >> } >> #endif >> diff --git a/lib/librte_net/net_crc_neon.h b/lib/librte_net/net_crc_neon.h > > [...] > >> # CPU_LDFLAGS = >> # CPU_ASFLAGS = >> >> -MACHINE_CFLAGS += -march=armv8-a+crc >> +MACHINE_CFLAGS += -march=armv8-a+crc+crypto >> diff --git a/mk/machine/thunderx/rte.vars.mk >> b/mk/machine/thunderx/rte.vars.mk >> index ad5a379..6784105 100644 >> --- a/mk/machine/thunderx/rte.vars.mk >> +++ b/mk/machine/thunderx/rte.vars.mk >> @@ -55,4 +
Re: [dpdk-dev] [PATCH v2] efd: support lookup using neon intrinsics
On Friday 28 April 2017 03:36 PM, Jianbo Liu wrote: > On 27 April 2017 at 20:44, Ashwin Sekhar T K > wrote: >> * Added file lib/librte_efd/rte_efd_arm64.h to hold arm64 >> specific definitions >> * Verified the changes with efd_autotest unit test case >> >> Signed-off-by: Ashwin Sekhar T K >> --- >> v2: >> * Slightly modified the content of the commit message body >> * Added prefix [dpdk-dev] to the email subject line >> >> MAINTAINERS| 1 + >> lib/librte_efd/rte_efd.c | 22 >> lib/librte_efd/rte_efd_arm64.h | 76 >> ++ >> 3 files changed, 99 insertions(+) >> create mode 100644 lib/librte_efd/rte_efd_arm64.h >> >> diff --git a/MAINTAINERS b/MAINTAINERS >> index b6495d2..7d708ae 100644 >> --- a/MAINTAINERS >> +++ b/MAINTAINERS >> @@ -147,6 +147,7 @@ F: lib/librte_eal/common/include/arch/arm/*_64.h >> F: lib/librte_acl/acl_run_neon.* >> F: lib/librte_lpm/rte_lpm_neon.h >> F: lib/librte_hash/rte*_arm64.h >> +F: lib/librte_efd/rte*_arm64.h >> F: drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c >> F: drivers/net/i40e/i40e_rxtx_vec_neon.c >> F: drivers/net/virtio/virtio_rxtx_simple_neon.c >> diff --git a/lib/librte_efd/rte_efd.c b/lib/librte_efd/rte_efd.c >> index f601d62..4d9a088 100644 >> --- a/lib/librte_efd/rte_efd.c >> +++ b/lib/librte_efd/rte_efd.c >> @@ -53,6 +53,8 @@ >> #include "rte_efd.h" >> #if defined(RTE_ARCH_X86) >> #include "rte_efd_x86.h" >> +#elif defined(RTE_ARCH_ARM64) >> +#include "rte_efd_arm64.h" >> #endif >> >> #define EFD_KEY(key_idx, table) (table->keys + ((key_idx) * table->key_len)) >> @@ -103,6 +105,7 @@ allocated memory >> enum efd_lookup_internal_function { >> EFD_LOOKUP_SCALAR = 0, >> EFD_LOOKUP_AVX2, >> + EFD_LOOKUP_NEON, > > Should it be included in "if defined(RTE_ARCH_ARM64)"? > The enum can be wrapped under "if defined(RTE_ARCH_ARM64)" with no issues, as all its usages are also under "if defined(RTE_ARCH_ARM64)". I followed EFD_LOOKUP_AVX2 and defined EFD_LOOKUP_NEON on the same lines. Please advise on whether this change is to be made. Will follow your advice. >> EFD_LOOKUP_NUM >> }; >> >> @@ -674,6 +677,16 @@ rte_efd_create(const char *name, uint32_t >> max_num_rules, uint32_t key_len, >> table->lookup_fn = EFD_LOOKUP_AVX2; >> else >> #endif >> +#if defined(RTE_ARCH_ARM64) >> + /* >> +* For less than or equal to 16 bits, scalar function performs better >> +* than vectorised version >> +*/ >> + if (RTE_EFD_VALUE_NUM_BITS > 16 && >> + rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) >> + table->lookup_fn = EFD_LOOKUP_NEON; >> + else >> +#endif >> table->lookup_fn = EFD_LOOKUP_SCALAR; >> >> /* >> @@ -1271,6 +1284,15 @@ efd_lookup_internal(const struct >> efd_online_group_entry * const group, >> group->lookup_table, >> hash_val_a, >> hash_val_b); >> + break; >> +#endif >> +#if defined(RTE_ARCH_ARM64) >> + case EFD_LOOKUP_NEON: >> + return efd_lookup_internal_neon(group->hash_idx, >> + group->lookup_table, >> + hash_val_a, >> + hash_val_b); >> + break; >> #endif >> case EFD_LOOKUP_SCALAR: >> /* Fall-through */ >> diff --git a/lib/librte_efd/rte_efd_arm64.h b/lib/librte_efd/rte_efd_arm64.h >> new file mode 100644 >> index 000..cc93411 >> --- /dev/null >> +++ b/lib/librte_efd/rte_efd_arm64.h >> @@ -0,0 +1,76 @@ >> +/* >> + * BSD LICENSE >> + * >> + * Copyright (C) Cavium networks Ltd. 2017. >> + * >> + * Redistribution and use in source and binary forms, with or without >> + * modification, are permitted provided that the following conditions >> + * are met: >> + * >> + * * Redistributions of source code must retain the above copyright >> + * notice, this list of conditions and the following disclaimer. >> + * * Redistributions in binary form must reproduce the above copyright >> + * notice, this list of conditions and the following disclaimer in >> + * the documentation and/or other materials provided with the >> + * distribution. >> + * * Neither the name of Cavium networks nor the names of its >> + * contributors may be used to endorse or promote products derived >> + * from this software without specific prior written permission. >> + * >> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS >> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT >> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR >> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT >> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT
Re: [dpdk-dev] [PATCH v3] efd: support lookup using neon intrinsics
On Tue, 2017-05-02 at 15:59 +0800, Jianbo Liu wrote: > On 2 May 2017 at 14:41, Jerin Jacob > wrote: > > > > -Original Message- > > > > > > Date: Mon, 1 May 2017 22:59:53 -0700 > > > From: Ashwin Sekhar T K > > > To: byron.mar...@intel.com, pablo.de.lara.gua...@intel.com, > > > jerin.ja...@caviumnetworks.com, jianbo@linaro.org > > > Cc: dev@dpdk.org, Ashwin Sekhar T K > > .com> > > > Subject: [dpdk-dev] [PATCH v3] efd: support lookup using neon > > > intrinsics > > > X-Mailer: git-send-email 2.13.0.rc1 > > > > > > * Added file lib/librte_efd/rte_efd_arm64.h to hold arm64 > > > specific definitions > > > * Verified the changes with efd_autotest unit test case > > > > > > Signed-off-by: Ashwin Sekhar T K > > m> > > > --- > > > v2: > > > * Slightly modified the content of the commit message body > > > * Added prefix [dpdk-dev] to the email subject line > > > > > > v3: > > > * Moved enum 'EFD_LOOKUP_NEON' under '#if > > > defined(RTE_ARCH_ARM64)' > > > > > > MAINTAINERS| 1 + > > > lib/librte_efd/rte_efd.c | 24 + > > > lib/librte_efd/rte_efd_arm64.h | 76 > > > ++ > > > 3 files changed, 101 insertions(+) > > > create mode 100644 lib/librte_efd/rte_efd_arm64.h > > > > > > diff --git a/MAINTAINERS b/MAINTAINERS > > > index b6495d2..7d708ae 100644 > > > --- a/MAINTAINERS > > > +++ b/MAINTAINERS > > > @@ -147,6 +147,7 @@ F: > > > lib/librte_eal/common/include/arch/arm/*_64.h > > > F: lib/librte_acl/acl_run_neon.* > > > F: lib/librte_lpm/rte_lpm_neon.h > > > F: lib/librte_hash/rte*_arm64.h > > > +F: lib/librte_efd/rte*_arm64.h > > > F: drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c > > > F: drivers/net/i40e/i40e_rxtx_vec_neon.c > > > F: drivers/net/virtio/virtio_rxtx_simple_neon.c > > > diff --git a/lib/librte_efd/rte_efd.c b/lib/librte_efd/rte_efd.c > > > index f601d62..5cc6283 100644 > > > --- a/lib/librte_efd/rte_efd.c > > > +++ b/lib/librte_efd/rte_efd.c > > > @@ -53,6 +53,8 @@ > > > #include "rte_efd.h" > > > #if defined(RTE_ARCH_X86) > > > #include "rte_efd_x86.h" > > > +#elif defined(RTE_ARCH_ARM64) > > > +#include "rte_efd_arm64.h" > > > #endif > > > > > > #define EFD_KEY(key_idx, table) (table->keys + ((key_idx) * > > > table->key_len)) > > > @@ -103,6 +105,9 @@ allocated memory > > > enum efd_lookup_internal_function { > > > EFD_LOOKUP_SCALAR = 0, > > > EFD_LOOKUP_AVX2, > > > +#if defined(RTE_ARCH_ARM64) > > > + EFD_LOOKUP_NEON, > > > +#endif > > I think, we can remove this ifdef to > > - Make code looks clean > > - In future, in some case a new enum value gets added then the > > value > > will be different for each build. > > > But the enum items are same for each ARCH. > Besides, the ifdef could be considered as explanation to that enum. > If > someone knows nothing about arm/neon, he can ignore it totally after > see the ifdef. > Have added the #if defined on your advice, but in my opinion also its better not to have "#if defined" for enums. Because the same enum can take different values for different builds. For eg: If somebody adds an EFD_LOOKUP_AVX512 after EFD_LOOKUP_NEON here, it will take value 2 for x86 builds but value 3 for arm64 builds. > > > > Any valid point to keep under RTE_ARCH_ARM64? > > > > > > > > EFD_LOOKUP_NUM > > > };
Re: [dpdk-dev] [PATCH 2/5] examples/l3fwd: rename l3fwd_em_sse.h to l3fwd_em_single.h
On Tue, 2017-05-02 at 15:14 +0800, Jianbo Liu wrote: > The l3fwd_em_sse.h is enabled by NO_HASH_LOOKUP_MULTI. > Renaming it because it's only for single hash lookup, > and doesn't include any x86 SSE instructions. > > Signed-off-by: Jianbo Liu > --- > examples/l3fwd/l3fwd_em.c| 2 +- > examples/l3fwd/{l3fwd_em_sse.h => l3fwd_em_single.h} | 0 > 2 files changed, 1 insertion(+), 1 deletion(-) > rename examples/l3fwd/{l3fwd_em_sse.h => l3fwd_em_single.h} (100%) > > diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c > index 939a16d..cccf797 100644 > --- a/examples/l3fwd/l3fwd_em.c > +++ b/examples/l3fwd/l3fwd_em.c > @@ -330,7 +330,7 @@ struct ipv6_l3fwd_em_route { > > #if defined(__SSE4_1__) > #if defined(NO_HASH_MULTI_LOOKUP) > -#include "l3fwd_em_sse.h" > +#include "l3fwd_em_single.h" > #else > #include "l3fwd_em_hlm.h" > #endif > diff --git a/examples/l3fwd/l3fwd_em_sse.h > b/examples/l3fwd/l3fwd_em_single.h > similarity index 100% > rename from examples/l3fwd/l3fwd_em_sse.h > rename to examples/l3fwd/l3fwd_em_single.h Shouldn't the guard __L3FWD_EM_SSE_H__ be update to __L3FWD_EM_SINGLE_H__ to maintain consistency ? Thanks and Regards, Ashwin
Re: [dpdk-dev] [PATCH 5/5] examples/l3fwd: add neon support for l3fwd
Hi, Please find comments inline. On Tue, 2017-05-02 at 15:14 +0800, Jianbo Liu wrote: > Use ARM NEON intrinsics to accelerate l3 fowarding. > > Signed-off-by: Jianbo Liu > --- > examples/l3fwd/l3fwd.h | 4 - > examples/l3fwd/l3fwd_em.c | 4 +- > examples/l3fwd/l3fwd_em_hlm.h | 5 + > examples/l3fwd/l3fwd_em_hlm_neon.h | 74 +++ > examples/l3fwd/l3fwd_em_single.h | 4 + > examples/l3fwd/l3fwd_lpm.c | 4 +- > examples/l3fwd/l3fwd_lpm_neon.h| 157 ++ > examples/l3fwd/l3fwd_neon.h| 259 > + > 8 files changed, 504 insertions(+), 7 deletions(-) > create mode 100644 examples/l3fwd/l3fwd_em_hlm_neon.h > create mode 100644 examples/l3fwd/l3fwd_lpm_neon.h > create mode 100644 examples/l3fwd/l3fwd_neon.h > > diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h > index 011ba14..c45589a 100644 > --- a/examples/l3fwd/l3fwd.h > +++ b/examples/l3fwd/l3fwd.h > @@ -40,10 +40,6 @@ > > #define RTE_LOGTYPE_L3FWD RTE_LOGTYPE_USER1 > > -#if !defined(NO_HASH_MULTI_LOOKUP) && > defined(RTE_MACHINE_CPUFLAG_NEON) > -#define NO_HASH_MULTI_LOOKUP 1 > -#endif > - > #define MAX_PKT_BURST 32 > #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */ > > diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c > index cccf797..ac1e2e0 100644 > --- a/examples/l3fwd/l3fwd_em.c > +++ b/examples/l3fwd/l3fwd_em.c > @@ -328,7 +328,7 @@ struct ipv6_l3fwd_em_route { > return (uint8_t)((ret < 0) ? portid : > ipv6_l3fwd_out_if[ret]); > } > > -#if defined(__SSE4_1__) > +#if defined(__SSE4_1__) || defined(RTE_MACHINE_CPUFLAG_NEON) > #if defined(NO_HASH_MULTI_LOOKUP) > #include "l3fwd_em_single.h" > #else > @@ -709,7 +709,7 @@ struct ipv6_l3fwd_em_route { > if (nb_rx == 0) > continue; > > -#if defined(__SSE4_1__) > +#if defined(__SSE4_1__) || defined(RTE_MACHINE_CPUFLAG_NEON) > l3fwd_em_send_packets(nb_rx, pkts_burst, > portid, > qconf); > #else > diff --git a/examples/l3fwd/l3fwd_em_hlm.h > b/examples/l3fwd/l3fwd_em_hlm.h > index 636dea4..3329c1a 100644 > --- a/examples/l3fwd/l3fwd_em_hlm.h > +++ b/examples/l3fwd/l3fwd_em_hlm.h > @@ -35,8 +35,13 @@ > #ifndef __L3FWD_EM_HLM_H__ > #define __L3FWD_EM_HLM_H__ > > +#if defined(__SSE4_1__) > #include "l3fwd_sse.h" > #include "l3fwd_em_hlm_sse.h" > +#elif defined(RTE_MACHINE_CPUFLAG_NEON) > +#include "l3fwd_neon.h" > +#include "l3fwd_em_hlm_neon.h" > +#endif > > static inline __attribute__((always_inline)) void > em_get_dst_port_ipv4x8(struct lcore_conf *qconf, struct rte_mbuf > *m[8], > diff --git a/examples/l3fwd/l3fwd_em_hlm_neon.h > b/examples/l3fwd/l3fwd_em_hlm_neon.h > new file mode 100644 > index 000..dae1acf > --- /dev/null > +++ b/examples/l3fwd/l3fwd_em_hlm_neon.h > @@ -0,0 +1,74 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2016 Intel Corporation. All rights reserved. > + * Copyright(c) 2017, Linaro Limited > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or > without > + * modification, are permitted provided that the following > conditions > + * are met: > + * > + * * Redistributions of source code must retain the above > copyright > + * notice, this list of conditions and the following > disclaimer. > + * * Redistributions in binary form must reproduce the above > copyright > + * notice, this list of conditions and the following > disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products > derived > + * from this software without specific prior written > permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND > CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT > NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND > FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE > COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, > INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT > NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS > OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND > ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR > TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF > THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH > DAMAGE. > + */ > + > +#ifndef __L3FWD_EM_HLM_NEON_H__ > +#define __L3FWD_EM_HLM_NEON_H__ > + > +#include > + > +static inline void > +get_ipv4_5tuple(struct rte_mbuf *m0, int32x4_t mask0, >
Re: [dpdk-dev] [PATCH 5/5] examples/l3fwd: add neon support for l3fwd
Hi Jianbo, I tested your neon changes on thunderx. I am seeing a performance regression of ~10% for LPM case and ~20% for EM case with your changes. Did you see improvement on any arm64 platform with these changes. If yes, how much was the improvement? FYI, I had also tried vectorizing the l3fwd app with neon. Few of the optimizations that I can suggest that helped in my case. * Packet data prefetch is missing in the x86 sse version compared to the scalar version (l3fwd_lpm_send_packets vs l3fwd_lpm_no_opt_send_packets) . I couldn't understand why this was not done in x86. But adding the prefetch was improving performance for thunderx. * Offsets to some packet elements like eth_hdr, ip header, packet type etc. are recalculated in different functions. Calculating them once, caching them and passing them directly to different functions was improving performance. * There are 3 different loops in l3fwd_lpm_send_packets where we iterate over the packets. One each for processx4_step1 and processx4_step2 and one in send_packets_multi. Unifying these loops were also helping. Thanks and Regards Ashwin
Re: [dpdk-dev] [PATCH 5/5] examples/l3fwd: add neon support for l3fwd
On Thu, 2017-05-04 at 16:42 +0800, Jianbo Liu wrote: > Hi Ashwin, > > On 3 May 2017 at 13:24, Jianbo Liu wrote: > > > > Hi Ashwin, > > > > On 2 May 2017 at 19:47, Sekhar, Ashwin > > wrote: > > > > > > Hi Jianbo, > > > > > > I tested your neon changes on thunderx. I am seeing a performance > > > regression of ~10% for LPM case and ~20% for EM case with your > > > changes. > > > Did you see improvement on any arm64 platform with these changes. > > > If > > > yes, how much was the improvement? > > Thanks for your reviewing and testing. > > For some reason, I have not done much with the performance testing. > > I'll send a new version later after tuning the performance. > > > Can you tell me how did you test? Built with following commands. make config T=arm64-thunderx-linuxapp-gcc make -j32 Tested LPM with sudo ./examples/l3fwd/build/l3fwd -l 9,10 --master-lcore 9 -- -p 0x1 --config="(0,0,10)" Tested EM with sudo ./examples/l3fwd/build/l3fwd -l 9,10 --master-lcore 9 -- -p 0x1 --config="(0,0,10)" -E > My testing shows that EM case is much better, while LPM is almost the > same as before. Could you please tell on which arm64 processor/platform you tested. Also how much was the percentage increase in performance for EM ? > Thanks! > Jianbo
Re: [dpdk-dev] [PATCH 5/5] examples/l3fwd: add neon support for l3fwd
On Fri, 2017-05-05 at 13:43 +0800, Jianbo Liu wrote: > On 5 May 2017 at 12:24, Sekhar, Ashwin > wrote: > > > > On Thu, 2017-05-04 at 16:42 +0800, Jianbo Liu wrote: > > > > > > Hi Ashwin, > > > > > > On 3 May 2017 at 13:24, Jianbo Liu wrote: > > > > > > > > > > > > Hi Ashwin, > > > > > > > > On 2 May 2017 at 19:47, Sekhar, Ashwin > > > m> > > > > wrote: > > > > > > > > > > > > > > > Hi Jianbo, > > > > > > > > > > I tested your neon changes on thunderx. I am seeing a > > > > > performance > > > > > regression of ~10% for LPM case and ~20% for EM case with > > > > > your > > > > > changes. > > > > > Did you see improvement on any arm64 platform with these > > > > > changes. > > > > > If > > > > > yes, how much was the improvement? > > > > Thanks for your reviewing and testing. > > > > For some reason, I have not done much with the performance > > > > testing. > > > > I'll send a new version later after tuning the performance. > > > > > > > Can you tell me how did you test? > > Built with following commands. > > make config T=arm64-thunderx-linuxapp-gcc > > make -j32 > > > > Tested LPM with > > sudo ./examples/l3fwd/build/l3fwd -l 9,10 --master-lcore 9 -- -p > > 0x1 --config="(0,0,10)" > > > > Tested EM with > > sudo ./examples/l3fwd/build/l3fwd -l 9,10 --master-lcore 9 -- -p > > 0x1 --config="(0,0,10)" -E > > > Only one port? What's the network topology, and lpm/em rules? How did > you stress traffic...? port - 1 topology: DUT connected back to back to traffic generator. We are using the default rules in the C code. flow generation is: src.ip.min 192.168.18.1 src.ip.max 192.168.18.90 src.ip.inc 1 Also, Please let us know the topology that you are using. > > > > > > > > > My testing shows that EM case is much better, while LPM is almost > > > the > > > same as before. > > Could you please tell on which arm64 processor/platform you tested. > > Also how much was the percentage increase in performance for EM ? > > > I'm sorry I can't tell you what's arm64 platform I tested on. But I > can get a ThunderX, and replicate your testing environment if you can > tell me more... Thanks. > > Thanks! > Jianbo
Re: [dpdk-dev] [PATCH v2 7/7] examples/l3fwd: change the guard micro name for header file
In commit message: s/micro/macro/ On Wed, 2017-05-10 at 10:30 +0800, Jianbo Liu wrote: > As l3fwd_em_sse.h is renamed to l3fwd_em_sequential.h, change the > macro > to __L3FWD_EM_SEQUENTIAL_H__ to maintain consistency. > > Signed-off-by: Jianbo Liu > --- > examples/l3fwd/l3fwd_em_sequential.h | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/examples/l3fwd/l3fwd_em_sequential.h > b/examples/l3fwd/l3fwd_em_sequential.h > index c3df473..63c5c12 100644 > --- a/examples/l3fwd/l3fwd_em_sequential.h > +++ b/examples/l3fwd/l3fwd_em_sequential.h > @@ -31,8 +31,8 @@ > * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH > DAMAGE. > */ > > -#ifndef __L3FWD_EM_SSE_H__ > -#define __L3FWD_EM_SSE_H__ > +#ifndef __L3FWD_EM_SEQUENTIAL_H__ > +#define __L3FWD_EM_SEQUENTIAL_H__ > > /** > * @file > @@ -125,4 +125,4 @@ static inline __attribute__((always_inline)) > uint16_t > > send_packets_multi(qconf, pkts_burst, dst_port, nb_rx); > } > -#endif /* __L3FWD_EM_SSE_H__ */ > +#endif /* __L3FWD_EM_SEQUENTIAL_H__ */
Re: [dpdk-dev] [PATCH v2 5/7] examples/l3fwd: add neon support for l3fwd
Hi Jianbo, Thanks for version v2. Addition of the prefetch instructions is definitely helping performance on ThunderX. But still performance is slightly less than that of scalar. I tried few small tweaks which helped improve performance on my Thunderx setup. For details see comments inline. On Wed, 2017-05-10 at 10:30 +0800, Jianbo Liu wrote: > Use ARM NEON intrinsics to accelerate l3 fowarding. > > Signed-off-by: Jianbo Liu > --- > examples/l3fwd/l3fwd_em.c| 4 +- > examples/l3fwd/l3fwd_em_hlm.h| 19 ++- > examples/l3fwd/l3fwd_em_hlm_neon.h | 74 ++ > examples/l3fwd/l3fwd_em_sequential.h | 20 ++- > examples/l3fwd/l3fwd_lpm.c | 4 +- > examples/l3fwd/l3fwd_lpm_neon.h | 165 ++ > examples/l3fwd/l3fwd_neon.h | 259 > +++ > 7 files changed, 539 insertions(+), 6 deletions(-) > create mode 100644 examples/l3fwd/l3fwd_em_hlm_neon.h > create mode 100644 examples/l3fwd/l3fwd_lpm_neon.h > create mode 100644 examples/l3fwd/l3fwd_neon.h > > [...] > diff --git a/examples/l3fwd/l3fwd_em_hlm.h > b/examples/l3fwd/l3fwd_em_hlm.h > index 636dea4..4ec600a 100644 > --- a/examples/l3fwd/l3fwd_em_hlm.h > +++ b/examples/l3fwd/l3fwd_em_hlm.h > @@ -35,8 +35,13 @@ > #ifndef __L3FWD_EM_HLM_H__ > #define __L3FWD_EM_HLM_H__ > > +#if defined(__SSE4_1__) > #include "l3fwd_sse.h" > #include "l3fwd_em_hlm_sse.h" > +#elif defined(RTE_MACHINE_CPUFLAG_NEON) > +#include "l3fwd_neon.h" > +#include "l3fwd_em_hlm_neon.h" > +#endif > > static inline __attribute__((always_inline)) void > em_get_dst_port_ipv4x8(struct lcore_conf *qconf, struct rte_mbuf > *m[8], > @@ -238,7 +243,7 @@ static inline __attribute__((always_inline)) > uint16_t > l3fwd_em_send_packets(int nb_rx, struct rte_mbuf **pkts_burst, > uint8_t portid, struct lcore_conf *qconf) > { > - int32_t j; > + int32_t i, j, pos; > uint16_t dst_port[MAX_PKT_BURST]; > > /* > @@ -247,6 +252,12 @@ static inline __attribute__((always_inline)) > uint16_t > */ > int32_t n = RTE_ALIGN_FLOOR(nb_rx, 8); > > + for (j = 0; j < 8 && j < nb_rx; j++) { > + rte_prefetch0(pkts_burst[j]); The above prefetch of rte_mbuf struct is unnecessary. With this we wont see any performance improvement as the contents of rte_mbuf (buf_addr and data_off) is used in right next instruction. Removing the above prefetch and similar prefetches at multiple places was improving performance on my ThunderX setup. > + rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j], > + struct ether_hdr *) + > 1); Better to prefetch at eth_hdr itself and not at eth_hdr + 1. In process_packet in l3fwd_neon.h, eth_header is accessed. > + } > + > for (j = 0; j < n; j += 8) { > > uint32_t pkt_type = > @@ -263,6 +274,12 @@ static inline __attribute__((always_inline)) > uint16_t > uint32_t tcp_or_udp = pkt_type & > (RTE_PTYPE_L4_TCP | RTE_PTYPE_L4_UDP); > > + for (i = 0, pos = j + 8; i < 8 && pos < nb_rx; i++, > pos++) { > + rte_prefetch0(pkts_burst[pos]); The above prefetch of rte_mbuf struct is unnecessary. > + rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[po > s], > + struct > ether_hdr *) + 1); Better to prefetch at eth_hdr itself and not at eth_hdr + 1 > + } > + > if (tcp_or_udp && (l3_type == RTE_PTYPE_L3_IPV4)) { > > em_get_dst_port_ipv4x8(qconf, > &pkts_burst[j], portid, > > [...] > diff --git a/examples/l3fwd/l3fwd_em_sequential.h > b/examples/l3fwd/l3fwd_em_sequential.h > index c0a9725..c3df473 100644 > --- a/examples/l3fwd/l3fwd_em_sequential.h > +++ b/examples/l3fwd/l3fwd_em_sequential.h > @@ -43,7 +43,11 @@ > * compilation time. > */ > > +#if defined(__SSE4_1__) > #include "l3fwd_sse.h" > +#elif defined(RTE_MACHINE_CPUFLAG_NEON) > +#include "l3fwd_neon.h" > +#endif > > static inline __attribute__((always_inline)) uint16_t > em_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf > *pkt, > @@ -101,11 +105,23 @@ static inline __attribute__((always_inline)) > uint16_t > l3fwd_em_send_packets(int nb_rx, struct rte_mbuf **pkts_burst, > uint8_t portid, struct lcore_conf *qconf) > { > - int32_t j; > + int32_t i, j; > uint16_t dst_port[MAX_PKT_BURST]; > > - for (j = 0; j < nb_rx; j++) > + if (nb_rx > 0) { > + rte_prefetch0(pkts_burst[0]); The above prefetch of rte_mbuf struct is unnecessary. > + rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[0], > + struct ether_hdr *) + > 1); Better to prefetch at eth_hdr itself and not at eth_hdr + 1 > + } > + > + for (i = 1, j = 0; j < nb_rx; i++, j++) { > + if (i < nb_rx) { > +
Re: [dpdk-dev] [PATCH v2 5/7] examples/l3fwd: add neon support for l3fwd
On Thu, 2017-05-11 at 11:16 +0800, Jianbo Liu wrote: > Hi Ashwin, > > On 10 May 2017 at 23:00, Sekhar, Ashwin > wrote: > > > > Hi Jianbo, > > > > Thanks for version v2. Addition of the prefetch instructions is > > definitely helping performance on ThunderX. But still performance > > is > > slightly less than that of scalar. > > > > I tried few small tweaks which helped improve performance on my > > Thunderx setup. For details see comments inline. > > > > > > On Wed, 2017-05-10 at 10:30 +0800, Jianbo Liu wrote: > > > > > > Use ARM NEON intrinsics to accelerate l3 fowarding. > > > > > > Signed-off-by: Jianbo Liu > > > --- > > > examples/l3fwd/l3fwd_em.c| 4 +- > > > examples/l3fwd/l3fwd_em_hlm.h| 19 ++- > > > examples/l3fwd/l3fwd_em_hlm_neon.h | 74 ++ > > > examples/l3fwd/l3fwd_em_sequential.h | 20 ++- > > > examples/l3fwd/l3fwd_lpm.c | 4 +- > > > examples/l3fwd/l3fwd_lpm_neon.h | 165 > > > ++ > > > examples/l3fwd/l3fwd_neon.h | 259 > > > +++ > > > 7 files changed, 539 insertions(+), 6 deletions(-) > > > create mode 100644 examples/l3fwd/l3fwd_em_hlm_neon.h > > > create mode 100644 examples/l3fwd/l3fwd_lpm_neon.h > > > create mode 100644 examples/l3fwd/l3fwd_neon.h > > > > > > [...] > > > diff --git a/examples/l3fwd/l3fwd_em_hlm.h > > > b/examples/l3fwd/l3fwd_em_hlm.h > > > index 636dea4..4ec600a 100644 > > > --- a/examples/l3fwd/l3fwd_em_hlm.h > > > +++ b/examples/l3fwd/l3fwd_em_hlm.h > > > @@ -35,8 +35,13 @@ > > > #ifndef __L3FWD_EM_HLM_H__ > > > #define __L3FWD_EM_HLM_H__ > > > > > > +#if defined(__SSE4_1__) > > > #include "l3fwd_sse.h" > > > #include "l3fwd_em_hlm_sse.h" > > > +#elif defined(RTE_MACHINE_CPUFLAG_NEON) > > > +#include "l3fwd_neon.h" > > > +#include "l3fwd_em_hlm_neon.h" > > > +#endif > > > > > > static inline __attribute__((always_inline)) void > > > em_get_dst_port_ipv4x8(struct lcore_conf *qconf, struct rte_mbuf > > > *m[8], > > > @@ -238,7 +243,7 @@ static inline __attribute__((always_inline)) > > > uint16_t > > > l3fwd_em_send_packets(int nb_rx, struct rte_mbuf **pkts_burst, > > > uint8_t portid, struct lcore_conf *qconf) > > > { > > > - int32_t j; > > > + int32_t i, j, pos; > > > uint16_t dst_port[MAX_PKT_BURST]; > > > > > > /* > > > @@ -247,6 +252,12 @@ static inline __attribute__((always_inline)) > > > uint16_t > > > */ > > > int32_t n = RTE_ALIGN_FLOOR(nb_rx, 8); > > > > > > + for (j = 0; j < 8 && j < nb_rx; j++) { > > > + rte_prefetch0(pkts_burst[j]); > > The above prefetch of rte_mbuf struct is unnecessary. With this we > > wont > > see any performance improvement as the contents of rte_mbuf > > (buf_addr > > and data_off) is used in right next instruction. Removing the above > > prefetch and similar prefetches at multiple places was improving > > performance on my ThunderX setup. > Yes, will remove them. > > > > > > > > > > > + rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j], > > > +struct ether_hdr *) > > > + > > > 1); > > Better to prefetch at eth_hdr itself and not at eth_hdr + 1. In > > process_packet in l3fwd_neon.h, eth_header is accessed in > > > But ip headers are used right in each 8/FWDSTEP loop. > Since ip headers are accessed first, we should prefetch eth_hdr + 1 > first. > After all nb_rx packets are handled in above small loop, their > eth_header are then accessed in processx4_step3 over again. > I'm not sure prefretching eth_hdr still works if we prefetch eth_hdr > in first step, as cache may be already filled with new data at that > time. > Okay. Also, I guess if the ethernet header and ip header falls in the same cache line (which I think would be the case mostly as I hope the packet data will be cache aligned), it doesn't make much of a difference whether you prefetch at ethernet header address or ip header address. > > > > > > > > + } > > > + > > > for (j = 0; j < n; j += 8) { > > > > > > uint32_t pkt_t
Re: [dpdk-dev] [PATCH v2 5/7] examples/l3fwd: add neon support for l3fwd
On Thu, 2017-05-11 at 04:14 +, Sekhar, Ashwin wrote: ... > > > Combining all the above comments, I made some changes on top of > > > your > > > patch. These changes are giving 3-4% improvement over your > > > version. > > > > > > You may find the changes at > > > https://gist.github.com/ashwinyes/34cbdd999784402c859c71613587faf > > > c > > > > > Is the correct in Line 103/104, you only process one packets in the > > last FWDSTEP packets? > Its doing processx4_* there. So its processing 4 packets. > > > > > Actually, I don't like your change in l3fwd_lpm_send_packets, > > making > > the simple logic complicated. And I don't think it can help to > > improve > > performance. :-) > Its not making it complicated. The number of lines of code may be > higher by may be 10 lines, but the conditions of the loops are > simplified which reduces the number of branch instructions and helps > the processor to go through them faster. > > If possible, please try it out on your machine. Missed out one point. Since 2 loops are form "for (i = 0; i < FWDSTEP; i++)" i.e. looping for constant number of iterations, compiler will easily unroll them. Thanks Ashwin > > > > > > > > > > > > > Please check it out and let me know your comments. > > > > > > Thanks > > > Ashwin
Re: [dpdk-dev] [PATCH 2/6] config: add clang support for armv8a linuxapp
On Thu, 2017-05-11 at 10:54 +0530, Jerin Jacob wrote: > -Original Message- > > > > Date: Wed, 10 May 2017 03:16:39 -0700 > > From: Ashwin Sekhar T K > > To: tho...@monjalon.net, jerin.ja...@caviumnetworks.com, > > maciej.cze...@caviumnetworks.com, vikto...@rehivetech.com, > > jianbo@linaro.org, bruce.richard...@intel.com, > > pablo.de.lara.gua...@intel.com, konstantin.anan...@intel.com > > Cc: dev@dpdk.org, Ashwin Sekhar T K > om> > > Subject: [dpdk-dev] [PATCH 2/6] config: add clang support for > > armv8a > > linuxapp > > X-Mailer: git-send-email 2.13.0.rc1 > > > > Added new config arm64-armv8a-linuxapp-clang > > > > Signed-off-by: Ashwin Sekhar T K > > --- > > config/defconfig_arm64-armv8a-linuxapp-clang | 56 > > > > 1 file changed, 56 insertions(+) > > create mode 100644 config/defconfig_arm64-armv8a-linuxapp-clang > > > > diff --git a/config/defconfig_arm64-armv8a-linuxapp-clang > > b/config/defconfig_arm64-armv8a-linuxapp-clang > > +#include "common_linuxapp" > > + > > +CONFIG_RTE_MACHINE="armv8a" > > + > > +CONFIG_RTE_ARCH="arm64" > > +CONFIG_RTE_ARCH_ARM64=y > > +CONFIG_RTE_ARCH_64=y > > + > > +CONFIG_RTE_FORCE_INTRINSICS=y > > + > > +CONFIG_RTE_TOOLCHAIN="clang" > > +CONFIG_RTE_TOOLCHAIN_CLANG=y > > + > > +# Maximum available cache line size in arm64 implementations. > > +# Setting to maximum available cache line size in generic config > > +# to address minimum DMA alignment across all arm64 > > implementations. > > +CONFIG_RTE_CACHE_LINE_SIZE=128 > > + > > +CONFIG_RTE_EAL_IGB_UIO=n > > + > > +CONFIG_RTE_LIBRTE_FM10K_PMD=n > > +CONFIG_RTE_LIBRTE_SFC_EFX_PMD=n > > +CONFIG_RTE_LIBRTE_AVP_PMD=n > > + > > +CONFIG_RTE_SCHED_VECTOR=n > IMO, It is better to create common_armv8 config and let gcc and clang > use that to avoid duplicating the symbols. > For x86, this is the convention that is followed. There are separate defconfigs for icc, gcc, clang with symbols duplicated. Do we need to deviate from this convention for armv8a? > >
Re: [dpdk-dev] [PATCH v3 5/7] examples/l3fwd: add neon support for l3fwd
Hi Jianbo, Thanks for v3. Small compilation error. See inline comment. Otherwise it looks fine. On Thu, 2017-05-11 at 17:25 +0800, Jianbo Liu wrote: > Use ARM NEON intrinsics to accelerate l3 fowarding. > > Signed-off-by: Jianbo Liu > --- [...] > +/** > + * Process one packet: > + * Update source and destination MAC addresses in the ethernet > header. > + * Perform RFC1812 checks and updates for IPV4 packets. > + */ > +static inline void > +process_packet(struct rte_mbuf *pkt, uint16_t *dst_port) > +{ > + struct ether_hdr *eth_hdr; > + uint32x4_t te, ve; > + > + eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *); > + > + te = vld1q_u32((uint32_t *)eth_hdr); > + ve = vreinterpretq_u32_s32(val_eth[dst_port[0]]); > + > + > + rfc1812_process((struct ipv4_hdr *)(eth_hdr + 1), dst_port, > + pkt->packet_type); > + > + ve = vcopyq_lane_u32(ve, 3, te, 3); Compilation error here. This should be vcopyq_laneq_u32 (Extra q after lane) > + vst1q_u32((uint32_t *)eth_hdr, ve); > +} > + [...]
Re: [dpdk-dev] [PATCH v3 5/7] examples/l3fwd: add neon support for l3fwd
On Thu, 2017-05-11 at 18:01 +0800, Jianbo Liu wrote: > On 11 May 2017 at 17:49, Sekhar, Ashwin > wrote: > > > > Hi Jianbo, > > > > Thanks for v3. Small compilation error. See inline comment. > > Otherwise > > it looks fine. > > > > On Thu, 2017-05-11 at 17:25 +0800, Jianbo Liu wrote: > > > > > > Use ARM NEON intrinsics to accelerate l3 fowarding. > > > > > > Signed-off-by: Jianbo Liu > > > --- > > [...] > > > > > > > > +/** > > > + * Process one packet: > > > + * Update source and destination MAC addresses in the ethernet > > > header. > > > + * Perform RFC1812 checks and updates for IPV4 packets. > > > + */ > > > +static inline void > > > +process_packet(struct rte_mbuf *pkt, uint16_t *dst_port) > > > +{ > > > + struct ether_hdr *eth_hdr; > > > + uint32x4_t te, ve; > > > + > > > + eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *); > > > + > > > + te = vld1q_u32((uint32_t *)eth_hdr); > > > + ve = vreinterpretq_u32_s32(val_eth[dst_port[0]]); > > > + > > > + > > > + rfc1812_process((struct ipv4_hdr *)(eth_hdr + 1), dst_port, > > > + pkt->packet_type); > > > + > > > + ve = vcopyq_lane_u32(ve, 3, te, 3); > > Compilation error here. This should be vcopyq_laneq_u32 (Extra q > > after > > lane) > No vcopyq_laneq_u32 in arm_neon.h of my environment. I thought it's a > typo so I changed. > > my gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC). > What about yours? > I am using GCC 7.1. No error with this version. Also to cross check I tried the following versions as well which all gave compilation errors. * gcc (Ubuntu/Linaro 4.9.2-10ubuntu13) 4.9.2 * gcc 5.3 * GCC 6.3 So looks like vcopyq_laneq_u32 is not supported in GCC versions < 7. We can add a wrapper for the same in ./lib/librte_eal/common/include/arch/arm/rte_vect.h for gcc versions < 7. But I think we can defer this activity. Because I have some other patches, which moves around the definition of GCC_VERSION, and adds wrappers for some unsupported instrinsics. Please see below. http://dpdk.org/dev/patchwork/patch/24161/ http://dpdk.org/dev/patchwork/patch/24162/ I think we can add the vcopyq_laneq_u32 change and the wrapper for the same after the above patches are merged. And FYI - Documentation for the vcopyq_laneq_u32 can be found in below document. http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0 073A_arm_neon_intrinsics_ref.pdf > > > > > > > > + vst1q_u32((uint32_t *)eth_hdr, ve); > > > +} > > > + > > [...]
Re: [dpdk-dev] [PATCH 0/6] add clang compilation support for armv8a linuxapp
The warning comes only when CFLAGS "-g -ggdb" are given and this seems to be an issue with clang. I am seeing some related bugs on llvm mailing list. https://www.mail-archive.com/llvm-bugs@lists.llvm.org/msg05498.html http://lists.llvm.org/pipermail/llvm-bugs/2016-July/048288.html Even a simple c program with a TLS variable creates this warning. For eg: - test.c --- __thread int a; int main() { return 0; } $ gcc -g -ggdb test.c $ clang -g -ggdb test.c /usr/bin/ld: /tmp/test-50ceac.o(.debug_info+0x37): R_AARCH64_ABS64 used with TLS symbol a $ Thanks Ashwin On Thu, 2017-05-11 at 11:29 +0530, Jerin Jacob wrote: > -Original Message- > > > > Date: Wed, 10 May 2017 03:16:37 -0700 > > From: Ashwin Sekhar T K > > To: tho...@monjalon.net, jerin.ja...@caviumnetworks.com, > > maciej.cze...@caviumnetworks.com, vikto...@rehivetech.com, > > jianbo@linaro.org, bruce.richard...@intel.com, > > pablo.de.lara.gua...@intel.com, konstantin.anan...@intel.com > > Cc: dev@dpdk.org, Ashwin Sekhar T K > om> > > Subject: [dpdk-dev] [PATCH 0/6] add clang compilation support for > > armv8a > > linuxapp > > X-Mailer: git-send-email 2.13.0.rc1 > > > > This series of patches adds the clang compilation support for > > armv8a linuxapp. > > > > Patch 1 is basically for removing the usage of assembly directive > > ".arch armv8-a+crc" > > as this is not understood by clang. For removing these directives, > > compilation of > > armv8a crc32 support is made conditional and is only done for > > machines which has > > the crc extensions. Doing this avoids the need for having the > > ".arch armv8-a+crc" > > directives in the code. > > > > Patch 2 adds the arm64-armv8a-linuxapp-clang defconfig. > > > > Patch 3, 4, 5 and 6 are for fixing the compilation errors/warnings. > There is warning on LD with clang. Could you please check it? > > INSTALL-MAP dpdk-pdump.map > LD testpmd > /usr/bin/ld: build/lib/librte_eal.a(eal_thread.o)(.debug_info+0x37): > R_AARCH64_ABS64 used with TLS symbol per_lcore__lcore_id > /usr/bin/ld: build/lib/librte_eal.a(eal_thread.o)(.debug_info+0x54): > R_AARCH64_ABS64 used with TLS symbol per_lcore__socket_id > /usr/bin/ld: build/lib/librte_eal.a(eal_thread.o)(.debug_info+0x6a): > R_AARCH64_ABS64 used with TLS symbol per_lcore__cpuset > /usr/bin/ld: build/lib/librte_eal.a(eal_thread.o)(.debug_info+0xd2): > R_AARCH64_ABS64 used with TLS symbol rte_gettid.per_lcore__thread_id > /usr/bin/ld: > build/lib/librte_eal.a(eal_interrupts.o)(.debug_info+0x38e): > R_AARCH64_ABS64 used with TLS symbol per_lcore__epfd > /usr/bin/ld: > build/lib/librte_eal.a(eal_common_errno.o)(.debug_info+0x50): > R_AARCH64_ABS64 used with TLS symbol rte_strerror.per_lcore_retval > /usr/bin/ld:build/lib/librte_eal.a(eal_common_errno.o)(.debug_info+0x > 91): R_AARCH64_ABS64 used with TLS symbol per_lcore__rte_errno > INSTALL-APP testpmd > > $ clang -v > Ubuntu clang version 3.6.0-2ubuntu1 (tags/RELEASE_360/final) (based > on > LLVM 3.6.0) > Target: aarch64-unknown-linux-gnu > Thread model: posix > Found candidate GCC installation: > /usr/bin/../lib/gcc/aarch64-linux-gnu/4.9 > Found candidate GCC installation: > /usr/bin/../lib/gcc/aarch64-linux-gnu/4.9.2 > Found candidate GCC installation: > /usr/bin/../lib/gcc/aarch64-linux-gnu/5.0.1 > Found candidate GCC installation: /usr/lib/gcc/aarch64-linux-gnu/4.9 > Found candidate GCC installation: /usr/lib/gcc/aarch64-linux- > gnu/4.9.2 > Found candidate GCC installation: /usr/lib/gcc/aarch64-linux- > gnu/5.0.1 > Selected GCC installation: /usr/bin/../lib/gcc/aarch64-linux-gnu/4.9 > Candidate multilib: .;@m64 > Selected multilib: .;@m64 > > > > > > > > Ashwin Sekhar T K (6): > > hash: compile armv8a CRC32 support conditionally > > config: add clang support for armv8a linuxapp > > net/thunderx: fix compile errors for armv8a clang > > acl: fix warning seen with armv8a clang > > eal/arm: fix warnings seen with armv8a clang > > eal: fix warning seen with armv8a clang > > > > config/defconfig_arm64-armv8a-linuxapp-clang | 56 > > ++ > > drivers/net/thunderx/base/nicvf_plat.h | 2 +- > > lib/librte_acl/Makefile| 5 +- > > .../common/include/arch/arm/rte_byteorder.h| 2 +- > > lib/librte_eal/linuxapp/eal/Makefile | 4 ++ > > lib/librte_hash/Makefile | 2 + > > lib/librte_hash/rte_crc_arm64.h| 4 -- > > lib/librte_hash/rte_hash_crc.h | 2 +- > > 8 files changed, 69 insertions(+), 8 deletions(-) > > create mode 100644 config/defconfig_arm64-armv8a-linuxapp-clang > >
Re: [dpdk-dev] [PATCH v4 3/4] net: add arm64 neon version of CRC compute APIs
On Fri, 2017-05-12 at 13:51 +0800, Jianbo Liu wrote: > On 9 May 2017 at 17:53, Ashwin Sekhar T K > wrote: > > > > Added CRC compute APIs for arm64 utilizing the pmull > > capability > > > > Added new file net_crc_neon.h to hold the arm64 pmull > > CRC implementation > > > > Verified the changes with crc_autotest unit test case > > > > Signed-off-by: Ashwin Sekhar T K > > --- > > v2: > > * Fixed merge conflict in MAINTAINERS > > > > v3: > > * Moved feature detection changes and GCC_VERSION definition > > changes to separate commit > > * Replaced usage of assert() with RTE_ASSERT() > > * Made the comments in rte_vect.h more positive in sense > > > > v4: > > * Rebased on top of latest commit > > > > MAINTAINERS | 1 + > > lib/librte_eal/common/include/arch/arm/rte_vect.h | 28 ++ > > lib/librte_net/net_crc_neon.h | 357 > > ++ > > lib/librte_net/rte_net_crc.c | 34 ++- > > lib/librte_net/rte_net_crc.h | 2 + > > 5 files changed, 416 insertions(+), 6 deletions(-) > > create mode 100644 lib/librte_net/net_crc_neon.h > > > > ... > > + > > +struct crc_pmull_ctx crc32_eth_pmull __rte_aligned(16); > > +struct crc_pmull_ctx crc16_ccitt_pmull __rte_aligned(16); > > + > > +static inline uint8x16_t > > +extract_vector(uint8x16_t v0, uint8x16_t v1, const int n) > > +{ > > + switch (n) { > > + case 0: return vextq_u8(v0, v1, 0); > > + case 1: return vextq_u8(v0, v1, 1); > > + case 2: return vextq_u8(v0, v1, 2); > > + case 3: return vextq_u8(v0, v1, 3); > > + case 4: return vextq_u8(v0, v1, 4); > > + case 5: return vextq_u8(v0, v1, 5); > > + case 6: return vextq_u8(v0, v1, 6); > > + case 7: return vextq_u8(v0, v1, 7); > > + case 8: return vextq_u8(v0, v1, 8); > > + case 9: return vextq_u8(v0, v1, 9); > > + case 10: return vextq_u8(v0, v1, 10); > > + case 11: return vextq_u8(v0, v1, 11); > > + case 12: return vextq_u8(v0, v1, 12); > > + case 13: return vextq_u8(v0, v1, 13); > > + case 14: return vextq_u8(v0, v1, 14); > > + case 15: return vextq_u8(v0, v1, 15); > > + } > > + return v1; > > +} > > + > > +/** > > + * Shifts right 128 bit register by specified number of bytes > > + * > > + * @param reg 128 bit value > > + * @param num number of bytes to shift reg by (0-16) > > + * > > + * @return reg << (num * 8) > > + */ > > +static inline uint64x2_t > > +shift_bytes_right(uint64x2_t reg, const unsigned int num) > > +{ > > + /* Right Shift */ > > + return vreinterpretq_u64_u8(extract_vector( > > + vreinterpretq_u8_u64(reg), > > + vdupq_n_u8(0), > > + num)); > > +} > > + > > +/** > > + * Shifts left 128 bit register by specified number of bytes > > + * > > + * @param reg 128 bit value > > + * @param num number of bytes to shift reg by (0-16) > > + * > > + * @return reg << (num * 8) > > + */ > > +static inline uint64x2_t > > +shift_bytes_left(uint64x2_t reg, const unsigned int num) > > +{ > > + /* Left Shift */ > > + return vreinterpretq_u64_u8(extract_vector( > > + vdupq_n_u8(0), > > + vreinterpretq_u8_u64(reg), > > + 16 - num)); > > +} > > + > Can you move shift_bytes_right/shift_bytes_left to rte_vect.h because > they are common functions? These are not really common functions. I dont think it will have a wider usage as its shifting by bytes and not by bits. In x86 case also, xmm_shift_left is not made a common function. Moreover, I have not tested the behaviour of these functions when the shift amt is (< 0) or (> 16) as these cases will never arise in the CRC code. Thanks Ashwin
Re: [dpdk-dev] [PATCH v4 3/4] net: add arm64 neon version of CRC compute APIs
On Fri, 2017-05-12 at 16:49 +0800, Jianbo Liu wrote: > On 12 May 2017 at 15:25, Sekhar, Ashwin > wrote: > > > > On Fri, 2017-05-12 at 13:51 +0800, Jianbo Liu wrote: > > > > > > On 9 May 2017 at 17:53, Ashwin Sekhar T K > > > wrote: > > > > > > > > > > > > Added CRC compute APIs for arm64 utilizing the pmull > > > > capability > > > > > > > > Added new file net_crc_neon.h to hold the arm64 pmull > > > > CRC implementation > > > > > > > > Verified the changes with crc_autotest unit test case > > > > > > > > Signed-off-by: Ashwin Sekhar T K > > > com> > > > > --- > > > > v2: > > > > * Fixed merge conflict in MAINTAINERS > > > > > > > > v3: > > > > * Moved feature detection changes and GCC_VERSION definition > > > > changes to separate commit > > > > * Replaced usage of assert() with RTE_ASSERT() > > > > * Made the comments in rte_vect.h more positive in sense > > > > > > > > v4: > > > > * Rebased on top of latest commit > > > > > > > > MAINTAINERS | 1 + > > > > lib/librte_eal/common/include/arch/arm/rte_vect.h | 28 ++ > > > > lib/librte_net/net_crc_neon.h | 357 > > > > ++ > > > > lib/librte_net/rte_net_crc.c | 34 ++- > > > > lib/librte_net/rte_net_crc.h | 2 + > > > > 5 files changed, 416 insertions(+), 6 deletions(-) > > > > create mode 100644 lib/librte_net/net_crc_neon.h > > > > > > > > > > ... > > > > > > > > > > > + > > > > +struct crc_pmull_ctx crc32_eth_pmull __rte_aligned(16); > > > > +struct crc_pmull_ctx crc16_ccitt_pmull __rte_aligned(16); > > > > + > > > > +static inline uint8x16_t > > > > +extract_vector(uint8x16_t v0, uint8x16_t v1, const int n) > > > > +{ > > > > + switch (n) { > > > > + case 0: return vextq_u8(v0, v1, 0); > > > > + case 1: return vextq_u8(v0, v1, 1); > > > > + case 2: return vextq_u8(v0, v1, 2); > > > > + case 3: return vextq_u8(v0, v1, 3); > > > > + case 4: return vextq_u8(v0, v1, 4); > > > > + case 5: return vextq_u8(v0, v1, 5); > > > > + case 6: return vextq_u8(v0, v1, 6); > > > > + case 7: return vextq_u8(v0, v1, 7); > > > > + case 8: return vextq_u8(v0, v1, 8); > > > > + case 9: return vextq_u8(v0, v1, 9); > > > > + case 10: return vextq_u8(v0, v1, 10); > > > > + case 11: return vextq_u8(v0, v1, 11); > > > > + case 12: return vextq_u8(v0, v1, 12); > > > > + case 13: return vextq_u8(v0, v1, 13); > > > > + case 14: return vextq_u8(v0, v1, 14); > > > > + case 15: return vextq_u8(v0, v1, 15); > > > > + } > > > > + return v1; > > > > +} > > > > + > > > > +/** > > > > + * Shifts right 128 bit register by specified number of bytes > > > > + * > > > > + * @param reg 128 bit value > > > > + * @param num number of bytes to shift reg by (0-16) > > > > + * > > > > + * @return reg << (num * 8) > > > > + */ > > > > +static inline uint64x2_t > > > > +shift_bytes_right(uint64x2_t reg, const unsigned int num) > > > > +{ > > > > + /* Right Shift */ > > > > + return vreinterpretq_u64_u8(extract_vector( > > > > + vreinterpretq_u8_u64(reg), > > > > + vdupq_n_u8(0), > > > > + num)); > > > > +} > > > > + > > > > +/** > > > > + * Shifts left 128 bit register by specified number of bytes > > > > + * > > > > + * @param reg 128 bit value > > > > + * @param num number of bytes to shift reg by (0-16) > > > > + * > > > > + * @return reg << (num * 8) > > > > + */ > > > > +static inline uint64x2_t > > > > +shift_bytes_left(uint64x2_t reg, const unsigned int num) > > > > +{ > > > > + /* Left Shift */ > > > > + return vreinterpretq_u64_u8(extract_vector( > > > > + vdupq_n_u8(0), > > > > + vreinterpretq_u8_u64(reg), > > > > + 16 - num)); > > > > +} > > > > + > > > Can you move shift_bytes_right/shift_bytes_left to rte_vect.h > > > because > > > they are common functions? > > These are not really common functions. I dont think it will have a > > wider usage as its shifting by bytes and not by bits. > > > I think these shifting may be used by other functions. > For example, to replace _mm_srli_si128. > > > > > In x86 case also, xmm_shift_left is not made a common function. > > > But its counterpart right shifting (_mm_srli_si128) is... > > > > > Moreover, I have not tested the behaviour of these functions when > > the > > shift amt is (< 0) or (> 16) as these cases will never arise in the > > CRC > > code. > > > You can define thee functions according to current requirement. > And I don't think this parameter can be <0 or > 16. Okay. In that case, I will move it to rte_vect.h. Ashwin
Re: [dpdk-dev] [PATCH v4 6/8] examples/l3fwd: add neon support for l3fwd
On Mon, 2017-05-15 at 11:34 +0800, Jianbo Liu wrote: > Use ARM NEON intrinsics to accelerate l3 fowarding. > > Signed-off-by: Jianbo Liu Acked-by: Ashwin Sekhar T K > --- > examples/l3fwd/l3fwd_em.c| 4 +- > examples/l3fwd/l3fwd_em_hlm.h| 17 ++- > examples/l3fwd/l3fwd_em_hlm_neon.h | 74 ++ > examples/l3fwd/l3fwd_em_sequential.h | 18 ++- > examples/l3fwd/l3fwd_lpm.c | 4 +- > examples/l3fwd/l3fwd_lpm_neon.h | 193 > ++ > examples/l3fwd/l3fwd_neon.h | 259 > +++ > 7 files changed, 563 insertions(+), 6 deletions(-) > create mode 100644 examples/l3fwd/l3fwd_em_hlm_neon.h > create mode 100644 examples/l3fwd/l3fwd_lpm_neon.h > create mode 100644 examples/l3fwd/l3fwd_neon.h > > diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c > index ba844b2..da96cfd 100644 > --- a/examples/l3fwd/l3fwd_em.c > +++ b/examples/l3fwd/l3fwd_em.c > @@ -328,7 +328,7 @@ struct ipv6_l3fwd_em_route { > return (uint8_t)((ret < 0) ? portid : > ipv6_l3fwd_out_if[ret]); > } > > -#if defined(__SSE4_1__) > +#if defined(__SSE4_1__) || defined(RTE_MACHINE_CPUFLAG_NEON) > #if defined(NO_HASH_MULTI_LOOKUP) > #include "l3fwd_em_sequential.h" > #else > @@ -709,7 +709,7 @@ struct ipv6_l3fwd_em_route { > if (nb_rx == 0) > continue; > > -#if defined(__SSE4_1__) > +#if defined(__SSE4_1__) || defined(RTE_MACHINE_CPUFLAG_NEON) > l3fwd_em_send_packets(nb_rx, pkts_burst, > portid, > qconf); > #else > diff --git a/examples/l3fwd/l3fwd_em_hlm.h > b/examples/l3fwd/l3fwd_em_hlm.h > index 636dea4..b9163e3 100644 > --- a/examples/l3fwd/l3fwd_em_hlm.h > +++ b/examples/l3fwd/l3fwd_em_hlm.h > @@ -35,8 +35,13 @@ > #ifndef __L3FWD_EM_HLM_H__ > #define __L3FWD_EM_HLM_H__ > > +#if defined(__SSE4_1__) > #include "l3fwd_sse.h" > #include "l3fwd_em_hlm_sse.h" > +#elif defined(RTE_MACHINE_CPUFLAG_NEON) > +#include "l3fwd_neon.h" > +#include "l3fwd_em_hlm_neon.h" > +#endif > > static inline __attribute__((always_inline)) void > em_get_dst_port_ipv4x8(struct lcore_conf *qconf, struct rte_mbuf > *m[8], > @@ -238,7 +243,7 @@ static inline __attribute__((always_inline)) > uint16_t > l3fwd_em_send_packets(int nb_rx, struct rte_mbuf **pkts_burst, > uint8_t portid, struct lcore_conf *qconf) > { > - int32_t j; > + int32_t i, j, pos; > uint16_t dst_port[MAX_PKT_BURST]; > > /* > @@ -247,6 +252,11 @@ static inline __attribute__((always_inline)) > uint16_t > */ > int32_t n = RTE_ALIGN_FLOOR(nb_rx, 8); > > + for (j = 0; j < 8 && j < nb_rx; j++) { > + rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j], > + struct ether_hdr *) + > 1); > + } > + > for (j = 0; j < n; j += 8) { > > uint32_t pkt_type = > @@ -263,6 +273,11 @@ static inline __attribute__((always_inline)) > uint16_t > uint32_t tcp_or_udp = pkt_type & > (RTE_PTYPE_L4_TCP | RTE_PTYPE_L4_UDP); > > + for (i = 0, pos = j + 8; i < 8 && pos < nb_rx; i++, > pos++) { > + rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[po > s], > + struct > ether_hdr *) + 1); > + } > + > if (tcp_or_udp && (l3_type == RTE_PTYPE_L3_IPV4)) { > > em_get_dst_port_ipv4x8(qconf, > &pkts_burst[j], portid, > diff --git a/examples/l3fwd/l3fwd_em_hlm_neon.h > b/examples/l3fwd/l3fwd_em_hlm_neon.h > new file mode 100644 > index 000..dae1acf > --- /dev/null > +++ b/examples/l3fwd/l3fwd_em_hlm_neon.h > @@ -0,0 +1,74 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2016 Intel Corporation. All rights reserved. > + * Copyright(c) 2017, Linaro Limited > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or > without > + * modification, are permitted provided that the following > conditions > + * are met: > + * > + * * Redistributions of source code must retain the above > copyright > + * notice, this list of conditions and the following > disclaimer. > + * * Redistributions in binary form must reproduce the above > copyright > + * notice, this list of conditions and the following > disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products > derived > + * from this software without specific prior written > permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND > CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT > NOT > + * LIMITED TO, THE IMPLIED WARRANTIES
Re: [dpdk-dev] [PATCH] examples/performance-thread: add arm64 support
On Thu, 2017-05-18 at 14:35 +0800, Jianbo Liu wrote: > On 18 May 2017 at 02:44, Jerin Jacob > wrote: > > > > -Original Message- > > > > > > Date: Wed, 17 May 2017 11:19:49 -0700 > > > From: Ashwin Sekhar T K > > > To: jerin.ja...@caviumnetworks.com, john.mcnam...@intel.com, > > > jianbo@linaro.org > > > Cc: dev@dpdk.org, Ashwin Sekhar T K > > .com> > > > Subject: [dpdk-dev] [PATCH] examples/performance-thread: add > > > arm64 support > > > X-Mailer: git-send-email 2.12.2 > > > > > > Updated Makefile to allow compilation for arm64 architecture. > > > > > > Moved the code for setting the initial stack to architecture > > > specific > > > directory. > > Please split the patch to two > > - "arch_set_stack" abstraction and associated x86 change > > - arm64 support > There are so many redundant code in l3fwd and l3fwd-thread, I think > it's possible to merge them. > Yes. But I think its better to do them as a completely separate patch set. > > > > > > Thanks Ashwin. > > > > I think, This may be the last feature to make arm64 at par with x86 > > features > > supported in DPDK. > > > > /Jerin
Re: [dpdk-dev] [PATCH v2 2/2] examples/performance-thread: add arm64 support
On Thu, 2017-05-18 at 14:25 +0530, Jerin Jacob wrote: > -Original Message- > > > > Date: Thu, 18 May 2017 00:34:26 -0700 > > From: Ashwin Sekhar T K > > To: jerin.ja...@caviumnetworks.com, john.mcnam...@intel.com, > > jianbo@linaro.org > > Cc: dev@dpdk.org, Ashwin Sekhar T K > om> > > Subject: [dpdk-dev] [PATCH v2 2/2] examples/performance-thread: add > > arm64 > > support > > X-Mailer: git-send-email 2.12.2 > > > > Updated Makefile to allow compilation for arm64 architecture. > > > > Added necessary arm64 support for lthread. > > > > Fixed minor compilation errors for arm64 compilation. > > > > Tested the apps l3fwd-thread and lthread_pthread_shim on thunderx > > and x86_64. > > > > +void > > +ctx_switch(struct ctx *new_ctx __rte_unused, struct ctx *curr_ctx > > __rte_unused) > > +{ > > + /* SAVE CURRENT CONTEXT */ > > + asm volatile ( > > + /* Save SP */ > > + "mov x3, sp\n" > > + "str x3, [x1, #0]\n" > > + > > + /* Save FP and LR */ > > + "stp x29, x30, [x1, #8]\n" > > + > > + /* Save Callee Saved Regs x19 - x28 */ > > + "stp x19, x20, [x1, #24]\n" > > + "stp x21, x22, [x1, #40]\n" > > + "stp x23, x24, [x1, #56]\n" > > + "stp x25, x26, [x1, #72]\n" > > + "stp x27, x28, [x1, #88]\n" > > + ); > IMO, We need to save SIMD registers in the context as well. > x86 code also not doing that, looks like it is an obvious bug in x86 > code as > well. > Yes. You are correct. Need to save the bottom 64-bits of called saved ASIMD regs v8-v15. Will update the patch.