> -----Original Message-----
> From: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> Sent: Tuesday, June 11, 2019 6:58 AM
> To: Jerin Jacob Kollanukkaran <jer...@marvell.com>; dev@dpdk.org
> Cc: tho...@monjalon.net; Gavin Hu (Arm Technology China)
> <gavin...@arm.com>; msant...@redhat.com; acon...@redhat.com;
> sta...@dpdk.org; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>;
> nd <n...@arm.com>; nd <n...@arm.com>
> Subject: [EXT] RE: [dpdk-dev] [PATCH] acl: fix build issue with some arm64
> compiler
> 
> > >
> > > >
> > > > Since it is used in the fast path, a temp variable would add
> > > > cost for no reason.
> > > Then, I would suggest we can go with using 'vdupq_n_s32'.
> >
> > We have to form uint64x2_t with 4 x uint32_t variable, How does
> > 'vdupq_n_s32' help here?
> We would use 'vdupq_n_s32' only for the first initialization; the rest of the
> code remains the same (see the diff below).
> 
> > Can you share code snippet without any temp variable?
> diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
> index 01b9766d8..b3196cd12 100644
> --- a/lib/librte_acl/acl_run_neon.h
> +++ b/lib/librte_acl/acl_run_neon.h
> @@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
> 
>         while (flows.started > 0) {
>                 /* Gather 4 bytes of input data for each stream. */
> -               input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
> -               input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
> +               input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
> +               input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 4));
> 
>                 input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
>                 input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
> @@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
> 
>         while (flows.started > 0) {
>                 /* Gather 4 bytes of input data for each stream. */
> -               input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
> +               input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
>                 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
>                 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
>                 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
> 
> My understanding is that the generated code for both your patch and my
> changes above is the same. The changes suggested above will conform to the
> ACLE recommendation.

The instructions are different, though. The effective cycle count is the same,
even though the first dup updates all four lanes.
To make forward progress, I have sent the v2 based on the updated logic,
just to keep the ACLE spec happy; I don't see any real reason to do it, though 😊

http://patches.dpdk.org/patch/54656/
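
For anyone following the thread, here is a minimal sketch of the two gather
patterns being compared. It is not the code from acl_run_neon.h: the helper
names (gather_dup_first, gather_set_all, gather_variants_agree, self_check)
are hypothetical, a plain array stands in for GET_NEXT_4BYTES, and the scalar
fallback is only an assumption to let it build on non-Arm hosts. It checks
that both patterns leave the same four lanes in the vector; it says nothing
about the instructions or cycles the compiler produces.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#else
/* Assumption: scalar stand-ins that mimic lane semantics off Arm. */
typedef struct { int32_t v[4]; } int32x4_t;

static inline int32x4_t vdupq_n_s32(int32_t x)
{
	int32x4_t r = {{x, x, x, x}};
	return r;
}

static inline int32x4_t set_lane(int32_t x, int32x4_t vec, int lane)
{
	vec.v[lane] = x;
	return vec;
}
#define vsetq_lane_s32(x, vec, lane) set_lane((x), (vec), (lane))

static inline void vst1q_s32(int32_t *p, int32x4_t vec)
{
	memcpy(p, vec.v, sizeof(vec.v));
}
#endif

/* Hypothetical helper: dup the first word, then overwrite lanes 1..3. */
static int32x4_t gather_dup_first(const int32_t w[4])
{
	/* vdupq_n_s32 writes all four lanes, so the vector is fully defined. */
	int32x4_t input = vdupq_n_s32(w[0]);

	input = vsetq_lane_s32(w[1], input, 1);
	input = vsetq_lane_s32(w[2], input, 2);
	input = vsetq_lane_s32(w[3], input, 3);
	return input;
}

/* Hypothetical helper: set every lane individually on a defined vector. */
static int32x4_t gather_set_all(const int32_t w[4])
{
	int32x4_t input = vdupq_n_s32(0);

	input = vsetq_lane_s32(w[0], input, 0);
	input = vsetq_lane_s32(w[1], input, 1);
	input = vsetq_lane_s32(w[2], input, 2);
	input = vsetq_lane_s32(w[3], input, 3);
	return input;
}

/* Returns 1 when both patterns yield the input words, lane for lane. */
static int gather_variants_agree(const int32_t w[4])
{
	int32_t a[4], b[4];

	vst1q_s32(a, gather_dup_first(w));
	vst1q_s32(b, gather_set_all(w));
	return memcmp(a, b, sizeof(a)) == 0 && memcmp(a, w, sizeof(a)) == 0;
}

/* Small self-check with arbitrary sample words. */
static void self_check(void)
{
	const int32_t w[4] = {0x11223344, -2, 7, 0};

	assert(gather_variants_agree(w));
}
```

Calling self_check() from a test harness is enough to exercise both helpers.
Lanes 1..3 written by the initial dup are overwritten right away, so the
final vector contents do not depend on which pattern is used.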

