Re: [dpdk-dev] DPDK compilation on arm is failing in Travis

Aaron Conole Thu, 06 Jun 2019 07:52:15 -0700

Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> writes:

> From: Michael Santana Francisco <msant...@redhat.com> 
> Sent: Wednesday, June 5, 2019 5:39 PM
> To: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Aaron Conole
> <acon...@redhat.com>; tho...@monjalon.net
> Cc: Ruifeng Wang (Arm Technology China) <ruifeng.w...@arm.com>; Gavin Hu (Arm
> Technology China) <gavin...@arm.com>; Dharmik Thakkar <dharmik.thak...@arm.com>;
> jerin.ja...@caviumnetworks.com; ys...@mellanox.com; dev@dpdk.org;
> bruce.richard...@intel.com; nd <n...@arm.com>
> Subject: Re: DPDK compilation on arm is failing in Travis
>
> On 6/5/19 5:36 PM, Honnappa Nagarahalli wrote:
>
> Thomas Monjalon <tho...@monjalon.net> writes:
>
> 05/06/2019 21:40, Aaron Conole:
>
> Thomas Monjalon <tho...@monjalon.net> writes:
>
> The compilation of the master branch is failing for aarch64:
> https://travis-ci.com/DPDK/dpdk
> The log is so verbose that I am not able to understand what is
> really wrong.
> Please help to diagnose and fix, thanks.
>
> A discussion about this:
> http://mails.dpdk.org/archives/dev/2019-June/134012.html
>
> I see the error now.
> It is printing the full log after the error, so I missed the error
> at the top.
>
> I've read your comment about a possible error with the patch
> removing weak functions, but neither Bruce nor I was able to
> reproduce it.
> What is the condition to see this compiler warning?
>
> It is only on ARM, and only when the NEON intrinsics are in use.
>
> I am not able to reproduce it from the tip of master.
>
> I am using:
> gcc (Ubuntu 8.3.0-6ubuntu1~18.04) 8.3.0
>
> From the log on Travis, it looks like the compiler is:
> gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
>
> Is this the issue?
>
> Why are we seeing the error now?
> I tested with gcc-5 (Ubuntu/Linaro 5.5.0-12ubuntu1) 5.5.0 20171010, and it
> works fine. I cannot get hold of 5.4.0, and I am not sure it needs to be
> supported.
> Are there any issues in upgrading to 7 or 8?
>
> I have tested it on my Ubuntu 16.04 VM on commit
> 8cb511bb94ad92a76990f175cac76bb13d51daba
> (the head of master seems to be failing for other reasons on my VM).
> I tested the following gcc versions:
>
> gcc 5.5.0 "cc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010"
> gcc 7.4.0 "cc (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1) 7.4.0"
> gcc 8.1.0 "cc (Ubuntu 8.1.0-5ubuntu1~16.04) 8.1.0"
>
> All tested versions failed with the exact same error shown in Travis. I
> don't know if the compiler is at fault here. Maybe Aaron's patch is a
> viable option?
>
> The issue is that the vector lane-setting code looks like:
>
>    lval = lane_set(scalar, rval, lane id)
>
> In this case, 'rval' is being used before it is ever set, but it
> really could be just 0 for the first lane-setting call. Thereafter,
> we use the old value of input as the rval, but each time a different
> lane is set.
>
> It would be nice if there were an intrinsic that formatted correctly
> from the start (something we could call like lval =
> lane_set_from_array(scalar_array)).
>
> [Honnappa] This exists already. ‘vdupq_n_s32’ can be used. Can you try
> the following?


Well, it isn't exactly that.  You are setting all lanes from a scalar.
I'd rather be able to say:

   input0 = vdupq_nn_s32(&parms[0]);
   input1 = vdupq_nn_s32(&parms[4]);

Something like that, which lets us delete all the rest of the lane-set
code.  But it seems it doesn't exist.
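As far as I know, `vdupq_nn_s32(&parms[0])` does not exist as written. One possible workaround, sketched here under stated assumptions, is to gather the four words into a small temporary array first and then build the vector with a single load; on AArch64 that last step would presumably be the NEON load `vld1q_s32`, which reads four contiguous `int32_t` values. Whether this is faster than the lane-by-lane code has not been measured. So that the sketch builds on any host, it uses a hypothetical portable type `vec4_t` and hypothetical helpers `load4()`, `next_word()` (standing in for `GET_NEXT_4BYTES(parms, i)`) and `gather4()`; none of these are DPDK or NEON APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable stand-in for int32x4_t: four 32-bit lanes. */
typedef struct {
    int32_t lane[4];
} vec4_t;

/* Hypothetical stand-in for vld1q_s32(): load four contiguous int32_t. */
static vec4_t load4(const int32_t *p)
{
    vec4_t v;
    for (int i = 0; i < 4; i++)
        v.lane[i] = p[i];
    return v;
}

/* Hypothetical stand-in for GET_NEXT_4BYTES(parms, i): the next 4-byte
 * word of flow i. Here the words are simply read from an array. */
static int32_t next_word(const int32_t *flow_words, int flow)
{
    return flow_words[flow];
}

/* Gather one word from each of four flows (first .. first + 3) into a
 * temporary array, then build the vector with one load. Every lane is
 * written before the vector is read, so no unset value is used. */
static vec4_t gather4(const int32_t *flow_words, int first)
{
    int32_t tmp[4];

    assert(first >= 0);
    for (int i = 0; i < 4; i++)
        tmp[i] = next_word(flow_words, first + i);
    return load4(tmp);
}
```

For example, with `int32_t words[8] = {10, 11, 12, 13, 20, 21, 22, 23};`, `gather4(words, 0)` yields lanes 10, 11, 12, 13 and `gather4(words, 4)` yields 20, 21, 22, 23, mirroring `input0` and `input1`.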

Regardless, I think either patch should work (using the 'all lanes'
setting you have, or the static variable). I have no preference on
it; it's up to you (or someone else) to say which is preferred. I
guess your version could be preferable since there's no static that
needs to be "explained" :)
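To see why both patches stop the first call from reading an unset vector, here is a small portable model of the two intrinsics involved: `vsetq_lane_s32(value, vec, lane)` returns a copy of `vec` with one lane replaced, and `vdupq_n_s32(value)` returns a vector with all four lanes set to `value`. The type `vec4_t` and the functions `set_lane()`, `dup_n()`, `gather_dup_first()` and `gather_static_zero()` are hypothetical, written only to model those semantics off-ARM; they are not the DPDK code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable stand-in for int32x4_t. */
typedef struct {
    int32_t lane[4];
} vec4_t;

/* Models vsetq_lane_s32(value, v, lane): a copy of v with one lane
 * replaced. The real intrinsic needs a compile-time-constant lane, which
 * is why the DPDK code spells out each call instead of looping. */
static vec4_t set_lane(int32_t value, vec4_t v, int lane)
{
    assert(lane >= 0 && lane < 4);
    v.lane[lane] = value;
    return v;
}

/* Models vdupq_n_s32(value): all four lanes set to value. */
static vec4_t dup_n(int32_t value)
{
    vec4_t v;
    for (int i = 0; i < 4; i++)
        v.lane[i] = value;
    return v;
}

/* Honnappa's variant: broadcast the first word to every lane, so the
 * vector is fully defined before lanes 1..3 are overwritten. */
static vec4_t gather_dup_first(const int32_t w[4])
{
    vec4_t v = dup_n(w[0]);
    for (int i = 1; i < 4; i++)
        v = set_lane(w[i], v, i);
    return v;
}

/* Aaron's variant: set lane 0 on top of a zero-initialized static vector
 * (objects with static storage duration start out as all zeros). */
static vec4_t gather_static_zero(const int32_t w[4])
{
    static vec4_t zeroval;
    vec4_t v = set_lane(w[0], zeroval, 0);
    for (int i = 1; i < 4; i++)
        v = set_lane(w[i], v, i);
    return v;
}
```

Both functions return the same four lanes for any input; they differ only in what lane 0 is written on top of (a broadcast of the first word versus a zeroed static).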

> honnag01@qc2400f-1:~/dpdk$ git diff
> diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
> index 01b9766d8..b3196cd12 100644
> --- a/lib/librte_acl/acl_run_neon.h
> +++ b/lib/librte_acl/acl_run_neon.h
> @@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
>  
>         while (flows.started > 0) {
>                 /* Gather 4 bytes of input data for each stream. */
> -               input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
> -               input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
> +               input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
> +               input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 4));
>  
>                 input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
>                 input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
> @@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
>  
>         while (flows.started > 0) {
>                 /* Gather 4 bytes of input data for each stream. */
> -               input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
> +               input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
>                 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
>                 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
>                 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
>
> Then 'input' would never appear as an rval before it was set.
>
> I thought Jerin Jacob (CC'd) would have some opinion on the right fix.
> There are three 'fixes' I know of: one is to squelch the warning
> (but I don't like that, because it could hide future code that
> introduces this), one is to create a static and use assignment, and
> one is to replace the first call and pass in a 0'd lane for the
> first one.
>
> Actually, I think I have a patch that could work to not introduce an
> assignment, but squelch the warning. Something like the following (not
> tested).
>
> ---
>
> diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
> index 01b9766d8..37c984fef 100644
> --- a/lib/librte_acl/acl_run_neon.h
> +++ b/lib/librte_acl/acl_run_neon.h
> @@ -165,6 +165,7 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
>     uint64_t index_array[8];
>     struct completion cmplt[8];
>     struct parms parms[8];
> +   static int32x4_t ZEROVAL;
>     int32x4_t input0, input1;
>  
>     acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> @@ -181,8 +182,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
>  
>     while (flows.started > 0) {
>             /* Gather 4 bytes of input data for each stream. */
> -           input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
> -           input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
> +           input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), ZEROVAL, 0);
> +           input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), ZEROVAL, 0);
>  
>             input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
>             input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
> @@ -227,6 +228,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
>     uint64_t index_array[4];
>     struct completion cmplt[4];
>     struct parms parms[4];
> +   static int32x4_t ZEROVAL;
>     int32x4_t input;
>  
>     acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> @@ -242,7 +244,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
>  
>     while (flows.started > 0) {
>             /* Gather 4 bytes of input data for each stream. */
> -           input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
> +           input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), ZEROVAL, 0);
>             input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
>             input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
>             input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
> --
> 2.21.0
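The ZEROVAL patch leans on two properties of C, checked below with a portable model: a block-scope `static` object with no initializer is zero-initialized before the program runs, and because `vsetq_lane_s32` takes and returns vectors by value, `ZEROVAL` is only ever read, so it stays all-zero across calls. The type `vec4_t` and the functions `set_lane()`, `first_lane()` and `all_zero()` are hypothetical stand-ins, not DPDK or NEON code, and the sketch does not try to reproduce the compiler warning itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical portable stand-in for int32x4_t. */
typedef struct {
    int32_t lane[4];
} vec4_t;

/* Models vsetq_lane_s32(value, v, lane): v is passed by value, so only
 * the local copy is modified and the caller's vector is untouched. */
static vec4_t set_lane(int32_t value, vec4_t v, int lane)
{
    assert(lane >= 0 && lane < 4);
    v.lane[lane] = value;
    return v;
}

/* Mirrors the patched first step: lane 0 is written on top of a
 * zero-initialized static. *zero_after receives a copy of the static
 * after the call, to show that it was never modified. */
static vec4_t first_lane(int32_t word, vec4_t *zero_after)
{
    static vec4_t zeroval; /* static storage duration: all lanes start at 0 */
    vec4_t v = set_lane(word, zeroval, 0);

    *zero_after = zeroval;
    return v;
}

/* True if every lane of v is zero. */
static bool all_zero(vec4_t v)
{
    for (int i = 0; i < 4; i++)
        if (v.lane[i] != 0)
            return false;
    return true;
}
```

After `first_lane()`, lanes 1 to 3 are 0 until the following lane-set calls overwrite them, and the static remains all-zero no matter how many times the function runs.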
