Jerin Jacob Kollanukkaran <jer...@marvell.com> writes: > On Wed, 2019-04-10 at 11:52 -0400, Aaron Conole wrote: >> Jerin Jacob Kollanukkaran <jer...@marvell.com> writes: >> >> > On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote: >> > > --------------------------------------------------------------- >> > > ---- >> > > --- >> > > Compiler complains of argument type mismatch, like: >> > >> > Can you share more details on how to reproduce this issue? >> >> It will be generated using the meson build after enabling the neon >> extension support (which isn't currently happening on ARM using meson >> as >> the build environment). > > > Can you share the patch to enable this for testing.
Sure - I'm using these: (needed) 1/3 - http://mails.dpdk.org/archives/dev/2019-March/128304.html 2/3 - http://mails.dpdk.org/archives/dev/2019-March/128305.html (following only needed for travis support) 3/3 - http://mails.dpdk.org/archives/dev/2019-March/128306.html -Aaron > Since the additional memcpy in fastpath, I need to check the overhead > and check the possibility to avoid the memcpy to case. > > >> >> > We already have >> > CFLAGS_acl_run_neon.o += -flax-vector-conversions >> > in the Makefile. >> > >> > If you are taking out -flax-vector-conversions the correct way to >> > fix will be use vreinterpret*. >> > >> > For me the code looks clean, If unnecessary casting is avoided. >> >> I agree. I merely make explicit the casts that the compiler will be >> implicitly introducing. >> >> > > ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’: >> > > ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax- >> > > vector- >> > > conversions >> > > to permit conversions between vectors with differing >> > > element >> > > types >> > > or numbers of subparts >> > > node_type = vbicq_s32(tr_hi_lo.val[0], index_msk); >> > > ^ >> > > ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible >> > > type >> > > for >> > > argument 2 of ‘vbicq_s32’ >> > > >> > > Signed-off-by: Aaron Conole <acon...@redhat.com> >> > > --- >> > > lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++--------- >> > > ---- >> > > -- >> > > 1 file changed, 27 insertions(+), 19 deletions(-) >> > > >> > > >> > > >> > > /* >> > > @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx, >> > > const uint8_t **data, >> > > acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]); >> > > acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]); >> > > >> > > + memset(&input0, 0, sizeof(input0)); >> > > + memset(&input1, 0, sizeof(input1)); >> > >> > Why this memset only required for arm64? If it real issue, >> > Shouldn't >> > it required for x86 and ppc ? >> >> No. Please see the following lines (which is due to the ARM neon >> intrinsic for setting individual lanes): >> >> while (flows.started > 0) { >> /* Gather 4 bytes of input data for each stream. */ >> input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), >> input0, 0); >> input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), >> input1, 0); >> >> Note: the first time through this loop, input0 and input1 appear on >> the >> rhs of the assignment before appearing on the lhs. This will >> generate >> an uninitialized value warning, even though the assignments are to >> individual lanes of the vector. >> >> I squelched the warning from the compiler in the most brute-force way >> possible. Perhaps it would be better to use a static initialization >> for >> the vector but this code was intended to be RFC and to generate >> feedback. >> >> I guess one alternate approach could be: >> >> static const int32x4_t ZERO_VEC; >> int32x4_t input0 = ZERO_VEC, input1 = ZERO_VEC; >> >> ... >> >> int32x4_t input = ZERO_VEC; >> >> This would have the benefit of keeping the initializer as 'fast' as >> possible (although I recall a memset under a certain size threshold >> is >> the same effect, but not certain). >> >> Either way, I prefer it to squelching the warning, since the warning >> has been found to catch legitimate errors many times. > > I will get back to this after reproducing the issue locally. Awesome - thanks.