> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Thursday, July 15, 2021 8:35 PM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
> <richard.earns...@arm.com>; Marcus Shawcroft
> <marcus.shawcr...@arm.com>; Kyrylo Tkachov <kyrylo.tkac...@arm.com>
> Subject: Re: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics
> optabs
>
> Tamar Christina <tamar.christ...@arm.com> writes:
> > Hi All,
> >
> > There's a slight mismatch between the vectorizer optabs and the
> > intrinsics patterns for NEON. The vectorizer expects operands[3] and
> > operands[0] to be the same but the aarch64 intrinsics expanders expect
> > operands[0] and operands[1] to be the same.
> >
> > This means we need different patterns here. This adds a separate
> > usdot vectorizer pattern which just shuffles around the RTL params.
> >
> > There's also an inconsistency between the usdot and (u|s)dot
> > intrinsics RTL patterns which is not corrected here.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
>
> Couldn't we just change:
>
> > diff --git a/gcc/config/aarch64/arm_neon.h
> > b/gcc/config/aarch64/arm_neon.h index
> > 00d76ea937ace5763746478cbdfadf6479e0b15a..17e059efb80fa86a8a32127ace4fc7f43e2040a8 100644
> > --- a/gcc/config/aarch64/arm_neon.h
> > +++ b/gcc/config/aarch64/arm_neon.h
> > @@ -34039,14 +34039,14 @@ __extension__ extern __inline int32x2_t
> > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> > vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b) {
> > - return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
> > + return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);
>
> …this to __builtin_aarch64_usdot_prodv8qi_ssus (__a, __b, __r) etc.?

Not easily. As I mentioned before, the Neon intrinsics assume that
operands[0] and operands[1] are the same, and this goes much further than
just the call in the header.

The actual type is determined by the optabs and the C stubs that are
generated. aarch64_init_simd_builtins, which creates the C function stubs,
starts processing arguments from the end and, for non-void functions,
assumes that the value at operands[0] is the return type.

So simply moving __r will make it think that the result type should be
uint8x8_t. I can bypass this, but then I'd have to write a custom expander
in the expand code to handle it. At that point, is it really worth it?

Tamar

> I think that's an OK thing to do when the function is named after
> an optab rather than an arm_neon.h intrinsic.
>
> Thanks,
> Richard
>
> > }
> >
> > __extension__ extern __inline int32x4_t
> > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> > vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
> > {
> > - return __builtin_aarch64_usdot_prodv16qi_ssus (__r, __a, __b);
> > + return __builtin_aarch64_usdotv16qi_ssus (__r, __a, __b);
> > }
> >
> > __extension__ extern __inline int32x2_t
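
For readers following the operand-order argument in the reply, here is a toy model of the rule it describes: slots are walked from the end, and slot 0 is taken as the return type of a non-void stub. This is only a sketch under that stated assumption. It is not GCC's aarch64_init_simd_builtins, and `struct signature`, `derive_signature`, `format_signature` and `MAX_ARGS` are hypothetical names invented for the illustration.

```c
#include <stdio.h>

#define MAX_ARGS 4

/* Hypothetical stand-in for a generated builtin prototype.  */
struct signature
{
  const char *ret;             /* return type name */
  const char *args[MAX_ARGS];  /* argument type names, in call order */
  int n_args;
};

/* Walk the slots from last to first.  Slot 0 becomes the return type,
   every other slot becomes an argument at position (slot - 1).
   Assumes n_slots <= MAX_ARGS + 1.  */
static struct signature
derive_signature (const char *const *slots, int n_slots)
{
  struct signature sig = { NULL, { NULL }, 0 };
  for (int i = n_slots - 1; i >= 0; i--)
    {
      if (i == 0)
        sig.ret = slots[i];
      else
        {
          sig.args[i - 1] = slots[i];
          sig.n_args++;
        }
    }
  return sig;
}

/* Render the derived prototype as text, e.g. "T name (A, B)".  */
static void
format_signature (const struct signature *sig, const char *name,
                  char *buf, size_t len)
{
  int used = snprintf (buf, len, "%s %s (", sig->ret, name);
  for (int i = 0; i < sig->n_args && used >= 0 && (size_t) used < len; i++)
    used += snprintf (buf + used, len - used, "%s%s",
                      i ? ", " : "", sig->args[i]);
  if (used >= 0 && (size_t) used < len)
    snprintf (buf + used, len - used, ")");
}
```

Given the slot list { int32x2_t, int32x2_t, uint8x8_t, int8x8_t }, the model derives a stub that returns int32x2_t and takes (int32x2_t, uint8x8_t, int8x8_t), which matches the shape of vusdot_s32. If a reordering leaves uint8x8_t in slot 0, the model reports uint8x8_t as the return type, which is the kind of outcome the reply warns about.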