Tamar Christina <tamar.christ...@arm.com> writes:
> Hi All,
>
> There's a slight mismatch between the vectorizer optabs and the intrinsics
> patterns for NEON.  The vectorizer expects operands[3] and operands[0] to be
> the same but the aarch64 intrinsics expanders expect operands[0] and
> operands[1] to be the same.
>
> This means we need different patterns here.  This adds a separate usdot
> vectorizer pattern which just shuffles around the RTL params.
>
> There's also an inconsistency between the usdot and (u|s)dot intrinsics RTL
> patterns which is not corrected here.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?

Couldn't we just change:

> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 00d76ea937ace5763746478cbdfadf6479e0b15a..17e059efb80fa86a8a32127ace4fc7f43e2040a8 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -34039,14 +34039,14 @@ __extension__ extern __inline int32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
>  {
> -  return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
> +  return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);

…this to __builtin_aarch64_usdot_prodv8qi_ssus (__a, __b, __r) etc.?
I think that's an OK thing to do when the function is named after
an optab rather than an arm_neon.h intrinsic.

Thanks,
Richard

>  }
>
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
>  {
> -  return __builtin_aarch64_usdot_prodv16qi_ssus (__r, __a, __b);
> +  return __builtin_aarch64_usdotv16qi_ssus (__r, __a, __b);
>  }
>
>  __extension__ extern __inline int32x2_t
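
For illustration only, the operand-order question in this thread can be sketched with a scalar model in plain C. Nothing below is GCC or arm_neon.h code: `usdot_model`, `usdot_prod_order` and `vusdot_order` are hypothetical names, and the two-lane layout is an assumption chosen to mirror `int32x2_t`. The sketch assumes the optab-style form takes the two multiplicands first and the accumulator last, while the intrinsic-style form takes the accumulator first, so the wrapper only has to permute its arguments.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 2 /* assumption: two 32-bit lanes, mirroring int32x2_t */

/* Scalar model of a mixed-sign dot product: each 32-bit lane adds four
   unsigned-by-signed byte products to its accumulator value.  */
static void
usdot_model (int32_t out[LANES], const int32_t acc[LANES],
             const uint8_t a[4 * LANES], const int8_t b[4 * LANES])
{
  for (int lane = 0; lane < LANES; lane++)
    {
      int32_t sum = acc[lane];
      for (int k = 0; k < 4; k++)
        sum += (int32_t) a[4 * lane + k] * (int32_t) b[4 * lane + k];
      out[lane] = sum;
    }
}

/* Hypothetical optab-style entry point: multiplicands first,
   accumulator last.  */
static void
usdot_prod_order (int32_t out[LANES], const uint8_t a[4 * LANES],
                  const int8_t b[4 * LANES], const int32_t acc[LANES])
{
  usdot_model (out, acc, a, b);
}

/* Hypothetical intrinsic-style wrapper: accumulator first.  It forwards
   to the optab-style function by reordering (r, a, b) into (a, b, r).  */
static void
vusdot_order (int32_t out[LANES], const int32_t r[LANES],
              const uint8_t a[4 * LANES], const int8_t b[4 * LANES])
{
  usdot_prod_order (out, a, b, r);
}
```

In this model the reordering happens entirely in the wrapper, which is the shape of the suggestion above: the optab-named function keeps a single operand order and the intrinsic-facing function adapts to it.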