Jonathan Wright via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi,
>
> As subject, this patch rewrites the [su]paddl[q] Neon intrinsics to use
> RTL builtins rather than inline assembly code, allowing for better
> scheduling and optimization.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?

OK, thanks.  For the record…

> __extension__ extern __inline uint64x1_t
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> vpaddl_u32 (uint32x2_t __a)
> {
> -  uint64x1_t __result;
> -  __asm__ ("uaddlp %0.1d,%1.2s"
> -           : "=w"(__result)
> -           : "w"(__a)
> -           : /* No clobbers */);
> -  return __result;
> +  return (uint64x1_t) __builtin_aarch64_uaddlpv2si_uu (__a);
> }

…I wasn't sure for this whether it would be better to use
(uint64x1_t) {…} instead of a scalar-to-vector conversion, since that
seems to be the more common style in the rest of arm_neon.h.  But there
are already instances of this kind of conversion too, and if anything
it should be more efficient than creating a distinct vector object.

Richard
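
For illustration only (not part of the patch): a minimal sketch of the
two styles discussed above, written with GCC's generic vector extension
so that it builds on any target.  The names u64x1, from_cast, from_init
and same_bits are hypothetical stand-ins; the real uint64x1_t in
arm_neon.h is a Neon type, and this sketch only assumes that it behaves
like a one-element 64-bit vector for the purposes of the cast.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for uint64x1_t: a one-element, 8-byte GCC vector.  */
typedef uint64_t u64x1 __attribute__ ((vector_size (8)));

/* Style used in the patch: cast the scalar to the same-sized vector type.
   GCC treats this as a reinterpretation of the bits, not a value
   conversion.  */
static inline u64x1
from_cast (uint64_t __x)
{
  return (u64x1) __x;
}

/* Alternative style: build the vector from an initializer list.  */
static inline u64x1
from_init (uint64_t __x)
{
  return (u64x1) { __x };
}

/* Return nonzero if the two vectors have identical bit patterns.  */
static inline int
same_bits (u64x1 __a, u64x1 __b)
{
  return memcmp (&__a, &__b, sizeof (__a)) == 0;
}
```

For a single-element vector both helpers should yield the same bits, so
the choice between them is one of style and of how much work the
compiler has to do to see through the construction.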