Jonathan Wright via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi,
>
> As subject, this patch rewrites the [su]paddl[q] Neon intrinsics to use
> RTL builtins rather than inline assembly code, allowing for better
> scheduling and optimization.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?

OK, thanks.  For the record…

>  __extension__ extern __inline uint64x1_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vpaddl_u32 (uint32x2_t __a)
>  {
> -  uint64x1_t __result;
> -  __asm__ ("uaddlp %0.1d,%1.2s"
> -           : "=w"(__result)
> -           : "w"(__a)
> -           : /* No clobbers */);
> -  return __result;
> +  return (uint64x1_t) __builtin_aarch64_uaddlpv2si_uu (__a);
>  }

…I wasn't sure whether it would be better here to use (uint64x1_t) {…}
instead of a scalar-to-vector conversion, since that seems to be the more
common style in the rest of arm_neon.h.  But there are already instances
of this kind of conversion too, and if anything it should be more
efficient than creating a distinct vector object.
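
To make the two styles concrete, here is a minimal sketch using the GCC
vector extension outside of arm_neon.h.  The names u64x1, pairwise_add_long,
via_cast and via_braces are hypothetical stand-ins, not part of the patch;
pairwise_add_long only assumes that the builtin hands back a 64-bit scalar.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for uint64x1_t: a one-element vector of
   uint64_t, declared with the GCC vector extension.  */
typedef uint64_t u64x1 __attribute__ ((vector_size (8)));

/* Hypothetical stand-in for the builtin: widen both 32-bit lanes and
   add them, returning a 64-bit scalar.  */
static inline uint64_t
pairwise_add_long (uint32_t lo, uint32_t hi)
{
  return (uint64_t) lo + (uint64_t) hi;
}

/* Style used in the patch: reinterpret the 64-bit scalar as a
   one-element vector of the same size.  */
static inline u64x1
via_cast (uint32_t lo, uint32_t hi)
{
  return (u64x1) pairwise_add_long (lo, hi);
}

/* Alternative style: build a vector object from an initializer list.  */
static inline u64x1
via_braces (uint32_t lo, uint32_t hi)
{
  return (u64x1) { pairwise_add_long (lo, hi) };
}

/* Check that both styles give the same single lane.  */
static inline void
check_same_result (uint32_t lo, uint32_t hi)
{
  u64x1 a = via_cast (lo, hi);
  u64x1 b = via_braces (lo, hi);
  assert (a[0] == b[0]);
}
```

Both forms yield the same value; the difference is only in how the
conversion is spelled in the source.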

Richard
