Hi James,

Thanks for your comment.

It seems we need a 'dup' before the 'fmul' if we use the GCC vector extension
syntax.

Example:
        dup     v1.2s, v1.s[0]
        fmul    v0.2s, v1.2s, v0.2s

We would then need another pattern to combine these two insns into
'fmul %0.2s,%1.2s,%2.s[0]', which is fairly complex.
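
For reference, here is a minimal sketch of the vector extension form being
discussed. It uses a generic GCC vector type instead of arm_neon.h so that it
builds on any target; the names v2sf and mul_n are illustrative assumptions,
not part of the patch. The scalar operand is implicitly broadcast to every
lane, which would correspond to the 'dup' in the example above.

```c
/* Generic 2 x float vector; stands in for float32x2_t so this builds
   without arm_neon.h.  v2sf and mul_n are illustrative names only.  */
typedef float v2sf __attribute__ ((vector_size (8)));

static inline v2sf
mul_n (v2sf a, float b)
{
  /* The scalar operand is implicitly splatted across both lanes.  */
  return a * b;
}
```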

BTW, would it be better to revisit this as a follow-up after this patch?


Thanks.
Jiang jiji



On Sat, Apr 11, 2015 at 11:37:47AM +0100, Jiangjiji wrote:
> Hi,
>   This is a ping for: https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00772.html
>   Regtested with aarch64-linux-gnu on QEMU.
>   This patch also has no regressions on the aarch64_be-linux-gnu big-endian
> target.
>   OK for the trunk?
> 
> Thanks.
> Jiang jiji
> 
> 
> ----------
> Re: [PING^2] [PATCH] [AArch64, NEON] Improve vmulX intrinsics
> 
> Hi, Kyrill
>   Thank you for your suggestion.
>   I fixed it and regtested with aarch64-linux-gnu on QEMU.
>   This patch also has no regressions on the aarch64_be-linux-gnu big-endian
> target.
>   OK for the trunk?

Hi Jiang,

I'm sorry that I've taken so long to get to this; I've been out of the office
for several weeks. I have one comment.

> +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
> +vmul_n_f32 (float32x2_t __a, float32_t __b)
> +{
> +  return __builtin_aarch64_mul_nv2sf (__a, __b);
> +}
> +

For vmul_n_* intrinsics, is there a reason we don't want to use the GCC vector 
extension syntax to allow us to write these as:

  __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
  vmul_n_f32 (float32x2_t __a, float32_t __b)
  {
    return __a * __b;
  }

It would be great if we could make that work.

Thanks,
James
