Hi James,

Thanks for your comment.
It seems we would need a 'dup' before the 'fmul' if we used the GCC vector extension syntax. For example:

  dup v1.2s, v1.s[0]
  fmul v0.2s, v1.2s, v0.2s

We would then need another pattern to combine these two insns into 'fmul %0.2s,%1.2s,%2.s[0]', which is rather complex.

BTW, maybe it would be better to revisit this issue after this patch goes in?

Thanks.
Jiang jiji

On Sat, Apr 11, 2015 at 11:37:47AM +0100, Jiangjiji wrote:
> Hi,
> This is a ping for: https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00772.html
> Regtested with aarch64-linux-gnu on QEMU.
> This patch has no regressions for aarch64_be-linux-gnu big-endian target
> too.
> OK for the trunk?
>
> Thanks.
> Jiang jiji
>
>
> ----------
> Re: [PING^2] [PATCH] [AArch64, NEON] Improve vmulX intrinsics
>
> Hi, Kyrill
> Thank you for your suggestion.
> I fixed it and regtested with aarch64-linux-gnu on QEMU.
> This patch has no regressions for aarch64_be-linux-gnu big-endian target
> too.
> OK for the trunk?

Hi Jiang,

I'm sorry that I've taken so long to get to this, I've been out of office
for several weeks. I have one comment.

> +__extension__ static __inline float32x2_t __attribute__
> +((__always_inline__))
> +vmul_n_f32 (float32x2_t __a, float32_t __b)
> +{
> +  return __builtin_aarch64_mul_nv2sf (__a, __b);
> +}
> +

For vmul_n_* intrinsics, is there a reason we don't want to use the GCC
vector extension syntax to allow us to write these as:

  __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
  vmul_n_f32 (float32x2_t __a, float32_t __b)
  {
    return __a * __b;
  }

It would be great if we could make that work.

Thanks,
James
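To make the semantics of the options above concrete, here is a minimal portable sketch. It assumes GCC's generic vector extension (`vector_size`) as a stand-in for `float32x2_t`, so it builds on any GCC target rather than only AArch64; the `v2sf` typedef and the three helper names are hypothetical and do not come from arm_neon.h or the patch. It models James's `__a * __b` form (GCC splats the scalar operand across all lanes), the explicit broadcast that corresponds to the 'dup' + 'fmul' pair, and a scalar reference for the by-element 'fmul ..., %2.s[0]'. It only shows that the three compute the same lanes; which instructions GCC actually emits still depends on the backend patterns being discussed.

```c
/* Hypothetical stand-in for float32x2_t: two floats in a 64-bit vector.  */
typedef float v2sf __attribute__ ((vector_size (8)));

/* Vector-extension form: GCC converts the scalar operand b into a
   vector whose elements all equal b, then multiplies element-wise.  */
static inline v2sf
mul_n_v2sf_ext (v2sf a, float b)
{
  return a * b;
}

/* Same operation with the broadcast written out, mirroring
   "dup v1.2s, v1.s[0]" followed by a plain vector "fmul".  */
static inline v2sf
mul_n_v2sf_dup (v2sf a, float b)
{
  v2sf dup = { b, b };
  return a * dup;
}

/* Scalar reference for the by-element form "fmul %0.2s,%1.2s,%2.s[0]":
   every lane of a is multiplied by lane 0 of v.  */
static inline v2sf
mul_lane0_v2sf_ref (v2sf a, v2sf v)
{
  v2sf r = { 0.0f, 0.0f };
  for (int i = 0; i < 2; i++)
    r[i] = a[i] * v[0];
  return r;
}
```

For example, with `a = { 1.5f, -2.0f }`, all three helpers (passing `3.0f`, or a vector whose lane 0 is `3.0f`) produce `{ 4.5f, -6.0f }`. The open question in the thread is purely about code generation: whether the combiner can fold the splat in the first two forms into the single by-element instruction the third form models.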