On 27/05/16 14:42, James Greenhalgh wrote:
On Tue, May 24, 2016 at 09:24:03AM +0100, Jiong Wang wrote:
These intrinsics were implemented by inline assembly using the "faddp"
instruction.
There was a pattern "aarch64_addpv4sf" which supports V4SF mode only,
while we can extend this pattern to support VDQF mode; then we can
reimplement these intrinsics through builtins.
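For readers unfamiliar with the instruction, here is a minimal portable sketch of the lane arithmetic that a pairwise add such as vpaddq_f32 is expected to perform. The type model_f32x4 and the function model_vpaddq_f32 are hypothetical names made up for illustration; they are not part of arm_neon.h or of the patch, and the sketch models only the result lanes, not how GCC expands the builtin.

```c
/* Hypothetical stand-in for a 4-lane float vector (not float32x4_t). */
typedef struct { float lane[4]; } model_f32x4;

/* Pairwise add in the style of FADDP on four float lanes:
   sums of adjacent pairs of `a` fill the low half of the result,
   sums of adjacent pairs of `b` fill the high half. */
static model_f32x4 model_vpaddq_f32(model_f32x4 a, model_f32x4 b)
{
    model_f32x4 r;
    r.lane[0] = a.lane[0] + a.lane[1];
    r.lane[1] = a.lane[2] + a.lane[3];
    r.lane[2] = b.lane[0] + b.lane[1];
    r.lane[3] = b.lane[2] + b.lane[3];
    return r;
}
```

Both inputs and the output have the same vector shape, which is what lets a single pattern iterate over the modes in VDQF.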
gcc/
2016-05-23 Jiong Wang <jiong.w...@arm.com>
* config/aarch64/aarch64-builtins.def (faddp): New builtins
for modes in VDQF.
* config/aarch64/aarch64-simd.md (aarch64_faddp<mode>): New.
(aarch64_addpv4sf): Delete.
(reduc_plus_scal_v4sf): Use "gen_aarch64_faddpv4sf" instead of
"gen_aarch64_addpv4sf".
* gcc/config/aarch64/iterators.md (UNSPEC_FADDP): New.
* config/aarch64/arm_neon.h (vpadd_f32): Remove inline assembly.
Use builtin.
(vpaddq_f32): Likewise.
(vpaddq_f64): Likewise.
This ChangeLog format is incorrect.
You've missed vpaddd_f64 and vpadds_f32, could you add those?
vpaddd_f64 is already there without inline assembly.
This patch cleans up those intrinsics with symmetric vector input and
output.
vpadds_f32 looks to me to be doing a reduction: the return value is a
scalar instead of a vector, so it does not fit the pattern touched by
this patch. I can clean it up in a separate patch. Is this OK?
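To make the shape difference concrete, here is a small portable sketch contrasting the two-lane vector form with the scalar form. The names model_f32x2, model_vpadd_f32 and model_vpadds_f32 are hypothetical and exist only for this illustration; the sketch assumes vpadds_f32 takes one two-lane vector and returns the sum of its lanes.

```c
/* Hypothetical stand-in for a 2-lane float vector (not float32x2_t). */
typedef struct { float lane[2]; } model_f32x2;

/* Vector in, vector out: one pair sum from each input. */
static model_f32x2 model_vpadd_f32(model_f32x2 a, model_f32x2 b)
{
    model_f32x2 r;
    r.lane[0] = a.lane[0] + a.lane[1];
    r.lane[1] = b.lane[0] + b.lane[1];
    return r;
}

/* Vector in, scalar out: a reduction over the two lanes of one input. */
static float model_vpadds_f32(model_f32x2 a)
{
    return a.lane[0] + a.lane[1];
}
```

The first function keeps the same mode on both sides, while the second changes from a vector mode to a scalar one, which is why it would need a different pattern.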
Thanks,
James