Hi all,

Loops that use the fmin/fmax builtins can be vectorized even without -ffast-math using SVE's FMINNM/FMAXNM instructions. This is an example:

double
f (double *x, int n)
{
  double res = 100.0;
  for (int i = 0; i < n; ++i)
    res = __builtin_fmin (res, x[i]);
  return res;
}
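As a rough illustration of why reordering this reduction is safe without -ffast-math, the sketch below models the quiet-NaN behaviour shared by fmin and FMINNM (a NaN operand is ignored in favour of the numeric one) and compares the sequential loop with a lane-wise reduction followed by a horizontal step. The helper names and the lane count of 4 are assumptions made for the example, not anything from the patch, and signed zeros and signalling NaNs are ignored.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical lane count; real SVE vector lengths depend on the hardware.  */
#define LANES 4

/* Model of the minNum rule for quiet NaNs: if one operand is NaN,
   return the other one.  (a != a) is true only when a is NaN.  */
static double
min_num (double a, double b)
{
  if (a != a)
    return b;
  if (b != b)
    return a;
  return a < b ? a : b;
}

/* Sequential reduction, same shape as f () in the example.  */
static double
reduce_scalar (const double *x, int n)
{
  double res = 100.0;
  for (int i = 0; i < n; ++i)
    res = min_num (res, x[i]);
  return res;
}

/* Lane-wise model: each lane keeps its own accumulator, and the
   accumulators are combined once after the loop.  */
static double
reduce_lanes (const double *x, int n)
{
  double acc[LANES];
  for (int l = 0; l < LANES; ++l)
    acc[l] = 100.0;
  for (int i = 0; i < n; ++i)
    acc[i % LANES] = min_num (acc[i % LANES], x[i]);

  double res = acc[0];
  for (int l = 1; l < LANES; ++l)
    res = min_num (res, acc[l]);
  return res;
}
```

Because NaN inputs are simply skipped, both orderings should agree on inputs that mix numbers and quiet NaNs, which is the property a vectorized reduction relies on.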
Before this patch, the compiler would generate this code (-march=armv8.2-a+sve -O2 -ftree-vectorize):

0000000000000000 <f>:
   0:   7100003f        cmp     w1, #0x0
   4:   5400018d        b.le    34 <f+0x34>
   8:   51000422        sub     w2, w1, #0x1
   c:   91002003        add     x3, x0, #0x8
  10:   d2e80b21        mov     x1, #0x4059000000000000
  14:   9e670020        fmov    d0, x1
  18:   8b224c62        add     x2, x3, w2, uxtw #3
  1c:   d503201f        nop
  20:   fc408401        ldr     d1, [x0],#8
  24:   1e617800        fminnm  d0, d0, d1
  28:   eb02001f        cmp     x0, x2
  2c:   54ffffa1        b.ne    20 <f+0x20>
  30:   d65f03c0        ret
  34:   d2e80b20        mov     x0, #0x4059000000000000
  38:   9e670000        fmov    d0, x0
  3c:   d65f03c0        ret

After this patch, this is the code that gets generated:

0000000000000000 <f>:
   0:   7100003f        cmp     w1, #0x0
   4:   5400020d        b.le    44 <f+0x44>
   8:   d2800002        mov     x2, #0x0
   c:   25d8e3e0        ptrue   p0.d
  10:   93407c21        sxtw    x1, w1
  14:   90000003        adrp    x3, 0 <f>
  18:   25804001        mov     p1.b, p0.b
  1c:   91000063        add     x3, x3, #0x0
  20:   85c0e060        ld1rd   {z0.d}, p0/z, [x3]
  24:   25e11fe0        whilelo p0.d, xzr, x1
  28:   a5e24001        ld1d    {z1.d}, p0/z, [x0, x2, lsl #3]
  2c:   04f0e3e2        incd    x2
  30:   65c58020        fminnm  z0.d, p0/m, z0.d, z1.d
  34:   25e11c40        whilelo p0.d, x2, x1
  38:   54ffff81        b.ne    28 <f+0x28>  // b.any
  3c:   65c52400        fminnmv d0, p1, z0.d
  40:   d65f03c0        ret
  44:   d2e80b20        mov     x0, #0x4059000000000000
  48:   9e670000        fmov    d0, x0
  4c:   d65f03c0        ret

This patch extends the support for reductions to include calls to internal functions, in addition to assign statements. For this purpose, in most places where a tree_code would be used, a code_helper is used instead. A code_helper can hold either a tree_code or a combined_fn.

This patch implements these tasks:

- Detect a reduction candidate based on a call to an internal function (currently only fmin or fmax).
- Process the reduction using code_helper. This means that in several places we have to check whether this is an assign-based reduction or a call-based reduction.
- Add new internal functions for the fmin/fmax reductions and for conditional fmin/fmax.
  On architectures where IEEE fmin/fmax reductions are available, it is still possible to vectorize the loop using unconditional instructions.
- Update SVE's md to support these new reductions.
- Add new SVE tests to check that the optimal code is being generated.

I tested this patch on an aarch64 machine by bootstrapping the compiler and running the checks.

Alejandro

gcc/Changelog:

2018-12-18  Alejandro Martinez  <alejandro.martinezvice...@arm.com>

        * gimple-match.h (code_helper_for_stmnt): New function to get a
        code_helper from a statement.
        * internal-fn.def: New reduc_fmax_scal and reduc_fmin_scal optabs for
        IEEE fp max/min reductions.
        * optabs.def: Likewise.
        * tree-vect-loop.c (reduction_fn_for_scalar_code): Changed function
        signature to accept code_helper instead of tree_code. Handle the
        fmax/fmin builtins.
        (needs_fold_left_reduction_p): Likewise.
        (check_reduction_path): Likewise.
        (vect_is_simple_reduction): Use code_helper instead of tree_code.
        Check for supported call-based reductions. Extend support for both
        assignment-based and call-based reductions.
        (vect_model_reduction_cost): Extend cost-model support to call-based
        reductions (just use MAX expression).
        (get_initial_def_for_reduction): Use code_helper instead of tree_code.
        Extend support for both assignment-based and call-based reductions.
        (vect_create_epilog_for_reduction): Likewise.
        (vectorizable_reduction): Likewise.
        * tree-vectorizer.h: Include gimple-match.h for code_helper. Use
        code_helper in check_reduction_path signature.
        * config/aarch64/aarch64-sve.md: Added define_expand to capture new
        reduc_fmax_scal and reduc_fmin_scal optabs.
        * config/aarch64/iterators.md: New FMAXMINNMV and fmaxmin_uns iterators
        to support the new define_expand.

gcc/testsuite/Changelog:

2018-12-18  Alejandro Martinez  <alejandro.martinezvice...@arm.com>

        * gcc.target/aarch64/sve/reduc_9.c: New test to check SVE-vectorized
        reductions without -ffast-math.
        * gcc.target/aarch64/sve/reduc_10.c: New test to check SVE-vectorized
        builtin reductions without -ffast-math.
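
For readers unfamiliar with the code_helper idea mentioned in the description, here is a minimal C sketch of a value that holds either an operator code or a function code, so that one check can cover both assign-based and call-based reductions. This is not GCC's code_helper (which is a C++ class in gimple-match.h); every enum, struct and function name below is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's tree_code and combined_fn enums.  */
enum sketch_tree_code { SK_MIN_EXPR = 1, SK_MAX_EXPR, SK_PLUS_EXPR };
enum sketch_fn { SK_FN_FMIN = 1, SK_FN_FMAX };

/* Tagged value: either an operator code or a function code.  */
struct sketch_code_helper
{
  bool is_fn;   /* true: code is a sketch_fn; false: a sketch_tree_code.  */
  int code;
};

static struct sketch_code_helper
from_tree_code (enum sketch_tree_code c)
{
  struct sketch_code_helper h = { false, (int) c };
  return h;
}

static struct sketch_code_helper
from_fn (enum sketch_fn f)
{
  struct sketch_code_helper h = { true, (int) f };
  return h;
}

/* One predicate serves both statement kinds: the caller does not need
   to know whether the reduction came from an assignment or a call.  */
static bool
is_minmax_reduction (struct sketch_code_helper h)
{
  if (h.is_fn)
    return h.code == SK_FN_FMIN || h.code == SK_FN_FMAX;
  return h.code == SK_MIN_EXPR || h.code == SK_MAX_EXPR;
}
```

The tag keeps the two code spaces apart even when their numeric values collide, which is why a plain tree_code parameter would not be enough once calls are accepted as reduction statements.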