On Wed, Dec 19, 2018 at 10:33 AM Alejandro Martinez Vicente <alejandro.martinezvice...@arm.com> wrote:
>
> Hi all,
>
> Loops that use the fmin/fmax builtins can be vectorized even without
> -ffast-math using SVE's FMINNM/FMAXNM instructions. This is an example:
>
> double
> f (double *x, int n)
> {
>   double res = 100.0;
>   for (int i = 0; i < n; ++i)
>     res = __builtin_fmin (res, x[i]);
>   return res;
> }
>
> Before this patch, the compiler would generate this code (-march=armv8.2-a+sve
> -O2 -ftree-vectorize):
>
> 0000000000000000 <f>:
>    0:   7100003f        cmp     w1, #0x0
>    4:   5400018d        b.le    34 <f+0x34>
>    8:   51000422        sub     w2, w1, #0x1
>    c:   91002003        add     x3, x0, #0x8
>   10:   d2e80b21        mov     x1, #0x4059000000000000
>   14:   9e670020        fmov    d0, x1
>   18:   8b224c62        add     x2, x3, w2, uxtw #3
>   1c:   d503201f        nop
>   20:   fc408401        ldr     d1, [x0],#8
>   24:   1e617800        fminnm  d0, d0, d1
>   28:   eb02001f        cmp     x0, x2
>   2c:   54ffffa1        b.ne    20 <f+0x20>
>   30:   d65f03c0        ret
>   34:   d2e80b20        mov     x0, #0x4059000000000000
>   38:   9e670000        fmov    d0, x0
>   3c:   d65f03c0        ret
>
> After this patch, this is the code that gets generated:
>
> 0000000000000000 <f>:
>    0:   7100003f        cmp     w1, #0x0
>    4:   5400020d        b.le    44 <f+0x44>
>    8:   d2800002        mov     x2, #0x0
>    c:   25d8e3e0        ptrue   p0.d
>   10:   93407c21        sxtw    x1, w1
>   14:   90000003        adrp    x3, 0 <f>
>   18:   25804001        mov     p1.b, p0.b
>   1c:   91000063        add     x3, x3, #0x0
>   20:   85c0e060        ld1rd   {z0.d}, p0/z, [x3]
>   24:   25e11fe0        whilelo p0.d, xzr, x1
>   28:   a5e24001        ld1d    {z1.d}, p0/z, [x0, x2, lsl #3]
>   2c:   04f0e3e2        incd    x2
>   30:   65c58020        fminnm  z0.d, p0/m, z0.d, z1.d
>   34:   25e11c40        whilelo p0.d, x2, x1
>   38:   54ffff81        b.ne    28 <f+0x28>  // b.any
>   3c:   65c52400        fminnmv d0, p1, z0.d
>   40:   d65f03c0        ret
>   44:   d2e80b20        mov     x0, #0x4059000000000000
>   48:   9e670000        fmov    d0, x0
>   4c:   d65f03c0        ret
>
> This patch extends the support for reductions to include calls to internal
> functions, in addition to assign statements. For this purpose, in most places
> where a tree_code would be used, a code_helper is used instead.
> The code_helper can hold either a tree_code or a combined_fn.
>
> This patch implements these tasks:
>
> - Detect a reduction candidate based on a call to an internal function
>   (currently only fmin or fmax).
> - Process the reduction using code_helper. This means that at several places
>   we have to check whether this is an assign-based or a call-based
>   reduction.
> - Add new internal functions for the fmin/fmax reductions and for conditional
>   fmin/fmax. In architectures where IEEE fmin/fmax reductions are available,
>   it is still possible to vectorize the loop using unconditional instructions.
> - Update SVE's md to support these new reductions.
> - Add new SVE tests to check that the optimal code is being generated.
>
> I tested this patch on an aarch64 machine by bootstrapping the compiler and
> running the checks.
Just some quick comments based on the above and the changelog.

Using code_helper is reasonable, I guess.

> Alejandro
>
> gcc/Changelog:
>
> 2018-12-18  Alejandro Martinez  <alejandro.martinezvice...@arm.com>
>
>         * gimple-match.h (code_helper_for_stmnt): New function to get a

code_helper_for_stmt, I hope.

>         code_helper from a statement.
>         * internal-fn.def: New reduc_fmax_scal and reduc_fmin_scal optabs for
>         IEEE fp max/min reductions

Aren't they necessarily fold_left reductions then?  Thus, shouldn't the
optabs be named accordingly, e.g. fold_left_fmax_optab?

>         * optabs.def: Likewise.
>         * tree-vect-loop.c (reduction_fn_for_scalar_code): Changed function
>         signature to accept code_helper instead of tree_code.  Handle the
>         fmax/fmin builtins.
>         (needs_fold_left_reduction_p): Likewise.
>         (check_reduction_path): Likewise.
>         (vect_is_simple_reduction): Use code_helper instead of tree_code.
>         Check for supported call-based reductions.  Extend support for both
>         assignment-based and call-based reductions.
>         (vect_model_reduction_cost): Extend cost-model support to call-based
>         reductions (just use MAX expression).
>         (get_initial_def_for_reduction): Use code_helper instead of tree_code.
>         Extend support for both assignment-based and call-based reductions.
>         (vect_create_epilog_for_reduction): Likewise.
>         (vectorizable_reduction): Likewise.
>         * tree-vectorizer.h: include gimple-match.h for code_helper.  Use
>         code_helper in check_reduction_path signature.
>         * config/aarch64/aarch64-sve.md: Added define_expand to capture new
>         reduc_fmax_scal and reduc_fmin_scal optabs.
>         * config/aarch64/iterators.md: New FMAXMINNMV and fmaxmin_uns
>         iterators to support the new define_expand.
>
> gcc/testsuite/Changelog:
>
> 2018-12-18  Alejandro Martinez  <alejandro.martinezvice...@arm.com>
>
>         * gcc.target/aarch64/sve/reduc_9.c: New test to check
>         SVE-vectorized reductions without -ffast-math.
>         * gcc.target/aarch64/sve/reduc_10.c: New test to check
>         SVE-vectorized builtin reductions without -ffast-math.
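To make the code_helper change concrete, here is a plain-C model of the dispatch the ChangeLog describes: a value holding either a tree code or a combined function, built from either an assignment or a call, then mapped to a reduction kind. Everything prefixed `mock_` is hypothetical and only illustrates the shape of the logic; GCC's real code_helper is a C++ class (the patch includes gimple-match.h to get it) with its own encoding, and the real mapping is done in reduction_fn_for_scalar_code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for tree codes and combined_fn values.  */
enum mock_tree_code { MOCK_PLUS_EXPR, MOCK_MIN_EXPR, MOCK_MAX_EXPR };
enum mock_combined_fn { MOCK_CFN_FMIN, MOCK_CFN_FMAX, MOCK_CFN_SQRT };

/* Tagged value holding either kind of operation code.  */
struct mock_code_helper
{
  bool is_tree_code;
  union { enum mock_tree_code code; enum mock_combined_fn fn; } u;
};

/* Minimal statement model: either an assignment or a call.  */
struct mock_stmt
{
  bool is_call;
  enum mock_tree_code rhs_code;   /* meaningful only when !is_call */
  enum mock_combined_fn fn;       /* meaningful only when is_call */
};

/* Analogue of the proposed code_helper_for_stmt.  */
struct mock_code_helper mock_code_helper_for_stmt (const struct mock_stmt *s)
{
  struct mock_code_helper h;
  h.is_tree_code = !s->is_call;
  if (s->is_call)
    h.u.fn = s->fn;
  else
    h.u.code = s->rhs_code;
  return h;
}

/* Hypothetical reduction kinds, standing in for the chosen internal fn.  */
enum mock_reduc { MOCK_REDUC_NONE, MOCK_REDUC_PLUS, MOCK_REDUC_MIN,
                  MOCK_REDUC_MAX, MOCK_REDUC_FMIN, MOCK_REDUC_FMAX };

/* One entry point handles both assignment- and call-based reductions.  */
enum mock_reduc mock_reduction_fn_for_code (struct mock_code_helper h)
{
  if (h.is_tree_code)
    {
      switch (h.u.code)
        {
        case MOCK_PLUS_EXPR: return MOCK_REDUC_PLUS;
        case MOCK_MIN_EXPR: return MOCK_REDUC_MIN;
        case MOCK_MAX_EXPR: return MOCK_REDUC_MAX;
        }
    }
  else
    {
      switch (h.u.fn)
        {
        case MOCK_CFN_FMIN: return MOCK_REDUC_FMIN;
        case MOCK_CFN_FMAX: return MOCK_REDUC_FMAX;
        default: break;   /* other calls are not reductions */
        }
    }
  return MOCK_REDUC_NONE;
}
```

Keeping a single tagged value lets the callers listed in the ChangeLog take one parameter instead of growing separate assignment and call paths.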