Hi, Thanks for your work on this. There are a bunch of predominantly style nits in line below. My none nit comments on this patch are:
This should be left turned off for all cores where we have not seen benchmark numbers to indicate that this optimization is a benefit, we can take patches for each core in the future once numbers are available. Given the absence of numbers for some of the cores and the significant impact on one core this should remain disabled by default on generic. We don't necessarily want a proliferation of user orientated tuning options without good reason, instead the per core default tuning behaviour should make the right choice per core, hence I think -mrecip should be dropped. We already have the -moverride= mechanism to provide a developer orientated mechanism to override the per core tuning flag. Further comments inline.... Thanks /Marcus On 7 September 2015 at 11:40, Benedikt Huber <benedikt.hu...@theobroma-systems.com> wrote: --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -29,4 +29,5 @@ AARCH64_TUNE_ to give an enum name. */ AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS) +AARCH64_EXTRA_TUNING_OPTION ("mrecip_default_enabled", MRECIP_DEFAULT_ENABLED) This name is exposed via the -moverride=tune=xxxx option, perhaps a better name would be: +AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT) > +/* Add builtins for reciprocal square root. */ > +void > +aarch64_add_builtin_rsqrt (void) > +{ > + tree fndecl = NULL; > + tree ftype = NULL; > + > + tree V2SF_type_node = build_vector_type (float_type_node, 2); > + tree V2DF_type_node = build_vector_type (double_type_node, 2); > + tree V4SF_type_node = build_vector_type (float_type_node, 4); > + > + ftype = build_function_type_list (double_type_node, double_type_node, > NULL_TREE); Line length exceeds 80 characters. > + fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_df", > + ftype, AARCH64_BUILTIN_RSQRT_DF, BUILT_IN_MD, NULL, NULL_TREE); > + aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT_DF] = fndecl; > + > + ftype = build_function_type_list (float_type_node, float_type_node, > NULL_TREE); Line length exceeds 80 characters. > +/* Function to expand reciprocal square root builtins. */ > +static rtx > +aarch64_expand_builtin_rsqrt (int fcode, tree exp, rtx target) > +{ > + rtx pat; > + tree arg0 = CALL_EXPR_ARG (exp, 0); > + rtx op0 = expand_normal (arg0); > + > + enum insn_code c; > + > + switch (fcode) > + { > + case AARCH64_BUILTIN_RSQRT_DF: > + c = CODE_FOR_rsqrt_df2; break; Leading blocks of 8 spaces should be TAB's, likewise the rest of the patch. > +/* Return builtin for reciprocal square root. */ > +tree > +aarch64_builtin_rsqrt (unsigned int fn, bool md_fn) Blank line between function comments and function, likewise through the rest of the patch please. > diff --git a/gcc/config/aarch64/aarch64-opts.h > b/gcc/config/aarch64/aarch64-opts.h > index bf6bb7b..f8e79cb 100644 > --- a/gcc/config/aarch64/aarch64-opts.h > +++ b/gcc/config/aarch64/aarch64-opts.h > @@ -73,4 +73,11 @@ enum aarch64_code_model { > AARCH64_CMODEL_LARGE > }; > > +/* Each core can have -mrecip enabled or disabled by default. */ > +enum aarch64_mrecip { { on new line please. > + AARCH64_MRECIP_OFF = 0, > + AARCH64_MRECIP_ON, > + AARCH64_MRECIP_DEFAULT, Trailing , will give a "comma at end of enumerator list" warning every time this file is included, drop the comma please. > +/* Function to decide when to use > + * reciprocal square root builtins. */ Drop the leading * on follow on comment lines, here and elsewhere in the patch please. > +/* Select reciprocal square root initial estimate > + * insn depending on machine mode. */ > +rsqrte_type get_rsqrte_type (enum machine_mode mode) New line between type and function name please, likewise for other instance in this patch. > + for (int i = 0; i < iterations; ++i) > + { > + rtx x1 = gen_reg_rtx (mode); > + rtx x2 = gen_reg_rtx (mode); > + rtx x3 = gen_reg_rtx (mode); > + emit_set_insn (x2, gen_rtx_MULT (mode, x0, x0)); > + > + emit_insn ((*get_rsqrts_type (mode)) (x3, xsrc, x2)); > + > + emit_set_insn (x1, gen_rtx_MULT (mode, x0, x3)); > + x0 = x1; > + } > + > + emit_move_insn (dst, x0); > + return; Superflous return, drop it please. > +#undef TARGET_BUILTIN_RECIPROCAL > +#define TARGET_BUILTIN_RECIPROCAL aarch64_builtin_reciprocal > + The rest of these TARGET_* defines are in alphabetical order, please insert the new one in order. > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > + > +mlow-precision-recip-sqrt > +Common Var(flag_mrecip_low_precision_sqrt) Optimization > +Run fewer approximation steps to reduce latency and precision. > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > +@item -mlow-precision-recip-sqrt > +@item -mno-low-precision-recip-sqrt > +@opindex -mlow-precision-recip-sqrt > +@opindex -mno-low-precision-recip-sqrt > +The square root estimate uses two steps instead of three for > double-precision, > +and one step instead of two for single-precision. Thus reducing latency and > precision. Two spaces after . in text please. > +++ b/gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c Please follow the notes on the WIKI for new test cases added to this directory https://gcc.gnu.org/wiki/TestCaseWriting In this case rsqrt_asm_check_1.c likewise for the other test cases. > @@ -0,0 +1,107 @@ > +/* { dg-do run } */ > +/* { dg-options "-O3 --save-temps -fverbose-asm -ffast-math -mrecip" } */ > + > +#include <math.h> > +#include <stdio.h> > + > +#include <float.h> Test cases should not rely on external headers, see https://gcc.gnu.org/wiki/HowToPrepareATestcase Stick to GNU style in new test cases unless there is a good reason not to please. Spaces before (, function names on new lines etc. There is a script in gcc/contrib/check_GNU_style.sh that will highlight a bunch of issues in this patch. > +// printf ("Problem in %20s: %30.18A should be %30.18A\n", s, r, result); \ Drop the test code please.