Re: [PATCH] 2015-09-03 Benedikt Huber Philipp Tomsich

Marcus Shawcroft Mon, 21 Sep 2015 01:23:39 -0700

Hi,

Thanks for your work on this.  There are a bunch of predominantly
style nits in line below.  My none nit comments on this patch are:

This should be left turned off for all cores where we have not seen
benchmark numbers to indicate that this optimization is a benefit, we
can take patches for each core in the future once numbers are
available.

Given the absence of numbers for some of the cores and the significant
impact on one core this should remain disabled by default on generic.

We don't necessarily want a proliferation of user orientated tuning
options without good reason, instead the per core default tuning
behaviour should make the right choice per core, hence I think -mrecip
should be dropped.  We already have the -moverride= mechanism to
provide a developer orientated mechanism to override the per core
tuning flag.

Further comments inline....

Thanks

/Marcus

On 7 September 2015 at 11:40, Benedikt Huber
<benedikt.hu...@theobroma-systems.com> wrote:

--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -29,4 +29,5 @@
      AARCH64_TUNE_ to give an enum name. */

 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
+AARCH64_EXTRA_TUNING_OPTION ("mrecip_default_enabled", MRECIP_DEFAULT_ENABLED)

This name is exposed via the -moverride=tune=xxxx option, perhaps a
better name would be:
+AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT)

> +/* Add builtins for reciprocal square root. */
> +void
> +aarch64_add_builtin_rsqrt (void)
> +{
> +  tree fndecl = NULL;
> +  tree ftype = NULL;
> +
> +  tree V2SF_type_node = build_vector_type (float_type_node, 2);
> +  tree V2DF_type_node = build_vector_type (double_type_node, 2);
> +  tree V4SF_type_node = build_vector_type (float_type_node, 4);
> +
> +  ftype = build_function_type_list (double_type_node, double_type_node, 
> NULL_TREE);

Line length exceeds 80 characters.

> +  fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_df",
> +    ftype, AARCH64_BUILTIN_RSQRT_DF, BUILT_IN_MD, NULL, NULL_TREE);
> +  aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT_DF] = fndecl;
> +
> +  ftype = build_function_type_list (float_type_node, float_type_node, 
> NULL_TREE);

Line length exceeds 80 characters.

> +/* Function to expand reciprocal square root builtins. */
> +static rtx
> +aarch64_expand_builtin_rsqrt (int fcode, tree exp, rtx target)
> +{
> +  rtx pat;
> +  tree arg0 = CALL_EXPR_ARG (exp, 0);
> +  rtx op0 = expand_normal (arg0);
> +
> +  enum insn_code c;
> +
> +  switch (fcode)
> +    {
> +      case AARCH64_BUILTIN_RSQRT_DF:
> +        c = CODE_FOR_rsqrt_df2; break;

Leading blocks of 8 spaces should be TAB's, likewise the rest of the patch.

> +/* Return builtin for reciprocal square root. */
> +tree
> +aarch64_builtin_rsqrt (unsigned int fn, bool md_fn)

Blank line between function comments and function, likewise through
the rest of the patch please.

> diff --git a/gcc/config/aarch64/aarch64-opts.h 
> b/gcc/config/aarch64/aarch64-opts.h
> index bf6bb7b..f8e79cb 100644
> --- a/gcc/config/aarch64/aarch64-opts.h
> +++ b/gcc/config/aarch64/aarch64-opts.h
> @@ -73,4 +73,11 @@ enum aarch64_code_model {
>    AARCH64_CMODEL_LARGE
>  };
>
> +/* Each core can have -mrecip enabled or disabled by default. */
> +enum aarch64_mrecip {

{ on new line please.

> +  AARCH64_MRECIP_OFF = 0,
> +  AARCH64_MRECIP_ON,
> +  AARCH64_MRECIP_DEFAULT,

Trailing , will give a "comma at end of enumerator list" warning every
time this file is included,  drop the comma please.

> +/* Function to decide when to use
> + * reciprocal square root builtins. */

Drop the leading * on follow on comment lines, here and elsewhere in
the patch please.

> +/* Select reciprocal square root initial estimate
> + * insn depending on machine mode. */
> +rsqrte_type get_rsqrte_type (enum machine_mode mode)

New line between type and function name please, likewise for other
instance in this patch.

> +  for (int i = 0; i < iterations; ++i)
> +    {
> +      rtx x1 = gen_reg_rtx (mode);
> +      rtx x2 = gen_reg_rtx (mode);
> +      rtx x3 = gen_reg_rtx (mode);
> +      emit_set_insn (x2, gen_rtx_MULT (mode, x0, x0));
> +
> +      emit_insn ((*get_rsqrts_type (mode)) (x3, xsrc, x2));
> +
> +      emit_set_insn (x1, gen_rtx_MULT (mode, x0, x3));
> +      x0 = x1;
> +    }
> +
> +  emit_move_insn (dst, x0);
> +  return;

Superflous return, drop it please.

> +#undef TARGET_BUILTIN_RECIPROCAL
> +#define TARGET_BUILTIN_RECIPROCAL aarch64_builtin_reciprocal
> +

The rest of these TARGET_* defines are in alphabetical order, please
insert the new one in order.

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> +
> +mlow-precision-recip-sqrt
> +Common Var(flag_mrecip_low_precision_sqrt) Optimization
> +Run fewer approximation steps to reduce latency and precision.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi

> +@item -mlow-precision-recip-sqrt
> +@item -mno-low-precision-recip-sqrt
> +@opindex -mlow-precision-recip-sqrt
> +@opindex -mno-low-precision-recip-sqrt
> +The square root estimate uses two steps instead of three for 
> double-precision,
> +and one step instead of two for single-precision. Thus reducing latency and 
> precision.

Two spaces after . in text please.

> +++ b/gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check.c

Please follow the notes on the WIKI for new test cases added to this
directory https://gcc.gnu.org/wiki/TestCaseWriting
In this case rsqrt_asm_check_1.c likewise for the other test cases.

> @@ -0,0 +1,107 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3 --save-temps -fverbose-asm -ffast-math -mrecip" } */
> +
> +#include <math.h>
> +#include <stdio.h>
> +
> +#include <float.h>

Test cases should not rely on external headers, see
https://gcc.gnu.org/wiki/HowToPrepareATestcase

Stick to GNU style in new test cases unless there is a good reason not
to please. Spaces before (, function names on new lines etc.   There
is a script in gcc/contrib/check_GNU_style.sh that will highlight a
bunch of issues in this patch.

> +//  printf ("Problem in %20s: %30.18A should be %30.18A\n", s, r, result); \

Drop the test code please.

Re: [PATCH] 2015-09-03 Benedikt Huber Philipp Tomsich

Reply via email to