On Wed, Apr 27, 2016 at 04:13:33PM -0500, Evandro Menezes wrote:
>    gcc/
>         * config/aarch64/aarch64-protos.h
>         (AARCH64_APPROX_MODE): New macro.
>         (AARCH64_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}):
>    Likewise.
>         (tune_params): New member "approx_rsqrt_modes".
>         * config/aarch64/aarch64-tuning-flags.def
>         (AARCH64_EXTRA_TUNE_APPROX_RSQRT): Remove macro.
>         * config/aarch64/aarch64.c
>         (generic_tunings): New member "approx_rsqrt_modes".
>         (cortexa35_tunings): Likewise.
>         (cortexa53_tunings): Likewise.
>         (cortexa57_tunings): Likewise.
>         (cortexa72_tunings): Likewise.
>         (exynosm1_tunings): Likewise.
>         (thunderx_tunings): Likewise.
>         (xgene1_tunings): Likewise.
>         (use_rsqrt_p): New argument for the mode and use new member from
>         "tune_params".
>         (aarch64_builtin_reciprocal): Devise mode from builtin.
>         (aarch64_optab_supported_p): New argument for the mode.
>         * doc/invoke.texi (-mlow-precision-recip-sqrt): Reword description.
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index f22a31c..50f1d24 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -178,6 +178,32 @@ struct cpu_branch_cost
>    const int unpredictable;  /* Unpredictable branch or optimizing for speed.  */
>  };
>  
> +/* Control approximate alternatives to certain FP operators.  */
> +#define AARCH64_APPROX_MODE(MODE) \
> +  ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
> +   ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
> +   : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
> +     ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT \
> +           + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \
> +     : (0))
> +#define AARCH64_APPROX_NONE (0)
> +#define AARCH64_APPROX_SP (AARCH64_APPROX_MODE (SFmode) \
> +                        | AARCH64_APPROX_MODE (V2SFmode) \
> +                        | AARCH64_APPROX_MODE (V4SFmode))
> +#define AARCH64_APPROX_DP (AARCH64_APPROX_MODE (DFmode) \
> +                        | AARCH64_APPROX_MODE (V2DFmode))
> +#define AARCH64_APPROX_DFORM (AARCH64_APPROX_MODE (SFmode) \
> +                           | AARCH64_APPROX_MODE (DFmode) \
> +                           | AARCH64_APPROX_MODE (V2SFmode))
> +#define AARCH64_APPROX_QFORM (AARCH64_APPROX_MODE (V4SFmode) \
> +                           | AARCH64_APPROX_MODE (V2DFmode))
> +#define AARCH64_APPROX_SCALAR (AARCH64_APPROX_MODE (SFmode) \
> +                            | AARCH64_APPROX_MODE (DFmode))
> +#define AARCH64_APPROX_VECTOR (AARCH64_APPROX_MODE (V2SFmode) \
> +                            | AARCH64_APPROX_MODE (V4SFmode) \
> +                            | AARCH64_APPROX_MODE (V2DFmode))
> +#define AARCH64_APPROX_ALL (-1)
> +

Thanks for providing these various subsets, but I think they are
unnecessary for the final submission. From what I can see, only
AARCH64_APPROX_ALL and AARCH64_APPROX_NONE are used. Please remove the
rest; they are easy enough to add back if a subtarget wants them.
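
To illustrate how little is lost, here is a minimal standalone sketch of
the per-mode bitmask test with only the NONE and ALL masks kept. The
demo_* and DEMO_* names are hypothetical stand-ins (the real code keys
off machine_mode and the MIN_/MAX_MODE_* bounds); this is not the
patch's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for GCC's machine_mode enum.  The real enumerators and their
   order come from GCC's generated mode tables, not from this sketch.  */
enum demo_mode
{
  DEMO_SF, DEMO_DF, DEMO_V2SF, DEMO_V4SF, DEMO_V2DF, DEMO_NUM_MODES
};

/* One bit per mode, in the spirit of AARCH64_APPROX_MODE.  */
#define DEMO_APPROX_MODE(MODE) (1u << (MODE))
#define DEMO_APPROX_NONE (0u)
#define DEMO_APPROX_ALL (~0u)

/* Return true if MASK enables the approximation for MODE.  */
static bool
demo_approx_enabled_p (unsigned int mask, enum demo_mode mode)
{
  assert (mode < DEMO_NUM_MODES);
  return (mask & DEMO_APPROX_MODE (mode)) != 0;
}
```

A subtarget that later wants, say, single precision only could OR
together the per-mode bits it needs without any of the named subsets.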

>  struct tune_params
>  {
>    const struct cpu_cost_table *insn_extra_cost;
> @@ -218,6 +244,7 @@ struct tune_params
>    } autoprefetcher_model;
>  
>    unsigned int extra_tuning_flags;
> +  unsigned int approx_rsqrt_modes;

As we're going to add a few of these, let's follow the approach for some
of the other costs (e.g. branch costs, vector costs) and bury them in a
structure of their own.
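
Roughly what I have in mind, as a standalone sketch rather than a
concrete proposal: the struct, member, and table names below are all
hypothetical, and the demo_tune_params here only mimics the shape of
tune_params.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_APPROX_NONE (0u)
#define DEMO_APPROX_ALL (~0u)

/* Hypothetical grouping of the approximation masks, so that later
   additions become new members here instead of new tune_params fields.  */
struct demo_approx_modes
{
  const unsigned int recip_sqrt;  /* Modes allowed to use approximate rsqrt.  */
};

/* Cut-down stand-in for tune_params: it points at the grouped masks.  */
struct demo_tune_params
{
  const struct demo_approx_modes *approx_modes;
  unsigned int extra_tuning_flags;
};

static const struct demo_approx_modes demo_generic_approx = { DEMO_APPROX_NONE };
static const struct demo_approx_modes demo_fast_approx = { DEMO_APPROX_ALL };

static const struct demo_tune_params demo_generic_tunings
  = { &demo_generic_approx, 0 };
static const struct demo_tune_params demo_fast_tunings
  = { &demo_fast_approx, 0 };

/* Return true if TUNE allows approximate rsqrt for the mode whose
   mask bit is MODE_BIT.  */
static bool
demo_use_rsqrt_p (const struct demo_tune_params *tune, unsigned int mode_bit)
{
  assert (tune != NULL && tune->approx_modes != NULL);
  return (tune->approx_modes->recip_sqrt & mode_bit) != 0;
}
```

Cores that share a policy can then point at the same table, as the
branch and vector cost tables already do.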

>  };
>  
>  #define AARCH64_FUSION_PAIR(x, name) \
> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
> index 7e45a0c..048c2a3 100644
> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> @@ -29,5 +29,3 @@
>       AARCH64_TUNE_ to give an enum name. */
>  
>  AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
> -AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT)
> -

Did you want to add another way to tune these from the command line (not
necessary now, but as a follow-up)? See how instruction fusion is
handled by the -moverride code for an example.
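
For the sake of discussion, a minimal sketch of turning such an override
string into a mode mask. Everything here is an assumption for
illustration: the flag names, the '.' separator, and the demo_* helpers
are hypothetical and are not the existing -moverride parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name-to-mask table for an approximation override.  */
struct demo_flag
{
  const char *name;
  unsigned int bit;
};

static const struct demo_flag demo_flags[] = {
  { "sf", 1u << 0 },
  { "df", 1u << 1 },
  { "all", ~0u },
};

/* Parse SPEC, a '.'-separated list of names, into *MASK.
   Return 0 on success and -1 if a name is not recognized.  */
static int
demo_parse_approx (const char *spec, unsigned int *mask)
{
  assert (spec != NULL && mask != NULL);
  *mask = 0;
  while (*spec != '\0')
    {
      size_t len = strcspn (spec, ".");  /* Length of the current name.  */
      size_t i;
      int found = 0;

      for (i = 0; i < sizeof demo_flags / sizeof demo_flags[0]; i++)
        if (strlen (demo_flags[i].name) == len
            && strncmp (spec, demo_flags[i].name, len) == 0)
          {
            *mask |= demo_flags[i].bit;
            found = 1;
            break;
          }
      if (!found)
        return -1;

      spec += len;
      if (*spec == '.')
        spec++;  /* Skip the separator.  */
    }
  return 0;
}
```

With this sketch, "sf.df" would yield the two low bits set and an
unknown name would be rejected, which is the behaviour I would expect
from a real option as well.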

Thanks,
James
