On Mon, Jan 11, 2016 at 11:53:39AM +0000, James Greenhalgh wrote:
> 
> Hi,
> 
> I'd like to switch the logic around in aarch64.c such that
> -mlow-precision-recip-sqrt causes us to always emit the low-precision
> software expansion for reciprocal square root. I have two reasons to do
> this; first is consistency across -mcpu targets, second is enabling more
> -mcpu targets to use the flag for peak tuning.
> 
> I don't much like that the precision we use for -mlow-precision-recip-sqrt
> differs between cores (and possibly compiler revisions). Yes, we're
> under -ffast-math but I take this flag to mean the user explicitly wants the
> low-precision expansion, and we should not diverge from that based on an
> internal decision as to what is optimal for performance in the
> high-precision case. I'd prefer to keep things as predictable as possible,
> and here that means always emitting the low-precision expansion when asked.
> 
> Judging by the comments in the thread proposing the reciprocal square
> root optimisation, this will benefit all cores currently supported by GCC.
> To be clear, we would still not expand in the high-precision case for any
> cores which do not explicitly ask for it. Currently that is Cortex-A57
> and xgene, though I will be proposing a patch to remove Cortex-A57 from
> that list shortly.
> 
> Which gives my second motivation for this patch. -mlow-precision-recip-sqrt
> is intended as a tuning flag for situations where performance is more
> important than precision, but the current logic requires setting an
> internal flag which also changes the performance characteristics where
> high-precision is needed. This conflates two decisions the target might
> want to make, and reduces the applicability of an option targets might
> want to enable for performance. In particular, I'd still like to see
> -mlow-precision-recip-sqrt continue to emit the cheaper, low-precision
> sequence for floats under Cortex-A57.
> 
> Based on that reasoning, this patch makes the appropriate change to the
> logic. I've checked with the current -mcpu values to ensure that behaviour
> without -mlow-precision-recip-sqrt does not change, and that behaviour
> with -mlow-precision-recip-sqrt is to emit the low precision sequences.
> 
> I've also put this through bootstrap and test on aarch64-none-linux-gnu
> with no issues.
> 
> OK?

*Ping*

Thanks,
James

> 2015-12-10  James Greenhalgh  <james.greenha...@arm.com>
> 
>       * config/aarch64/aarch64.c (use_rsqrt_p): Always use software
>       reciprocal sqrt for -mlow-precision-recip-sqrt.
> 

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 9142ac0..1d5d898 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -7485,8 +7485,9 @@ use_rsqrt_p (void)
>  {
>    return (!flag_trapping_math
>         && flag_unsafe_math_optimizations
> -       && (aarch64_tune_params.extra_tuning_flags
> -           & AARCH64_EXTRA_TUNE_RECIP_SQRT));
> +       && ((aarch64_tune_params.extra_tuning_flags
> +            & AARCH64_EXTRA_TUNE_RECIP_SQRT)
> +           || flag_mrecip_low_precision_sqrt));
>  }
>  
>  /* Function to decide when to use

Reply via email to