On Mon, Feb 08, 2016 at 10:57:44AM +0000, James Greenhalgh wrote:
> On Mon, Feb 01, 2016 at 01:59:34PM +0000, James Greenhalgh wrote:
> > On Mon, Jan 25, 2016 at 11:21:25AM +0000, James Greenhalgh wrote:
> > > On Mon, Jan 11, 2016 at 11:53:39AM +0000, James Greenhalgh wrote:
> > > >
> > > > Hi,
> > > >
> > > > I'd like to switch the logic around in aarch64.c such that
> > > > -mlow-precision-recip-sqrt causes us to always emit the low-precision
> > > > software expansion for reciprocal square root. I have two reasons to do
> > > > this; first is consistency across -mcpu targets, second is enabling more
> > > > -mcpu targets to use the flag for peak tuning.
> > > >
> > > > I don't much like that the precision we use for
> > > > -mlow-precision-recip-sqrt differs between cores (and possibly compiler
> > > > revisions). Yes, we're under -ffast-math but I take this flag to mean
> > > > the user explicitly wants the low-precision expansion, and we should not
> > > > diverge from that based on an internal decision as to what is optimal
> > > > for performance in the high-precision case. I'd prefer to keep things as
> > > > predictable as possible, and here that means always emitting the
> > > > low-precision expansion when asked.
> > > >
> > > > Judging by the comments in the thread proposing the reciprocal square
> > > > root optimisation, this will benefit all cores currently supported by
> > > > GCC. To be clear, we would still not expand in the high-precision case
> > > > for any cores which do not explicitly ask for it. Currently that is
> > > > Cortex-A57 and xgene, though I will be proposing a patch to remove
> > > > Cortex-A57 from that list shortly.
> > > >
> > > > Which gives my second motivation for this patch.
> > > > -mlow-precision-recip-sqrt is intended as a tuning flag for situations
> > > > where performance is more important than precision, but the current
> > > > logic requires setting an internal flag which also changes the
> > > > performance characteristics where high-precision is needed. This
> > > > conflates two decisions the target might want to make, and reduces the
> > > > applicability of an option targets might want to enable for
> > > > performance. In particular, I'd still like to see
> > > > -mlow-precision-recip-sqrt continue to emit the cheaper, low-precision
> > > > sequence for floats under Cortex-A57.
> > > >
> > > > Based on that reasoning, this patch makes the appropriate change to the
> > > > logic. I've checked with the current -mcpu values to ensure that
> > > > behaviour without -mlow-precision-recip-sqrt does not change, and that
> > > > behaviour with -mlow-precision-recip-sqrt is to emit the low precision
> > > > sequences.
> > > >
> > > > I've also put this through bootstrap and test on aarch64-none-linux-gnu
> > > > with no issues.
> > > >
> > > > OK?
> > >
> > > *Ping*
> >
> > *Pingx2*
>
> *Ping^3*
*ping^4*

Thanks,
James

> > > > 2015-12-10 James Greenhalgh <james.greenha...@arm.com>
> > > >
> > > > 	* config/aarch64/aarch64.c (use_rsqrt_p): Always use software
> > > > 	reciprocal sqrt for -mlow-precision-recip-sqrt.
> > > >
> > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > > > index 9142ac0..1d5d898 100644
> > > > --- a/gcc/config/aarch64/aarch64.c
> > > > +++ b/gcc/config/aarch64/aarch64.c
> > > > @@ -7485,8 +7485,9 @@ use_rsqrt_p (void)
> > > >  {
> > > >    return (!flag_trapping_math
> > > >            && flag_unsafe_math_optimizations
> > > > -          && (aarch64_tune_params.extra_tuning_flags
> > > > -              & AARCH64_EXTRA_TUNE_RECIP_SQRT));
> > > > +          && ((aarch64_tune_params.extra_tuning_flags
> > > > +               & AARCH64_EXTRA_TUNE_RECIP_SQRT)
> > > > +              || flag_mrecip_low_precision_sqrt));
> > > >  }
> > > >  
> > > >  /* Function to decide when to use
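To make the logic change in use_rsqrt_p easier to follow, here is a minimal C sketch of the predicate before and after the patch. It is not GCC code. struct rsqrt_flags and its fields are hypothetical stand-ins for:

* flag_trapping_math
* flag_unsafe_math_optimizations
* the AARCH64_EXTRA_TUNE_RECIP_SQRT tuning bit
* flag_mrecip_low_precision_sqrt

The function names are also made up for illustration. check_rsqrt_predicates enumerates all sixteen flag combinations and asserts the two properties claimed in the mail:

* Nothing changes when -mlow-precision-recip-sqrt is absent.
* The software expansion is always chosen when the option is present and the fast-math preconditions hold.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's internal state; plain booleans for
   illustration only, not GCC's real declarations.  */
struct rsqrt_flags
{
  bool trapping_math;      /* models flag_trapping_math */
  bool unsafe_math;        /* models flag_unsafe_math_optimizations */
  bool tune_recip_sqrt;    /* models the AARCH64_EXTRA_TUNE_RECIP_SQRT bit */
  bool low_precision_sqrt; /* models flag_mrecip_low_precision_sqrt */
};

/* Before the patch: only the per-core tuning bit enables the expansion.  */
static bool
use_rsqrt_p_before (struct rsqrt_flags f)
{
  return !f.trapping_math && f.unsafe_math && f.tune_recip_sqrt;
}

/* After the patch: -mlow-precision-recip-sqrt also enables it.  */
static bool
use_rsqrt_p_after (struct rsqrt_flags f)
{
  return (!f.trapping_math && f.unsafe_math
          && (f.tune_recip_sqrt || f.low_precision_sqrt));
}

/* Walk every combination of the four flags and check the two claims.  */
static void
check_rsqrt_predicates (void)
{
  for (unsigned bits = 0; bits < 16; bits++)
    {
      struct rsqrt_flags f = {
        .trapping_math = bits & 1,
        .unsafe_math = (bits >> 1) & 1,
        .tune_recip_sqrt = (bits >> 2) & 1,
        .low_precision_sqrt = (bits >> 3) & 1,
      };
      if (!f.low_precision_sqrt)
        /* Without the option, behaviour must be unchanged.  */
        assert (use_rsqrt_p_before (f) == use_rsqrt_p_after (f));
      else if (!f.trapping_math && f.unsafe_math)
        /* With the option and fast-math preconditions, always expand.  */
        assert (use_rsqrt_p_after (f));
    }
}
```

Under these assumptions, the result changes for only one set of inputs. That is where -mlow-precision-recip-sqrt is given, trapping math is off, unsafe math optimizations are on, and the core's tuning bit is clear.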