Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

Michael Matz Thu, 25 Jun 2015 06:28:05 -0700

Hi,

On Thu, 25 Jun 2015, Benedikt Huber wrote:


> > This is NOT a win on thunderX at least for single precision because 
> > you have to do the divide and sqrt in the same time as it takes 5 
> > multiplies (estimate and step are multiplies in the thunderX pipeline).  
> > Doubles is 10 multiplies which is just the same as what the patch does 
> > (but it is really slightly less than 10, I rounded up). So in the end 
> > this is NOT a win at all for thunderX unless we do one less step for 
> > both single and double.
> 
> Yes, the expected benefit from rsqrt estimation is implementation 
> specific. If one has a better initial rsqrte or an application that can 
> trade precision for execution time, we could offer a command line option 
> to do only 2 steps for double and 1 step for float; similar to 
> -mrecip-precision for PowerPC. What are your thoughts on that?

On x86-64, under -ffast-math we only do one NR step.  The general rule of 
thumb for fast-math is that common benchmarks should still validate with 
that option in effect.

(And yes, I also never found a speedup from approximated reciprocals that 
still let benchmarks validate in general: you always had to do two NR 
steps, and then it became as slow as a general divide.)  See also 
http://gcc.gnu.org/ml/gcc-patches/2009-11/msg00099.html and the followup 
thread.
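
For readers who want to see the precision/steps trade-off concretely, here 
is a minimal portable sketch of rsqrt refinement with a configurable number 
of Newton-Raphson (NR) steps.  It is not the code from the patch: the 
function names are hypothetical, and the initial estimate uses the 
well-known 0x5f3759df bit trick purely as a stand-in for a hardware 
estimate instruction, so the accuracy per step will differ from any real 
target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Crude initial estimate of 1/sqrt(x) for normal, positive x.
   ASSUMPTION: this bit trick only stands in for a hardware estimate
   instruction; it is not what the compiler would emit. */
static float rsqrt_estimate(float x)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);      /* reinterpret float as integer */
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);         /* reinterpret back as float */
    return y;
}

/* One NR step for f(y) = 1/(y*y) - x:
   y' = y * (1.5 - 0.5 * x * y * y). */
static float rsqrt_nr_step(float x, float y)
{
    return y * (1.5f - 0.5f * x * y * y);
}

/* Approximate 1/sqrt(x) using `steps` NR refinements (hypothetical helper). */
float rsqrt_approx(float x, int steps)
{
    assert(x > 0.0f && steps >= 0);

    float y = rsqrt_estimate(x);
    for (int i = 0; i < steps; i++)
        y = rsqrt_nr_step(x, y);
    return y;
}

/* Residual x*y*y - 1, which is zero for an exact result. */
float rsqrt_residual(float x, int steps)
{
    float y = rsqrt_approx(x, steps);
    return x * y * y - 1.0f;
}
```

With this particular stand-in estimate, the raw estimate is off by roughly 
a few percent, and each NR step approximately squares the relative error.  
Whether one step is enough for benchmarks to validate depends on how good 
the starting estimate is, which is the target-specific part.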
