Hi,

On Thu, 25 Jun 2015, Benedikt Huber wrote:

> > This is NOT a win on thunderX at least for single precision because
> > you have to do the divide and sqrt in the same time as it takes 5
> > multiplies (estimate and step are multiplies in the thunderX pipeline).
> > Doubles is 10 multiplies which is just the same as what the patch does
> > (but it is really slightly less than 10, I rounded up).  So in the end
> > this is NOT a win at all for thunderX unless we do one less step for
> > both single and double.
>
> Yes, the expected benefit from rsqrt estimation is implementation
> specific.  If one has a better initial rsqrte or an application that can
> trade precision for execution time, we could offer a command line option
> to do only 2 steps for double and 1 step for float; similar to
> -mrecip-precision for PowerPC.
> What are your thoughts on that?

On x86-64, under -ffast-math we only do one NR step.  Generally the
rule-of-thumb take on fast-math is that common benchmarks should still
validate with that option in effect.  (And yes, I also never found a
speedup for approximated reciprocals such that benchmarks would still
generally validate: you always had to do two NR steps, and then it became
as slow as a general divide.)

See also http://gcc.gnu.org/ml/gcc-patches/2009-11/msg00099.html and the
followup thread.