Hi Evandro,

> For example, though this approximation improves the performance
> noticeably for DF on A57, for SF, not so much, if at all.

I'm still skeptical that you can ever get any gain on scalars. I bet the only
gain is on 4x vectorized floats.

So what I would like to see is this implemented in a more general way. We should
be able to choose whether to expand depending on the mode, including whether it
is vectorized. For example, enable it for V4SFmode and maybe V2DFmode, but not
for any scalars.

Then we'd add new CPU tuning settings for division, sqrt and rsqrt (rather than
adding lots of extra tune flags). Note the md file should call a function in
aarch64.c to decide whether to expand or not (your division approximation patch
makes the decision in the md file, which does not seem like a good idea).
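
To make the idea concrete, here is a rough standalone sketch of what I mean. It
is not actual GCC code: the enum, the struct, the function name and the example
tuning values are all made up for illustration, and the real thing would use
machine_mode and the existing tuning structures instead. Each operation gets a
per-CPU mask of modes for which the approximation is enabled, and one function
makes the decision so the md patterns only need to call it.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the machine modes discussed above.  */
enum approx_mode { MODE_SF, MODE_DF, MODE_V2SF, MODE_V4SF, MODE_V2DF };

/* The operations that could be expanded into an approximation.  */
enum approx_op { OP_DIV, OP_SQRT, OP_RSQRT };

/* One bit per mode.  */
#define APPROX_BIT(m) (1u << (m))

/* Hypothetical per-CPU tuning record: for each operation, the set of
   modes for which the approximation should be used.  */
struct approx_tune
{
  unsigned division;
  unsigned sqrt;
  unsigned rsqrt;
};

/* Illustrative values only (an assumption, not measured data):
   V4SF everywhere, V2DF for rsqrt only, never for scalars.  */
static const struct approx_tune example_tune =
{
  .division = APPROX_BIT (MODE_V4SF),
  .sqrt = APPROX_BIT (MODE_V4SF),
  .rsqrt = APPROX_BIT (MODE_V4SF) | APPROX_BIT (MODE_V2DF),
};

/* Return true if OP on MODE should be expanded into the approximation
   for the CPU described by TUNE.  An expander would call this rather
   than open-coding the decision.  */
static bool
use_approx_p (const struct approx_tune *tune, enum approx_op op,
              enum approx_mode mode)
{
  unsigned mask = 0;

  switch (op)
    {
    case OP_DIV:
      mask = tune->division;
      break;
    case OP_SQRT:
      mask = tune->sqrt;
      break;
    case OP_RSQRT:
      mask = tune->rsqrt;
      break;
    }

  return (mask & APPROX_BIT (mode)) != 0;
}
```

With something along these lines, enabling or disabling the expansion for a
given CPU and mode is a one-line change in that CPU's tuning record, and no
extra tune flags are needed.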

Wilco
