On 6/19/07, tbp <[EMAIL PROTECTED]> wrote:
Indeed there are holes in every direction when you pull in such a transformation, and the cost of plugging every one of them would be prohibitive; supposedly, the next batch of C2D will leave you with only ~6 cycles to make it worthwhile for a sqrt.
My C2D has a 6-cycle sqrt and a 6-cycle div (measured with mubench, not taken "from the specs"), but gas_dyn still runs 30% faster with reciprocals.
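Here is some background on the "reciprocals" being discussed. With -mrecip, a single-precision division a / b can be replaced by a * rcp(b). The rcpss estimate (about 12 bits) is refined by one Newton-Raphson step, r1 = r0 * (2 - b * r0). The helper recip_div below is hypothetical. It is a minimal sketch written with SSE intrinsics from <xmmintrin.h>, assumes an x86 target with SSE, and is not GCC's actual expansion.

```c
#include <xmmintrin.h>

/* Hypothetical helper: approximate a / b as a * (1/b), where 1/b comes
   from the ~12-bit rcpss estimate refined by one Newton-Raphson step:
   r1 = r0 * (2 - b * r0). */
static float recip_div(float a, float b)
{
    __m128 vb  = _mm_set_ss(b);
    __m128 r0  = _mm_rcp_ss(vb);                          /* rough estimate of 1/b */
    __m128 two = _mm_set_ss(2.0f);
    __m128 r1  = _mm_mul_ss(r0, _mm_sub_ss(two, _mm_mul_ss(vb, r0)));
    return a * _mm_cvtss_f32(r1);                         /* a * refined 1/b */
}
```

The result is close to, but not always bit-identical with, a correctly rounded a / b. This is why GCC only emits such sequences when the fast-math style flags allow it.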
My point was merely that, considering just one operation, you'd introduce a NaN for a not-so-special value (0). In a *fast* math scenario, that value could be produced at any earlier stage by denormal clamping, with no sane way to deal with it. Again, if you look at prior art (icc, AMD's manual...), that's the only special case they cover.
sqrt(0.0) = NaN is indeed a bit strange. I'll add the trick with min(x, maxval), as I think it is faster than compare + pand.
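Here is a minimal sketch of the sqrt(0.0) problem and of one reading of the min(x, maxval) trick. It assumes that "maxval" means FLT_MAX and that the clamp is applied to the rsqrtss estimate. rsqrtss(0) returns +inf, so computing sqrt(a) as a * rsqrt(a) yields 0 * inf = NaN. Clamping with min(inf, FLT_MAX) = FLT_MAX makes the product 0 instead. The helper approx_sqrt and its clamp flag are hypothetical. The expression sqrt(a) ~= -0.5 * (a*e0) * (a*e0*e0 - 3) is algebraically one Newton-Raphson step on the estimate e0.

```c
#include <float.h>
#include <xmmintrin.h>

/* Hypothetical helper: sqrt(a) from the rsqrtss estimate e0 plus one
   Newton-Raphson step, sqrt(a) ~= -0.5 * (a*e0) * (a*e0*e0 - 3).
   If clamp is non-zero, e0 is first clamped with min(e0, FLT_MAX)
   (assumed meaning of "maxval"), so a == 0 gives 0 instead of NaN. */
static float approx_sqrt(float a, int clamp)
{
    __m128 va = _mm_set_ss(a);
    __m128 e0 = _mm_rsqrt_ss(va);                   /* ~1/sqrt(a); +inf when a == 0 */
    if (clamp)
        e0 = _mm_min_ss(e0, _mm_set_ss(FLT_MAX));   /* inf -> FLT_MAX */
    __m128 e1 = _mm_mul_ss(va, e0);                 /* a*e0, ~sqrt(a); 0*inf = NaN if unclamped */
    __m128 e2 = _mm_mul_ss(e1, e0);                 /* a*e0*e0, ~1 */
    __m128 e3 = _mm_sub_ss(e2, _mm_set_ss(3.0f));   /* a*e0*e0 - 3 */
    __m128 e4 = _mm_mul_ss(e1, _mm_set_ss(-0.5f));  /* -0.5 * a*e0 */
    return _mm_cvtss_f32(_mm_mul_ss(e4, e3));
}
```

The clamp only helps because a is multiplied in before e0 is squared. Refining e0 on its own first would overflow 1.5 * FLT_MAX back to +inf. The compare + pand alternative would instead AND the estimate with a mask that is zero where a == 0. This sketch does not measure which of the two is faster.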
Admittedly that's a trade-off, but not an unreasonable one. Now, an option to remove such transformations from the -ffast-math bag of tricks would be fine and would still buy gcc some SPEC bragging rights :)
Due to all the combinations with rsqrt and rcpss, I'm a bit nervous about including this by default in -ffast-math. OTOH, can somebody measure the impact of -mrecip on SPEC?

Uros.