> Ah, a runtime test.  That'd be sufficient.  The cost when we can't do
> the transformation is relatively small, but the gains when we can are huge.

Thank you. I will update the patch and send it again :-)

On Wed, Aug 22, 2018 at 7:05 PM, Jeff Law <l...@redhat.com> wrote:
> On 08/22/2018 06:02 AM, Richard Biener wrote:
>> On Tue, Aug 21, 2018 at 11:27 PM Jeff Law <l...@redhat.com> wrote:
>>>
>>> On 08/21/2018 02:08 PM, Giuliano Augusto Faulin Belinassi wrote:
>>>>> Just as an example, compare the results for
>>>>> x = 0x1.fffffffffffffp1023
>>>>
>>>> Thank you for your answer and the counterexample. :-)
>>>>
>>>>> If we had useful range info on floats we might conditionalize such
>>>>> transforms appropriately.  Or we can enable it on floats and do
>>>>> the sqrt (x*x + 1) in double.
>>>>
>>>> I think I managed to find a bound where the transformation can be done
>>>> without overflow harm; however, I don't know about rounding problems.
>>>>
>>>> Suppose we are handling double precision floats for now. The function
>>>> x/sqrt(1 + x*x) approaches 1 when x is big enough. How big must x be
>>>> for the function to be 1?
>>>>
>>>> Since sqrt(1 + x*x) > x when x > 1, we must find a value of x such
>>>> that x/sqrt(1 + x*x) > eps, where eps is the biggest double smaller
>>>> than 1. Such eps must be around 1 - 2^-53 in IEEE double because the
>>>> mantissa has 52 bits. Solving for x yields that x must be somewhat
>>>> bigger than 6.7e7, so let's take 1e8. Therefore if abs(x) > 1e8, it is
>>>> enough to return copysign(1, x). Notice that this argument is also
>>>> valid for x = +-inf (if the target supports that) because
>>>> sin(atan(+-inf)) = +-1, and it can be extended to other floating
>>>> point formats. The following test code illustrates my point:
>>>> https://pastebin.com/M4G4neLQ
>>>>
>>>> This might still be faster than calculating sin(atan(x)) explicitly.
>>>>
>>>> Please let me know if this is unfeasible. :-)
>>> The problem is our VRP implementation doesn't handle any floating point
>>> types at this time.  If we had range information for FP types, then
>>> this kind of analysis is precisely what we'd need to do the
>>> transformation regardless of -ffast-math.
>>
>> I think his idea was to emit a runtime test?  You'd have to use a
>> COND_EXPR and evaluate both arms at the same time because
>> match.pd doesn't allow you to create control flow.
>>
>> Note the rounding issue is also real given for large x you strip
>> away lower mantissa bits when computing x*x.
> Ah, a runtime test.  That'd be sufficient.  The cost when we can't do
> the transformation is relatively small, but the gains when we can are huge.
>
> Jeff