On Tue, May 07, 2019 at 01:50:23PM +0200, Marc Glisse wrote:
> > And actually it seems that we could optimize the plus1 == plus2 cases
> > even if HONOR_SIGN_DEPENDENT_ROUNDING (type), because even in fesetround
> > (FE_DOWNWARD) mode the testcase prints the first two (in all other modes all
> > 4).
>
> It is very hard to judge what is ok with -frounding-math, because that mode
> is already unusably broken (I use a pass-through asm volatile to protect the
> arguments and result of every operation instead). One important aspect of
> the optimization is whether both operations use the same rounding mode, or
> if there may be a call to fesetround in between. Probably we shouldn't care
> about -frounding-math, since anyway it is likely that it will use some
> IFN_FANCY_PLUS instead of PLUS_EXPR if it is ever implemented.
Indeed, I hadn't thought about t = x + 0.0; fesetround (...); y = t + 0.0;
Let's take -frounding-math out of the patch for now.  If we improve that
mode, e.g. through explicit dependencies on the floating-point state in the
IL, we can get back to this case too.

> > + (inner_op @0 @1))))))))
>
> Shouldn't you give it a name in the source pattern and return that, instead
> of creating a new statement? Or are you doing the operation a second time on
> purpose in case the rounding mode changed or to force an exception?

Good idea.

> > + (outer_op @0 @2)
>
> With sNaN, this may raise a second exception where we used to have only
> qNaN+0, no? And the handling of exceptions may have changed in between, etc.

I believe IEEE 754 says that for non-zero x, x + (+/-0.0) = x, and the only
exceptions that can be raised are the invalid exception if x is an sNaN and
the Intel denormal operand exception (which I think we generally don't care
about); there can be no overflow, underflow or inexact, and obviously no
division by zero.  If the invalid exception is masked off, then I believe
one can't distinguish between the x + 0.0 and (x + 0.0) + 0.0 computations:
x + 0.0 already raises IE and turns the sNaN into a qNaN, and the optional
second + 0.0 just keeps it a qNaN without raising further exceptions, unless
some library call in between queries or clears the accumulated exceptions.
I believe handling that case correctly is only possible if we make those
dependencies explicit in the IL, under non-default flags.

In any case, I don't see a difference between the @3 case, where we keep the
inner op, and the case where we keep the outer op but remove the inner op.
Both behave the same.

Here is an updated patch that uses your @3 idea and drops the
-frounding-math handling.

2019-05-07  Jakub Jelinek  <ja...@redhat.com>

	PR tree-optimization/90356
	* match.pd ((X +/- 0.0) +/- 0.0): Optimize into X +/- 0.0 if possible.

	* gcc.dg/tree-ssa/pr90356-1.c: New test.
	* gcc.dg/tree-ssa/pr90356-2.c: New test.
	* gcc.dg/tree-ssa/pr90356-3.c: New test.
	* gcc.dg/tree-ssa/pr90356-4.c: New test.

--- gcc/match.pd.jj	2019-05-07 13:56:53.062954181 +0200
+++ gcc/match.pd	2019-05-07 14:30:36.010474285 +0200
@@ -152,6 +152,28 @@ (define_operator_list COND_TERNARY
   (if (fold_real_zero_addition_p (type, @1, 1))
    (non_lvalue @0)))
 
+/* Even if the fold_real_zero_addition_p can't simplify X + 0.0
+   into X, we can optimize (X + 0.0) + 0.0 or (X + 0.0) - 0.0
+   or (X - 0.0) + 0.0 into X + 0.0 and (X - 0.0) - 0.0 into X - 0.0
+   if not -frounding-math.  For sNaNs the first operation would raise
+   exceptions but turn the result into qNaN, so the second operation
+   would not raise it.   */
+(for inner_op (plus minus)
+ (for outer_op (plus minus)
+  (simplify
+   (outer_op (inner_op@3 @0 REAL_CST@1) REAL_CST@2)
+    (if (real_zerop (@1)
+         && real_zerop (@2)
+         && !HONOR_SIGN_DEPENDENT_ROUNDING (type))
+     (with { bool inner_plus = ((inner_op == PLUS_EXPR)
+                                ^ REAL_VALUE_MINUS_ZERO (TREE_REAL_CST (@1)));
+             bool outer_plus
+               = ((outer_op == PLUS_EXPR)
+                  ^ REAL_VALUE_MINUS_ZERO (TREE_REAL_CST (@2))); }
+      (if (outer_plus && !inner_plus)
+       (outer_op @0 @2)
+       @3))))))
+
 /* Simplify x - x.
    This is unsafe for certain floats even in non-IEEE formats.
    In IEEE, it is unsafe because it does wrong for NaNs.
--- gcc/testsuite/gcc.dg/tree-ssa/pr90356-1.c.jj	2019-05-07 14:27:17.912654939 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr90356-1.c	2019-05-07 14:27:17.912654939 +0200
@@ -0,0 +1,23 @@
+/* PR tree-optimization/90356 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-rounding-math -fsignaling-nans -fsigned-zeros -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "x_\[0-9]*.D. \\+ 0.0;" 12 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "y_\[0-9]*.D. - 0.0;" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \[+-] 0.0;" 16 "optimized" } } */
+
+double f1 (double x) { return (x + 0.0) + 0.0; }
+double f2 (double y) { return (y + (-0.0)) + (-0.0); }
+double f3 (double y) { return (y - 0.0) - 0.0; }
+double f4 (double x) { return (x - (-0.0)) - (-0.0); }
+double f5 (double x) { return (x + 0.0) - 0.0; }
+double f6 (double x) { return (x + (-0.0)) - (-0.0); }
+double f7 (double x) { return (x - 0.0) + 0.0; }
+double f8 (double x) { return (x - (-0.0)) + (-0.0); }
+double f9 (double x) { double t = x + 0.0; return t + 0.0; }
+double f10 (double y) { double t = y + (-0.0); return t + (-0.0); }
+double f11 (double y) { double t = y - 0.0; return t - 0.0; }
+double f12 (double x) { double t = x - (-0.0); return t - (-0.0); }
+double f13 (double x) { double t = x + 0.0; return t - 0.0; }
+double f14 (double x) { double t = x + (-0.0); return t - (-0.0); }
+double f15 (double x) { double t = x - 0.0; return t + 0.0; }
+double f16 (double x) { double t = x - (-0.0); return t + (-0.0); }
--- gcc/testsuite/gcc.dg/tree-ssa/pr90356-2.c.jj	2019-05-07 14:27:17.912654939 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr90356-2.c	2019-05-07 14:27:17.912654939 +0200
@@ -0,0 +1,8 @@
+/* PR tree-optimization/90356 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-rounding-math -fno-signaling-nans -fsigned-zeros -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "x_\[0-9]*.D. \\+ 0.0;" 12 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "y_\[0-9]*.D. - 0.0;" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \[+-] 0.0;" 12 "optimized" } } */
+
+#include "pr90356-1.c"
--- gcc/testsuite/gcc.dg/tree-ssa/pr90356-3.c.jj	2019-05-07 14:27:17.913654923 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr90356-3.c	2019-05-07 14:27:17.913654923 +0200
@@ -0,0 +1,6 @@
+/* PR tree-optimization/90356 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -frounding-math -fsignaling-nans -fsigned-zeros -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times " \[+-] 0.0;" 32 "optimized" } } */
+
+#include "pr90356-1.c"
--- gcc/testsuite/gcc.dg/tree-ssa/pr90356-4.c.jj	2019-05-07 14:27:17.913654923 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr90356-4.c	2019-05-07 14:27:17.913654923 +0200
@@ -0,0 +1,6 @@
+/* PR tree-optimization/90356 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -frounding-math -fno-signaling-nans -fsigned-zeros -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times " \[+-] 0.0;" 32 "optimized" } } */
+
+#include "pr90356-1.c"

	Jakub