在 2025/1/9 下午3:33, Richard Biener 写道:
On Thu, 9 Jan 2025, Zhou Zhao wrote:
在 2025/1/8 下午6:30, Richard Biener 写道:
On Wed, 8 Jan 2025, Zhou Zhao wrote:
在 2025/1/8 下午5:04, Richard Biener 写道:
On Wed, 8 Jan 2025, Zhou Zhao wrote:
在 2025/1/7 下午10:45, Richard Biener 写道:
On Thu, 2 Jan 2025, 赵洲 wrote:
Add Reviewer Richard Biener.
-----原始邮件-----
发件人: "Zhou Zhao" <zhaoz...@loongson.cn>
发送时间:2025-01-02 19:37:07 (星期四)
收件人: gcc-patches@gcc.gnu.org
抄送: xry...@xry111.site, i...@xen0n.name, chengl...@loongson.cn,
xucheng...@loongson.cn, zhaoz...@loongson.cn
主题: [PATCH] match.pd: Fold pattern of round semantics.
This patch implements 4 rules for semantics of round func in match.pd
under
-funsafe-math-optimizations:
1) (x-floor(x)) < (ceil(x)-x) ? floor(x) : ceil(x) -> floor(x+0.5)
2) (x-floor(x)) >= (ceil(x)-x) ? ceil(x) : floor(x) -> floor(x+0.5)
3) (ceil(x)-x) > (x-floor(x)) ? floor(x) : ceil(x) -> floor(x+0.5)
4) (ceil(x)-x) <= (x-floor(x)) ? ceil(x) : floor(x) -> floor(x+0.5)
The patch implements floor(x+0.5) operation to replace semantics of
round(x) function.
The patch was regtested on aarch64-linux-gnu and x86_64-linux-gnu,
SPEC
2017 and SPEC 2006 were run:
As for SPEC 2017, 538.imagick_r benchmark performance increased by 3%+
in base test of ratio mode.
As for SPEC 2006, while the transform does not seem to be triggered,
we
also see no non-noise impact on performance.
OK for mainline?
gcc/ChangeLog:
* match.pd: Add new pattern for round.
gcc/testsuite/ChangeLog:
* gcc.dg/fold-round-1.c: New test.
---
gcc/match.pd | 27 ++++++++++++++
gcc/testsuite/gcc.dg/fold-round-1.c | 56
+++++++++++++++++++++++++++++
2 files changed, 83 insertions(+)
create mode 100644 gcc/testsuite/gcc.dg/fold-round-1.c
diff --git a/gcc/match.pd b/gcc/match.pd
index 83eca8b2e0a..7b22b7913ac 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -777,6 +777,33 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(rdiv @0 (negate @1))
(rdiv (negate @0) @1))
+(if (flag_unsafe_math_optimizations)
+/* convert semantics of round(x) function to floor(x+0.5) */
+/* (x-floor(x)) < (ceil(x)-x) ? floor(x) : ceil(x) --> floor(x+0.5)
*/
+/* (x-floor(x)) >= (ceil(x)-x) ? ceil(x) : floor(x) --> floor(x+0.5)
*/
+/* (ceil(x)-x) > (x-floor(x)) ? floor(x) : ceil(x) --> floor(x+0.5)
*/
+/* (ceil(x)-x) <= (x-floor(x)) ? ceil(x) : floor(x) --> floor(x+0.5)
*/
Hi, Richard
Thanks for your reply. This patch is not fixes a bug. If we can discuss
and reach a conclusion now, I will be re-posted/pinged when stage1 opens
for GCC 16.
Zhou Zhao.
I think you should be able to merge the cases for swapped compares
by just matching (cond (op:c (...)).
If use swapped compares to merge the cases, the expression
((x - floor(x)) < (ceil(x) - x) ? floor(x) : ceil(x)) can be transformed
to
((ceil(x) - x) < (x - floor(x)) ? floor(x) : ceil(x)), this is not the
expression
I want, not be simplified to floor(x + 0.5). For example, when x = 1.4
the result of new expression is ceil(1.4)=2.0, but floor(1.4+0.5)=1.0.
So I did not use swapped compares to merge the cases.
The swapped compare will match
(ceil(x) - x) > (x - floor(x) ? floor(x) : ceil(x)
Ah, you are correct. I try use the swapped compares merge the cases, it's
consistent with what you said. I misunderstood the swapped comparison
symbols.
I do wonder about the compares itself though, (x-floor(x)) < (ceil(x)-x)
isn't that true exactly when x > 0? With flag_unsafe_optimizations
< vs. <= and thus x not having a fractional component would be
unimportant. That said, wouldn't we expect the compares to be
simplified?
I agree your said, this expression can be simplified when x not haveing a
fractional component, but it's difficult to determine whether x having a
fractional component. Maybe constrain x is int type and matching
(floor (float@1 @0)) can simplify the expression. But that's not my
purpose,
maybe the commits is not clear enough, I would rephrase it.
We only need to focus on the first pattern,the other three are different
forms of it. I will use "left distance" to represent the distance from x
to its left integer boundary, "right distance" to represent the distance
from x to its right integer boundary.
The pattern semantics as follows:
as for x, if left distance is less than right distance, return floor(x),
if left distance is greater or equal than right distance, return ceil(x).
For example, when x=1.4, the left distance is less than right distance,
return floor(x). (x+0.5) has not crossed the right boundary. floor(x+0.5)
is equal to floor(x). when x=1.5 or x=1.6, the left distance is not less
than right distance, return ceil(x). (x+0.5) has crossed the right
boundary. floor(x+0.5) is equal to ceil(x).
I see. So how does it relate to rint/nearbyint?
rint funciton is round halfway cases to the even integer. like
rint(0.5)=0.0,
but this pattern of we mentioned is result 1.0.
But you remove one rounding step on the compares and add one rounding on
the replacement ( + 0.5) which is why you need to guard this with
flag_unsafe_math_optimizations, so arguing the == x.5 case is different
is futile? Or do you say you actually do not need to guard with
flag_unsafe_math_optimizations because the behavior is exactly the
same for all double values? (there's also Inf/Nan and signed zero to
consider, but for those it's better to check HONOR_NANS/INFS/SIGNED_ZEROS)
Seems I dropped the list in my last reply, adding back.
As you say, I need add the flag_unsafe_math_optimizations option to
protect the (+0.5) operation. with the -Ofast option,enable
-fno-signed-zeros, I observed when x=-0.4, the pattern would return
-0.0 on aarch64-linux-gnu, return 0.0 on x86_64-linux-gnu. floor(x+0.5)
will be return 0.0 on above all target.
Well, with -fno-signed-zeros -0.0 and 0.0 are considered the same.
In addition, the pattern is the same for all double values, include
INFS/NANS. Do you have any advice? Sincerely asking for your advice.
Not really - so what you say is that you believe the transform is
an identity transform when following all IEEE rules on the original
expression besides the extra rounding step done for the + 0.5 operation.
It's correct to protect the rounding change with
-funsafe-math-optimizations (we don't have any better flag for this).
The rounding might cause a difference for values where there's only
a single bit for the fractional part of the number, so + 0.5 gets
you always the next or the same whole number depending on the active
rounding mode. But I'm not really a IEEE expert.
Richard.
Thank you for your explanation. Indeed, I cannot guarantee that (+0.5)
will always get the expected value, so I use -funsafe-math-optimizations
to protect. I think it's better than check the HONOR_NANS/INFS/SIGNED_ZEROS.
If you have no objections to adding this flag, I will resubmit PATCH v2
when stage 1 opens for GCC 16.
Zhou Zhao.
I'm also curious where you got these expressions to be simplified from?
In the 538.imagick_r benchmark of Spec2017, I find this pattern from
MagickRound function. By simplifying the pattern, the performance of the
538 benchmark can be improved by 3% to 4% on aarch64-linux-gnu and
x86_64-linux-gnu.
Ah, OK, can you mention this in the commit message?
Thanks,
Richard.
I will resubmit the [PATH v2] include the above mentioned commit message,
and merge the cases for swapped compares by just matching (cond (op:c
(...)).
Thanks,
Zhou Zhao.
Note we're in stage3 so unless this fixes a bug it isn't appropriate
at this stage and should be re-posted/pinged when stage1 opens for
GCC 16.
Richard.
+(for op (lt ge)
+ bt (FLOOR CEIL)
+ bf (CEIL FLOOR)
+ floor (FLOOR FLOOR)
+ ceil (CEIL CEIL)
+ (simplify
+ (cond (op (minus:s SSA_NAME@0 (floor SSA_NAME@0))
+ (minus:s (ceil SSA_NAME@0) SSA_NAME@0))
+ (bt SSA_NAME@0) (bf SSA_NAME@0))
+ (floor (plus @0 { build_real (type, dconsthalf); }))))
+(for op (gt le)
+ bt (FLOOR CEIL)
+ bf (CEIL FLOOR)
+ floor (FLOOR FLOOR)
+ ceil (CEIL CEIL)
+ (simplify
+ (cond (op (minus:s (ceil SSA_NAME@0) SSA_NAME@0)
+ (minus:s SSA_NAME@0 (floor SSA_NAME@0)))
+ (bt SSA_NAME@0) (bf SSA_NAME@0))
+ (floor (plus @0 { build_real (type, dconsthalf); })))))
+
(if (flag_unsafe_math_optimizations)
/* Simplify (C / x op 0.0) to x op 0.0 for C != 0, C != Inf/Nan.
Since C / x may underflow to zero, do this only for unsafe
math.
*/
diff --git a/gcc/testsuite/gcc.dg/fold-round-1.c
b/gcc/testsuite/gcc.dg/fold-round-1.c
new file mode 100644
index 00000000000..845d6d2e475
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-round-1.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -funsafe-math-optimizations" } */
+
+extern void link_error (void);
+
+#define TEST_ROUND(TYPE, FFLOOR, FCEIL)
\
+ void round_##FFLOOR##_1 (TYPE x)
\
+ {
\
+ TYPE t1 = 0;
\
+ TYPE t2 = __builtin_##FFLOOR (x + 0.5);
\
+ if ((x - __builtin_##FFLOOR (x)) < (__builtin_##FCEIL (x) - x))
\
+ t1 = __builtin_##FFLOOR (x);
\
+ else
\
+ t1 = __builtin_##FCEIL (x);
\
+ if (t1 != t2)
\
+ link_error ();
\
+ }
\
+ void round_##FFLOOR##_2 (TYPE x)
\
+ {
\
+ TYPE t1 = 0;
\
+ TYPE t2 = __builtin_##FFLOOR (x + 0.5);
\
+ if ((__builtin_##FCEIL (x) - x) > (x - __builtin_##FFLOOR (x)))
\
+ t1 = __builtin_##FFLOOR (x);
\
+ else
\
+ t1 = __builtin_##FCEIL (x);
\
+ if (t1 != t2)
\
+ link_error ();
\
+ }
\
+ void round_##FFLOOR##_3 (TYPE x)
\
+ {
\
+ TYPE t1 = 0;
\
+ TYPE t2 = __builtin_##FFLOOR (x + 0.5);
\
+ if ((__builtin_##FCEIL (x) - x) <= (x - __builtin_##FFLOOR (x)))
\
+ t1 = __builtin_##FCEIL (x);
\
+ else
\
+ t1 = __builtin_##FFLOOR (x);
\
+ if (t1 != t2)
\
+ link_error ();
\
+ }
\
+ void round_##FFLOOR##_4 (TYPE x)
\
+ {
\
+ TYPE t1 = 0;
\
+ TYPE t2 = __builtin_##FFLOOR (x + 0.5);
\
+ if ((x - __builtin_##FFLOOR (x)) >= (__builtin_##FCEIL (x) - x))
\
+ t1 = __builtin_##FCEIL (x);
\
+ else
\
+ t1 = __builtin_##FFLOOR (x);
\
+ if (t1 != t2)
\
+ link_error ();
\
+ }
+
+TEST_ROUND (float, floorf, ceilf)
+TEST_ROUND (double, floor, ceil)
+TEST_ROUND (long double, floorl, ceill)
+
+/* { dg-final { scan-assembler-not "link_error" } } */
--
2.20.1