https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116140
--- Comment #7 from Alex Coplan <acoplan at gcc dot gnu.org> ---
So it turns out the reason #pragma GCC unroll doesn't work under LTO is that we
don't propagate the `has_unroll` flag when streaming functions during LTO, so
the RTL loop2_unroll pass ends up not running at all. The following patch
allows us to recover it:

diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 2e592be8082..93877065d86 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1136,6 +1136,8 @@ input_cfg (class lto_input_block *ib, class data_in *data_in,
       /* Read OMP SIMD related info.  */
       loop->safelen = streamer_read_hwi (ib);
       loop->unroll = streamer_read_hwi (ib);
+      if (loop->unroll > 1)
+	fn->has_unroll = true;
       loop->owned_clique = streamer_read_hwi (ib);
       loop->dont_vectorize = streamer_read_hwi (ib);
       loop->force_vectorize = streamer_read_hwi (ib);

A more conservative fix might be to explicitly stream has_unroll out and in
again, but the above is simpler and I don't currently see a reason why we can't
infer it like this (comments welcome).

Anyway, this (together with the above C++ patch and adding the #pragma to
std::__find_if) gives us back ~3.9% on Neoverse V1. That recovers about 71% of
the regression, leaving the effective regression (relative to the hand-unrolled
code) at 1.7% instead of 5.8%.

It's possible there are further improvements to be had by tweaking the unrolled
codegen or by making the inlining heuristics take #pragma GCC unroll into
account (assuming they don't currently; I haven't checked). I'll try to do some
more analysis on the remaining difference. In any case, I'll aim to polish and
submit these patches unless there are any objections at this point.