https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116140
Feng Xue <fxue at os dot amperecomputing.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |fxue at os dot amperecomputing.com
--- Comment #16 from Feng Xue <fxue at os dot amperecomputing.com> ---
(In reply to GCC Commits from comment #13)
> The master branch has been updated by Alex Coplan <[email protected]>:
>
> https://gcc.gnu.org/g:3fd07d4f04f43816a038daf9b16c6d5bf2e96c9b
>
> commit r15-3586-g3fd07d4f04f43816a038daf9b16c6d5bf2e96c9b
> Author: Alex Coplan <[email protected]>
> Date: Fri Aug 2 09:56:07 2024 +0100
>
> libstdc++: Restore unrolling in std::find using pragma [PR116140]
>
> Together with the preparatory compiler patches, this patch restores
> unrolling in std::__find_if, but this time relying on the compiler to do
> it by using:
>
> #pragma GCC unroll 4
>
> which should restore the majority of the regression relative to the
> hand-unrolled version while still being vectorizable with WIP alignment
> peeling enhancements.
>
> On Neoverse V1 with LTO, this reduces the regression in xalancbmk (from
> SPEC CPU 2017) from 5.8% to 1.7% (restoring ~71% of the lost
> performance).
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/116140
> * include/bits/stl_algobase.h (std::__find_if): Add #pragma to
> request GCC to unroll the loop.
If we build xalancbmk with "-funroll-loops", we get higher performance. With
that option, however, this commit causes a regression: the pragma hard-codes
the unroll count to 4, whereas without the pragma the unroller would choose a
larger count, such as 8. We see the regression on both x86 and aarch64.
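
To make the comparison concrete, here is a minimal sketch of the two loop forms. It is not the libstdc++ source: find_if_pragma, find_if_plain and both_agree are hypothetical names used only for illustration. The first loop fixes the unroll count with the pragma from the commit; the second leaves the count to the compiler's unroller, for example when built with "-funroll-loops".

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the loop after the commit: the unroll
// count is fixed at 4 by the pragma.
template <typename It, typename Pred>
It find_if_pragma(It first, It last, Pred pred)
{
#pragma GCC unroll 4
  for (; first != last; ++first)
    if (pred(*first))
      break;
  return first;
}

// Same loop without the pragma: the unroll count, if any, is chosen by
// the compiler's own heuristics (e.g. under -funroll-loops).
template <typename It, typename Pred>
It find_if_plain(It first, It last, Pred pred)
{
  for (; first != last; ++first)
    if (pred(*first))
      break;
  return first;
}

// Illustrative helper: both variants must return the same position,
// since they differ only in how the loop may be unrolled.
inline bool both_agree(const std::vector<int>& v, int needle)
{
  auto pred = [needle](int x) { return x == needle; };
  return find_if_pragma(v.begin(), v.end(), pred)
         == find_if_plain(v.begin(), v.end(), pred);
}
```

Compiling both variants with and without "-funroll-loops" and comparing the generated loops would be one way to check which unroll count is actually used in each case.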