https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116140

Feng Xue <fxue at os dot amperecomputing.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fxue at os dot amperecomputing.com

--- Comment #16 from Feng Xue <fxue at os dot amperecomputing.com> ---
(In reply to GCC Commits from comment #13)
> The master branch has been updated by Alex Coplan <acop...@gcc.gnu.org>:
> 
> https://gcc.gnu.org/g:3fd07d4f04f43816a038daf9b16c6d5bf2e96c9b
> 
> commit r15-3586-g3fd07d4f04f43816a038daf9b16c6d5bf2e96c9b
> Author: Alex Coplan <alex.cop...@arm.com>
> Date:   Fri Aug 2 09:56:07 2024 +0100
> 
>     libstdc++: Restore unrolling in std::find using pragma [PR116140]
>     
>     Together with the preparatory compiler patches, this patch restores
>     unrolling in std::__find_if, but this time relying on the compiler to do
>     it by using:
>     
>       #pragma GCC unroll 4
>     
>     which should restore the majority of the regression relative to the
>     hand-unrolled version while still being vectorizable with WIP alignment
>     peeling enhancements.
>     
>     On Neoverse V1 with LTO, this reduces the regression in xalancbmk (from
>     SPEC CPU 2017) from 5.8% to 1.7% (restoring ~71% of the lost
>     performance).
>     
>     libstdc++-v3/ChangeLog:
>     
>             PR libstdc++/116140
>             * include/bits/stl_algobase.h (std::__find_if): Add #pragma to
>             request GCC to unroll the loop.

If xalancbmk is built with "-funroll-loops", higher performance is achievable
without this commit. With the commit there is a regression, because the pragma
hard-codes the unroll count to 4, whereas the unroller would otherwise choose a
larger one, such as 8. The regression is seen on both x86 and aarch64.
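
To illustrate the two cases, here is a minimal sketch, not the actual libstdc++
code. The names find_if_pragma and find_if_plain are hypothetical stand-ins for
the loop in std::__find_if. The first loop carries the pragma from the commit,
which requests a fixed factor of 4. The second leaves the choice to the
compiler, which, per the observation above, may pick a larger factor such as 8
under -funroll-loops.

```cpp
// Hypothetical stand-in for the loop in std::__find_if (not the real source).
// The pragma requests a fixed unroll factor of 4 for the loop that follows.
template <typename It, typename Pred>
It find_if_pragma(It first, It last, Pred pred)
{
#pragma GCC unroll 4
  for (; first != last; ++first)
    if (pred(*first))
      break;
  return first;
}

// Same loop without the pragma: the unroll factor is left to the compiler's
// own heuristics (for example when building with -funroll-loops).
template <typename It, typename Pred>
It find_if_plain(It first, It last, Pred pred)
{
  for (; first != last; ++first)
    if (pred(*first))
      break;
  return first;
}
```

Both functions return the same iterator for any input; only the generated code
may differ. Comparing the assembly from something like "g++ -O2 -funroll-loops
-S" for the two versions should show whether the pragma limits the unroll
factor on a given target.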
