https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116140
Feng Xue <fxue at os dot amperecomputing.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fxue at os dot amperecomputing.com

--- Comment #16 from Feng Xue <fxue at os dot amperecomputing.com> ---
(In reply to GCC Commits from comment #13)
> The master branch has been updated by Alex Coplan <acop...@gcc.gnu.org>:
>
> https://gcc.gnu.org/g:3fd07d4f04f43816a038daf9b16c6d5bf2e96c9b
>
> commit r15-3586-g3fd07d4f04f43816a038daf9b16c6d5bf2e96c9b
> Author: Alex Coplan <alex.cop...@arm.com>
> Date:   Fri Aug 2 09:56:07 2024 +0100
>
>     libstdc++: Restore unrolling in std::find using pragma [PR116140]
>
>     Together with the preparatory compiler patches, this patch restores
>     unrolling in std::__find_if, but this time relying on the compiler to do
>     it by using:
>
>       #pragma GCC unroll 4
>
>     which should restore the majority of the regression relative to the
>     hand-unrolled version while still being vectorizable with WIP alignment
>     peeling enhancements.
>
>     On Neoverse V1 with LTO, this reduces the regression in xalancbmk (from
>     SPEC CPU 2017) from 5.8% to 1.7% (restoring ~71% of the lost
>     performance).
>
>     libstdc++-v3/ChangeLog:
>
>             PR libstdc++/116140
>             * include/bits/stl_algobase.h (std::__find_if): Add #pragma to
>             request GCC to unroll the loop.

If we build xalancbmk with "-funroll-loops", we get higher performance. In that configuration, however, this commit causes a regression: the unroll count is hard-coded to 4 by the pragma, whereas without the pragma the unroller would choose a larger factor, such as 8. The regression is observed on both x86 and aarch64.
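For illustration only, here is a minimal sketch of the two styles of unrolling discussed in this thread, assuming GCC 8 or later (which accepts `#pragma GCC unroll`). `find_if_pragma` pins the unroll factor in the source, as the quoted commit does. `find_if_manual4` unrolls by 4 by hand and is shown only for comparison. The names `find_if_pragma`, `find_if_manual4` and `check_same` are hypothetical, and neither function is the actual libstdc++ `std::__find_if` source.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical sketch, not libstdc++ code: a plain linear search whose
// unroll factor is requested in the source. The pragma must sit directly
// before the loop it applies to. The factor 4 is fixed here; the comment
// above reports that this keeps -funroll-loops from picking a larger one.
template <typename Iter, typename Pred>
Iter find_if_pragma(Iter first, Iter last, Pred pred) {
#pragma GCC unroll 4
  for (; first != last; ++first)
    if (pred(*first))
      return first;
  return last;
}

// Hypothetical sketch: the same search unrolled by hand by a factor of 4.
template <typename RandomIt, typename Pred>
RandomIt find_if_manual4(RandomIt first, RandomIt last, Pred pred) {
  // Number of complete 4-element blocks.
  typename std::iterator_traits<RandomIt>::difference_type trip =
      (last - first) >> 2;
  for (; trip > 0; --trip) {
    if (pred(*first)) return first;
    ++first;
    if (pred(*first)) return first;
    ++first;
    if (pred(*first)) return first;
    ++first;
    if (pred(*first)) return first;
    ++first;
  }
  // Handle the 0 to 3 leftover elements.
  switch (last - first) {
    case 3:
      if (pred(*first)) return first;
      ++first;
      // fall through
    case 2:
      if (pred(*first)) return first;
      ++first;
      // fall through
    case 1:
      if (pred(*first)) return first;
      ++first;
      // fall through
    case 0:
    default:
      return last;
  }
}

// Both variants should return the same position for any input.
inline void check_same(const std::vector<int>& v, int target) {
  auto eq = [target](int x) { return x == target; };
  assert(find_if_pragma(v.begin(), v.end(), eq) ==
         find_if_manual4(v.begin(), v.end(), eq));
}
```

Both functions compute the same result. The difference lies in who chooses the unroll factor. With the hand-unrolled loop, or with the pragma, the factor is fixed at 4 in the source. With an unannotated loop, building with `-funroll-loops` leaves the factor to GCC's unroller, and that is the trade-off the comment above objects to.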