https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110711
Bug ID: 110711
Summary: possible missed optimization for std::max with -march=znver2
Product: gcc
Version: 13.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: mrks2023 at proton dot me
Target Milestone: ---

I think I found a missed optimization involving std::max() for
-march=znver2 (sorry if it was already reported, but I didn't find anything
related in the bug tracker).

I have two functions that compute the maximum element of an array:
- function k_std_max uses std::max() and is never vectorized
- function k_max uses conditional assignment and is vectorized when the
  optimization flags allow for it

The code (also https://godbolt.org/z/hW49nbqMY):

    #include <cassert>
    #include <algorithm>

    double k_std_max(size_t n_els, double * a) {
        assert(n_els > 0);
        double m = a[0];
    #ifdef _OPENMP
    #pragma omp simd reduction(max:m)
    #endif
        for (size_t i = 1; i < n_els; ++i) {
            m = std::max(m, a[i]);
        }
        return m;
    }

    double k_max(size_t n_els, double * a) {
        assert(n_els > 0);
        double m = a[0];
    #ifdef _OPENMP
    #pragma omp simd reduction(max:m)
    #endif
        for (size_t i = 1; i < n_els; ++i) {
            m = m < a[i] ? a[i] : m;
        }
        return m;
    }

Compiling with "-O3 -fopenmp -march=znver2 -Wall -Wextra -DNDEBUG"
vectorizes k_max:

    .L19:
        vmovupd ymm3, YMMWORD PTR [rax+8]
        add     rax, 32
        vmaxpd  ymm1, ymm3, ymm1
        cmp     rax, rdx
        jne     .L19

but k_std_max still uses scalar instructions:

    .L3:
        vmovsd  xmm0, QWORD PTR [rax]
        add     rax, 8
        vmaxsd  xmm0, xmm0, xmm1
        cmp     rdx, rax
        jne     .L5

Note that I had to use -fopenmp, as using only -fopenmp-simd did not
vectorize k_max.

Even when I use "-Ofast" or "-Ofast -fopenmp" instead of "-O3", k_std_max
is not vectorized:

    .L3:
        vmaxsd  xmm0, xmm0, QWORD PTR [rax]
        add     rax, 8
        cmp     rdx, rax
        jne     .L3
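
For anyone reproducing this, a minimal sketch of a harness that checks that the two reductions return the same value, so that any difference in the generated code is purely an optimization matter. The function names max_with_std_max, max_with_ternary and kernels_agree are hypothetical and not part of the report; the OpenMP pragmas are left out for brevity, and the inputs are assumed to contain no NaNs. GCC's -fopt-info-vec-missed option may also help show which loops the vectorizer rejected.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Same reduction as k_std_max in the report (pragma omitted).
double max_with_std_max(std::size_t n_els, const double* a) {
    assert(n_els > 0);
    double m = a[0];
    for (std::size_t i = 1; i < n_els; ++i) {
        m = std::max(m, a[i]);
    }
    return m;
}

// Same reduction as k_max in the report (pragma omitted).
double max_with_ternary(std::size_t n_els, const double* a) {
    assert(n_els > 0);
    double m = a[0];
    for (std::size_t i = 1; i < n_els; ++i) {
        m = m < a[i] ? a[i] : m;
    }
    return m;
}

// Hypothetical helper: true if both kernels return the same maximum.
// Assumes v is non-empty and contains no NaNs (NaN == NaN is false).
bool kernels_agree(const std::vector<double>& v) {
    assert(!v.empty());
    return max_with_std_max(v.size(), v.data()) ==
           max_with_ternary(v.size(), v.data());
}
```

Calling kernels_agree on a small NaN-free vector such as {1.0, 5.0, 3.0, -2.0} should return true, since both loops select a[i] exactly when m < a[i].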