https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89653
Bug ID: 89653
Summary: Missing vectorization of loop containing std::min/std::max and temporary
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: moritz.kreutzer at siemens dot com
Target Milestone: ---

Godbolt worksheet: https://godbolt.org/z/F6m5hl

GCC (trunk and all earlier versions) fails to vectorize (SSE/AVX2/AVX-512) the following loop because of a "complicated access pattern" (similarly for std::max()):

== loop1 - FAIL ====================================
for (int i = 0; i < end; ++i) {
    vec[i] = std::min(vec[i], vec[i]/x);
}
====================================================

If we don't use std::min(), but implement the same loop using a ternary operator, the vectorization is successful:

== loop2 - OK ======================================
for (int i = 0; i < end; ++i) {
    vec[i] = vec[i] < vec[i]/x ? vec[i] : vec[i]/x;
}
====================================================

However, the problem does not seem to be that GCC is unable to vectorize std::min() itself, because the following loop _does_ get vectorized (note the different logic and the absence of an implicit temporary for vec[i]/x):

== loop3 - OK ======================================
for (int i = 0; i < end; ++i) {
    vec[i] = std::min(vec[i], x);
}
====================================================

The C++ standard prescribes that std::min() returns the result as a const reference, so an implementation might look like this:

== std::min() ======================================
double const & min(double const &a, double const &b)
{
    if (a<b) return a;
    return b;
}
====================================================

Implementing our own min() function along the same lines, but returning the result by value, also enables vectorization of the original loop (see loop4 @ godbolt).
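For reference, the by-value variant described above could look roughly like the following. This is only a sketch: the actual loop4 code lives in the Godbolt worksheet and is not reproduced here, so the names min_by_value, loop1_std_min and loop4_sketch are hypothetical, and the element type double and the pointer-plus-length signature are assumptions based on the min() sketch above.

```cpp
#include <algorithm>

// Hypothetical helper (not from the report): same comparison as the
// min() sketch above, but the result is returned by value instead of
// by const reference.
inline double min_by_value(double const &a, double const &b)
{
    if (a < b) return a;
    return b;
}

// loop1 from the report: std::min() returns a const reference that may
// refer to the temporary created for vec[i] / x.
void loop1_std_min(double *vec, int end, double x)
{
    for (int i = 0; i < end; ++i) {
        vec[i] = std::min(vec[i], vec[i] / x);
    }
}

// Assumed shape of loop4: identical loop body, but using the by-value
// helper, so no reference to the temporary leaves the call.
void loop4_sketch(double *vec, int end, double x)
{
    for (int i = 0; i < end; ++i) {
        vec[i] = min_by_value(vec[i], vec[i] / x);
    }
}
```

Both functions compute the same values for ordinary (non-NaN) inputs. Compiling with something like `g++ -O3 -march=native -fopt-info-vec-missed` should show which of the two loops the vectorizer rejects; the exact diagnostics depend on the GCC version.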
All in all, it seems like mixing the return-by-reference with the implicitly created temporary for the second argument in loop1 is the problem here. So, it might not be an issue in GCC's vectorizer component, but rather in the access analysis.

While I can imagine that the present semantics (mixture of temporary values and references) are not trivial for access analysis, I have some doubts as to whether it's impossible to safely vectorize loop1. However, it may well be the case that I'm missing something obvious here, so any help is greatly appreciated.

Moritz