https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89653

            Bug ID: 89653
           Summary: Missing vectorization of loop containing
                    std::min/std::max and temporary
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: moritz.kreutzer at siemens dot com
  Target Milestone: ---

Godbolt worksheet: https://godbolt.org/z/F6m5hl

GCC (trunk and all earlier versions) fails to vectorize (SSE/AVX2/AVX-512) the
following loop because of a "complicated access pattern" (similarly for
std::max()):

== loop1 - FAIL ====================================
for (int i = 0; i < end; ++i)
{
  vec[i] = std::min(vec[i], vec[i]/x);
}
====================================================

If we don't use std::min(), but implement the same loop using a ternary
operator, the vectorization is successful:

== loop2 - OK ======================================
for (int i = 0; i < end; ++i)
{
  vec[i] = vec[i] < vec[i]/x ? vec[i] : vec[i]/x;
}
====================================================

However, the problem does not seem to be that GCC is unable to vectorize
std::min() itself, because the following loop _does_ get vectorized (note the
different logic and the absence of an implicit temporary for vec[i]/x):

== loop3 - OK ======================================
for (int i = 0; i < end; ++i)
{
  vec[i] = std::min(vec[i], x);
}
====================================================
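
For anyone who wants to reproduce this outside of godbolt, the three loops
above can be wrapped into one translation unit. This is only a sketch: the
function signatures (double* vec, int end, double x) are my assumption, since
the snippets do not show the declarations. Compiling with something like
"g++ -O3 -march=native -fopt-info-vec-missed" should make GCC print which
loops it did not vectorize.

```cpp
#include <algorithm>

// Assumed declarations: the element type and the types of end and x
// are not shown in the snippets above.

// loop1: std::min with an implicit temporary for vec[i]/x
void loop1(double* vec, int end, double x)
{
    for (int i = 0; i < end; ++i)
    {
        vec[i] = std::min(vec[i], vec[i] / x);
    }
}

// loop2: same computation written with a ternary operator
void loop2(double* vec, int end, double x)
{
    for (int i = 0; i < end; ++i)
    {
        vec[i] = vec[i] < vec[i] / x ? vec[i] : vec[i] / x;
    }
}

// loop3: std::min against a loop-invariant value, no temporary
void loop3(double* vec, int end, double x)
{
    for (int i = 0; i < end; ++i)
    {
        vec[i] = std::min(vec[i], x);
    }
}
```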

The C++ standard prescribes that std::min() returns the result as a const
reference, so an implementation might look like this:

== std::min() ======================================
double const & min(double const &a, double const &b)
{
    // Return b only if it is strictly smaller, so that a wins on ties.
    if (b < a) return b;
    return a;
}
====================================================

Implementing our own min() function along the same lines, but returning the
result by value, also enables vectorization of the original loop (see loop4 @
godbolt).
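
A minimal sketch of what such a by-value variant could look like is given
here. The name min_by_value and the loop signature are hypothetical and chosen
for illustration only; the exact code on the godbolt worksheet may differ.

```cpp
// Hypothetical by-value counterpart of std::min: the arguments are still
// taken by const reference, but the result is returned as a copy.
inline double min_by_value(double const &a, double const &b)
{
    return b < a ? b : a;
}

// Same loop as loop1, with std::min replaced by the by-value variant.
// The signature is an assumption; the report does not show the declarations.
void loop4(double* vec, int end, double x)
{
    for (int i = 0; i < end; ++i)
    {
        vec[i] = min_by_value(vec[i], vec[i] / x);
    }
}
```

Because the result is copied out, no reference to the temporary holding
vec[i]/x escapes from the call.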


All in all, it seems like mixing the return-by-reference with the implicitly
created temporary for the second argument in loop1 is the problem here. So, it
might not be an issue in GCC's vectorizer component, but rather in the access
analysis. While I can imagine that the present semantics (mixture of temporary
values and references) are not trivial for access analysis, I have some doubts
as to whether it's impossible to safely vectorize loop1. However, it may well
be the case that I'm missing something obvious here, so any help is greatly
appreciated.
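
To make the suspicion above more concrete, this is roughly how I picture loop1
after std::min() has been inlined. It is a hand-written approximation, not a
dump of GCC's intermediate representation, and the function name and signature
are made up for illustration. The point is that the stored value is loaded
through an address that is selected at run time between the temporary and
vec[i].

```cpp
// Hand-expanded approximation of loop1 (hypothetical name and signature).
void loop1_expanded(double* vec, int end, double x)
{
    for (int i = 0; i < end; ++i)
    {
        // Temporary materialized for the second argument of std::min.
        double const tmp = vec[i] / x;

        // The returned reference, modeled as a pointer: it refers either
        // to the temporary or to the array element.
        double const* smaller = (tmp < vec[i]) ? &tmp : &vec[i];

        // Load through the selected address, then store to the array.
        vec[i] = *smaller;
    }
}
```

In loop2 and in the by-value variant, the selection happens between two values
rather than between two addresses.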


Moritz
