https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95906

            Bug ID: 95906
           Summary: Failure to recognize max pattern with mask
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

typedef int8_t v16i8 __attribute__((__vector_size__ (16)));

v16i8 f(v16i8 a, v16i8 b)
{
    v16i8 cmp = (a > b);
    return (cmp & a) | (~cmp & b);
}

int f2(int a, int b)
{
    int cmp = -(a > b);
    return (cmp & a) | (~cmp & b);
}

`f` can be optimized to `__builtin_ia32_pmaxsb128` (on x86 with `-msse4`; the
other `pmax` instructions cover the same pattern with similar element types),
and `f2` can be optimized to a `MAX_EXPR`. The two cases are essentially the
same pattern; I've included the vector version because I originally found it
in a function written for SSE, before SSE4 existed. LLVM performs both
transformations, but GCC does not.
