https://bugs.llvm.org/show_bug.cgi?id=44547

            Bug ID: 44547
           Summary: Inefficient codegen for remainder loop when
                    vectorizing by factor of 2 (possibly more)
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Loop Optimizer
          Assignee: unassignedb...@nondot.org
          Reporter: d.malju...@yandex.ru
                CC: llvm-bugs@lists.llvm.org

See the motivating example: https://godbolt.org/z/vSfTT9

#include <stdint.h>

void test(const int16_t* __restrict a, const int16_t* __restrict b,
          int16_t* __restrict c, uint32_t n) {
#pragma nounroll
#pragma clang loop vectorize_width(2) interleave_count(1)
    for (int32_t i = 0; i < n; i++) {
        *c++ = *a++ + *b++;
    }
}

One would expect the compiler to turn this into something like:
{
    if (n & 1) *c++ = *a++ + *b++;
    for (int32_t i = 0; i < (n >> 1); i++) {
        ...
    }
}
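For comparison, the hoped-for shape can be written out by hand. This is a minimal sketch, assuming plain C stands in for vector IR: each main-loop iteration does two scalar adds in place of one 2-wide vector add, and the odd element is peeled off before the loop, so no remainder loop exists at all. `test_expected` and `check_expected` are hypothetical names; this illustrates the desired control flow and is not LLVM output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hand-written version of the desired shape: peel the odd
 * element (if any) first, then run n / 2 iterations that each handle two
 * elements (standing in for one 2-wide vector add). Nothing is left over. */
void test_expected(const int16_t *__restrict a, const int16_t *__restrict b,
                   int16_t *__restrict c, uint32_t n) {
    if (n & 1) {                                /* at most one odd element */
        *c++ = (int16_t)(*a++ + *b++);
    }
    for (uint32_t i = 0; i < (n >> 1); i++) {   /* n / 2 "vector" iterations */
        c[0] = (int16_t)(a[0] + b[0]);
        c[1] = (int16_t)(a[1] + b[1]);
        a += 2;
        b += 2;
        c += 2;
    }
}

/* Compares test_expected against the plain scalar loop for one n (n <= 16). */
void check_expected(uint32_t n) {
    int16_t a[16], b[16], got[16];
    assert(n <= 16);
    for (uint32_t i = 0; i < n; i++) {
        a[i] = (int16_t)i;
        b[i] = (int16_t)(100 + i);
    }
    test_expected(a, b, got, n);
    for (uint32_t i = 0; i < n; i++) {
        assert(got[i] == (int16_t)(a[i] + b[i]));
    }
}
```

Calling `check_expected(n)` for a range of small `n` exercises both the even and the odd path.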
But it generates something like this instead:
{
    for (int32_t i = 0; i < (n >> 1); i++) {
        ...
    }
    if (n & 1)
        for (int32_t i = 0; i < phi(1, (n & 1)); i++)
            *c++ = *a++ + *b++;
}
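The generated shape can also be rendered as C. This is a sketch, assuming two scalar adds stand in for one 2-wide vector add; `test_actual` and `check_actual` are hypothetical names, and the `remainder_runs` counter exists only to show that the guarded remainder loop body never runs more than once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C rendering of the generated shape: the 2-wide main loop
 * runs n / 2 times, then the leftover element is handled by a guarded
 * remainder *loop*. Returns how many times the remainder body ran. */
uint32_t test_actual(const int16_t *__restrict a, const int16_t *__restrict b,
                     int16_t *__restrict c, uint32_t n) {
    uint32_t remainder_runs = 0;
    for (uint32_t i = 0; i < (n >> 1); i++) {   /* n / 2 "vector" iterations */
        c[0] = (int16_t)(a[0] + b[0]);
        c[1] = (int16_t)(a[1] + b[1]);
        a += 2;
        b += 2;
        c += 2;
    }
    if (n & 1) {
        /* Inside this guard the trip count is always exactly 1, yet it is
         * still written as a loop with a compare and a back edge. */
        for (uint32_t r = 0; r < (n & 1); r++) {
            *c++ = (int16_t)(*a++ + *b++);
            remainder_runs++;
        }
    }
    return remainder_runs;
}

/* Checks test_actual against the plain scalar loop for one n (n <= 16)
 * and confirms the remainder body ran exactly n & 1 times. */
void check_actual(uint32_t n) {
    int16_t a[16], b[16], got[16];
    assert(n <= 16);
    for (uint32_t i = 0; i < n; i++) {
        a[i] = (int16_t)(3 * i);
        b[i] = (int16_t)(100 + i);
    }
    uint32_t runs = test_actual(a, b, got, n);
    assert(runs == (n & 1));
    for (uint32_t i = 0; i < n; i++) {
        assert(got[i] == (int16_t)(a[i] + b[i]));
    }
}
```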
The loop vectorizer seems to always emit the remainder as a loop, even when its
trip count is a known constant (in this case it's 1!).
However, because that trip count is hidden behind an "if" (or, after some
optimizations, a switch condition), SCEV fails to compute it, so this
"remainder loop with trip count == 1" is never actually unrolled. Running
additional GVN and IPSCCP passes sometimes helps, but that is not a
satisfactory fix.
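The arithmetic behind the "trip count is 1" observation is short. With a vectorization factor VF, the vector loop covers VF * floor(n / VF) elements, and the remainder handles whatever is left. For VF = 2 that is n & 1, which is exactly 1 inside the `if (n & 1)` guard:

```latex
T_{\mathrm{rem}}(n) = n - \mathrm{VF}\left\lfloor \frac{n}{\mathrm{VF}} \right\rfloor
                    = n \bmod \mathrm{VF} \le \mathrm{VF} - 1,
\qquad
\mathrm{VF} = 2:\quad T_{\mathrm{rem}}(n) = n \bmod 2 = n \mathbin{\&} 1 \in \{0, 1\}
```

So for any VF the remainder needs at most VF - 1 copies of the scalar body, and for VF = 2 a single guarded copy suffices.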

I don't see why the remainder has to be a loop in the first place when its trip
count is known.

_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
