https://bugs.llvm.org/show_bug.cgi?id=42411

            Bug ID: 42411
           Summary: Suboptimal code after vectorization of not unrolled
                    loop with 8 iterations
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Loop Optimizer
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected]

Found in PR42410.

void foo (int *__restrict arr1, int *__restrict arr2)
{
  for (int i = 0; i < 8; i++)
    arr1[i] += arr2[i];
}

Clang with -O3 -march=skylake -fno-unroll-loops produces:

foo(int*, int*):                             # @foo(int*, int*)
        xor     eax, eax
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        vmovdqu ymm0, ymmword ptr [rdi + 4*rax]
        vpaddd  ymm0, ymm0, ymmword ptr [rsi + 4*rax]
        vmovdqu ymmword ptr [rdi + 4*rax], ymm0
        add     rax, 8
        cmp     rax, 8
        jne     .LBB0_1
        vzeroupper
        ret
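
The loop control in the code above can be modelled in plain C to show why it is redundant. This is only a sketch: the function name count_vector_iterations and its parameters are hypothetical, and it mirrors just the induction variable (rax), not the vector arithmetic.

```c
#include <assert.h>

/* Hypothetical model of the loop control emitted by Clang above.
   Counts how many times the vector body executes. */
int count_vector_iterations(int trip_count, int vector_width)
{
    /* The do-while below only terminates when the step divides the
       trip count, which holds for the 8/8 case in this report. */
    assert(vector_width > 0 && trip_count > 0);
    assert(trip_count % vector_width == 0);

    int iterations = 0;
    int i = 0;                   /* xor eax, eax */
    do {
        iterations++;            /* vmovdqu / vpaddd / vmovdqu */
        i += vector_width;       /* add rax, 8 */
    } while (i != trip_count);   /* cmp rax, 8 ; jne .LBB0_1 */
    return iterations;
}
```

With a trip count of 8 and a step of 8, count_vector_iterations(8, 8) returns 1: the back edge is never taken, so the xor, add, cmp and jne contribute nothing.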

GCC and ICC produce the following with the same flags:
foo(int*, int*):
        vmovdqu   ymm0, YMMWORD PTR [rdi]                       #6.5
        vpaddd    ymm1, ymm0, YMMWORD PTR [rsi]                 #6.5
        vmovdqu   YMMWORD PTR [rdi], ymm1                       #6.5
        vzeroupper                                              #7.1
        ret
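
At the source level, the straight-line output corresponds to performing all eight additions with no induction variable or branch. A minimal sketch in plain C follows, assuming the hypothetical names foo_loop, foo_straight_line and results_match; it only checks that the two forms compute the same values and says nothing about what any compiler emits for them.

```c
#include <string.h>

/* The loop from the report, unchanged apart from the name. */
void foo_loop(int *__restrict arr1, int *__restrict arr2)
{
    for (int i = 0; i < 8; i++)
        arr1[i] += arr2[i];
}

/* Hypothetical straight-line form: the single 8-wide step written
   out element by element, with no loop counter or back edge. */
void foo_straight_line(int *__restrict arr1, int *__restrict arr2)
{
    arr1[0] += arr2[0];
    arr1[1] += arr2[1];
    arr1[2] += arr2[2];
    arr1[3] += arr2[3];
    arr1[4] += arr2[4];
    arr1[5] += arr2[5];
    arr1[6] += arr2[6];
    arr1[7] += arr2[7];
}

/* Returns 1 if both forms produce identical results on sample data. */
int results_match(void)
{
    int a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int b[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int c[8] = {10, 20, 30, 40, 50, 60, 70, 80};

    foo_loop(a, c);
    foo_straight_line(b, c);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling results_match() exercises both functions on the same input and compares the resulting arrays byte for byte.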
