On Wed, Jun 26, 2019 at 10:17 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> Now that TARGET_MMX_WITH_SSE is implemented, the compiler should be
> able to auto-vectorize:

On a related note, the following slightly changed testcase:

void
foo (char *restrict r, char *restrict a)
{
  for (int i = 0; i < 24; i++)
    r[i] += a[i];
}

compiles to:

foo:
        vmovdqu (%rdi), %xmm1
        vpaddb  (%rsi), %xmm1, %xmm0
        movzbl  16(%rsi), %eax
        addb    %al, 16(%rdi)
        vmovups %xmm0, (%rdi)
        movzbl  17(%rsi), %eax
        addb    %al, 17(%rdi)
        movzbl  18(%rsi), %eax
        addb    %al, 18(%rdi)
        movzbl  19(%rsi), %eax
        addb    %al, 19(%rdi)
        movzbl  20(%rsi), %eax
        addb    %al, 20(%rdi)
        movzbl  21(%rsi), %eax
        addb    %al, 21(%rdi)
        movzbl  22(%rsi), %eax
        addb    %al, 22(%rdi)
        movzbl  23(%rsi), %eax
        addb    %al, 23(%rdi)
        ret

One would expect that the remaining 8 bytes of the array would also
get vectorized, resulting in one 16-byte operation and one 8-byte
operation instead of eight scalar byte additions.
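
To illustrate the expected split, here is a rough hand-written sketch
using GCC's generic vector extensions. The names foo_split, v16qu and
v8qu are made up for this example, and the unsigned element type is
only my choice to keep the byte wrap-around well defined. It is not
what the vectorizer emits today, just the shape one would hope for.

```c
#include <string.h>

/* Hypothetical vector types for this sketch: 16 and 8 unsigned bytes.  */
typedef unsigned char v16qu __attribute__ ((vector_size (16)));
typedef unsigned char v8qu __attribute__ ((vector_size (8)));

/* Hand-split equivalent of foo: one 16-byte add followed by one
   8-byte add, covering all 24 elements.  */
void
foo_split (char *restrict r, char *restrict a)
{
  v16qu r16, a16;
  v8qu r8, a8;

  /* Bytes 0..15: unaligned loads, one 16-byte add, unaligned store.  */
  memcpy (&r16, r, sizeof r16);
  memcpy (&a16, a, sizeof a16);
  r16 += a16;
  memcpy (r, &r16, sizeof r16);

  /* Bytes 16..23: the remainder handled as a single 8-byte vector.  */
  memcpy (&r8, r + 16, sizeof r8);
  memcpy (&a8, a + 16, sizeof a8);
  r8 += a8;
  memcpy (r + 16, &r8, sizeof r8);
}
```

With TARGET_MMX_WITH_SSE the 8-byte part could presumably be carried
out in an SSE register as well, so no scalar epilogue would be needed.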

Uros.
