On Wed, Jun 26, 2019 at 10:36 AM Richard Biener <rguent...@suse.de> wrote:
>
> On June 26, 2019 10:25:44 AM GMT+02:00, Uros Bizjak <ubiz...@gmail.com> wrote:
> >On Wed, Jun 26, 2019 at 10:17 AM Uros Bizjak <ubiz...@gmail.com> wrote:
> >>
> >> Now that TARGET_MMX_WITH_SSE is implemented, the compiler should be
> >> able to auto-vectorize:
> >
> >On a related note, the following slightly changed testcase:
> >
> >void
> >foo (char *restrict r, char *restrict a)
> >{
> >  for (int i = 0; i < 24; i++)
> >    r[i] += a[i];
> >}
> >
> >compiles to:
> >
> >foo:
> >        vmovdqu (%rdi), %xmm1
> >        vpaddb  (%rsi), %xmm1, %xmm0
> >        movzbl  16(%rsi), %eax
> >        addb    %al, 16(%rdi)
> >        vmovups %xmm0, (%rdi)
> >        movzbl  17(%rsi), %eax
> >        addb    %al, 17(%rdi)
> >        movzbl  18(%rsi), %eax
> >        addb    %al, 18(%rdi)
> >        movzbl  19(%rsi), %eax
> >        addb    %al, 19(%rdi)
> >        movzbl  20(%rsi), %eax
> >        addb    %al, 20(%rdi)
> >        movzbl  21(%rsi), %eax
> >        addb    %al, 21(%rdi)
> >        movzbl  22(%rsi), %eax
> >        addb    %al, 22(%rdi)
> >        movzbl  23(%rsi), %eax
> >        addb    %al, 23(%rdi)
> >        ret
> >
> >One would expect that the remaining 8-byte array would also get
> >vectorized, resulting in one 16-byte operation and one 8-byte
> >operation.
>
> Try --param vect-epilogue-nomask=1 (or so).

Yes, this (--param vect-epilogues-nomask=1) works!

foo:
        movdqu  (%rdi), %xmm0
        movdqu  (%rsi), %xmm2
        movq    16(%rsi), %xmm1
        paddb   %xmm2, %xmm0
        movups  %xmm0, (%rdi)
        movq    16(%rdi), %xmm0
        paddb   %xmm1, %xmm0
        movq    %xmm0, 16(%rdi)
        ret
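
For illustration only, the split above (one 16-byte operation plus one
8-byte operation for the tail) can be written by hand with the GCC/Clang
vector extensions. This is a rough sketch, not what the vectorizer does
internally: the names v16qu, v8qu and foo_by_hand are made up for this
example, and unsigned char elements are an assumption chosen so that the
byte additions wrap the same way paddb does.

```c
#include <string.h>

/* Hypothetical helper types (GCC/Clang vector extensions):
   16 and 8 unsigned bytes.  Unsigned elements keep the addition
   well-defined modulo 256.  */
typedef unsigned char v16qu __attribute__ ((vector_size (16)));
typedef unsigned char v8qu __attribute__ ((vector_size (8)));

/* Scalar reference: the testcase from this thread.  */
void
foo (char *restrict r, char *restrict a)
{
  for (int i = 0; i < 24; i++)
    r[i] += a[i];
}

/* Hand-written sketch of the expected code shape: bytes 0..15 are
   handled by one 16-byte add, bytes 16..23 by one 8-byte add.  */
void
foo_by_hand (char *restrict r, char *restrict a)
{
  v16qu r16, a16;
  v8qu r8, a8;

  /* Main part: unaligned 16-byte loads, add, store.  */
  memcpy (&r16, r, sizeof r16);
  memcpy (&a16, a, sizeof a16);
  r16 += a16;
  memcpy (r, &r16, sizeof r16);

  /* Epilogue: the remaining 8 bytes as a single 8-byte vector.  */
  memcpy (&r8, r + 16, sizeof r8);
  memcpy (&a8, a + 16, sizeof a8);
  r8 += a8;
  memcpy (r + 16, &r8, sizeof r8);
}
```

Both functions should leave the same 24 bytes in r. The memcpy calls are
only there to express unaligned loads and stores without aliasing
problems.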

Thanks,
Uros.
