On June 26, 2019 10:25:44 AM GMT+02:00, Uros Bizjak <ubiz...@gmail.com> wrote:
>On Wed, Jun 26, 2019 at 10:17 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>>
>> Now that TARGET_MMX_WITH_SSE is implemented, the compiler should be
>> able to auto-vectorize:
>
>On a related note, the following slightly changed testcase:
>
>void
>foo (char *restrict r, char *restrict a)
>{
>  for (int i = 0; i < 24; i++)
>    r[i] += a[i];
>}
>
>compiles to:
>
>foo:
>        vmovdqu (%rdi), %xmm1
>        vpaddb  (%rsi), %xmm1, %xmm0
>        movzbl  16(%rsi), %eax
>        addb    %al, 16(%rdi)
>        vmovups %xmm0, (%rdi)
>        movzbl  17(%rsi), %eax
>        addb    %al, 17(%rdi)
>        movzbl  18(%rsi), %eax
>        addb    %al, 18(%rdi)
>        movzbl  19(%rsi), %eax
>        addb    %al, 19(%rdi)
>        movzbl  20(%rsi), %eax
>        addb    %al, 20(%rdi)
>        movzbl  21(%rsi), %eax
>        addb    %al, 21(%rdi)
>        movzbl  22(%rsi), %eax
>        addb    %al, 22(%rdi)
>        movzbl  23(%rsi), %eax
>        addb    %al, 23(%rdi)
>        ret
>
>One would expect that the remaining 8-byte array would also get
>vectorized, resulting in one 16-byte operation and one 8-byte
>operation.

Try --param vect-epilogue-nomask=1 (or so).
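
For reference, the code shape described above (one 16-byte operation
plus one 8-byte operation) can be sketched by hand with the GCC generic
vector extensions. This is only an illustration of the expected result,
not what the vectorizer emits: the name foo_by_hand and the typedefs are
made up here, and unsigned char elements are an assumption chosen so
that the element-wise add wraps in a well-defined way.

```c
#include <string.h>

/* Generic vector types (GCC/Clang extension): 16 and 8 bytes wide.
   The typedef names are hypothetical, chosen for this sketch.  */
typedef unsigned char v16qu __attribute__ ((vector_size (16)));
typedef unsigned char v8qu __attribute__ ((vector_size (8)));

/* Hand-written counterpart of foo () from the testcase: 24 byte adds
   done as one 16-byte vector add and one 8-byte vector add.  */
void
foo_by_hand (char *restrict r, char *restrict a)
{
  v16qu r16, a16;
  v8qu r8, a8;

  /* Elements 0..15.  memcpy avoids alignment and aliasing assumptions.  */
  memcpy (&r16, r, sizeof r16);
  memcpy (&a16, a, sizeof a16);
  r16 += a16;
  memcpy (r, &r16, sizeof r16);

  /* Elements 16..23: the 8-byte epilogue.  */
  memcpy (&r8, r + 16, sizeof r8);
  memcpy (&a8, a + 16, sizeof a8);
  r8 += a8;
  memcpy (r + 16, &r8, sizeof r8);
}
```

Comparing the assembly for foo and foo_by_hand (for example with -S at
the same optimization level) should show whether the epilogue of the
auto-vectorized loop still falls back to scalar byte operations.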

Richard. 

>Uros.
