On June 26, 2019 10:25:44 AM GMT+02:00, Uros Bizjak <ubiz...@gmail.com> wrote:
>On Wed, Jun 26, 2019 at 10:17 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>>
>> Now that TARGET_MMX_WITH_SSE is implemented, the compiler should be
>> able to auto-vectorize:
>
>On a related note, following slightly changed testcase:
>
>void
>foo (char *restrict r, char *restrict a)
>{
>  for (int i = 0; i < 24; i++)
>    r[i] += a[i];
>}
>
>compiles to:
>
>foo:
>        vmovdqu (%rdi), %xmm1
>        vpaddb  (%rsi), %xmm1, %xmm0
>        movzbl  16(%rsi), %eax
>        addb    %al, 16(%rdi)
>        vmovups %xmm0, (%rdi)
>        movzbl  17(%rsi), %eax
>        addb    %al, 17(%rdi)
>        movzbl  18(%rsi), %eax
>        addb    %al, 18(%rdi)
>        movzbl  19(%rsi), %eax
>        addb    %al, 19(%rdi)
>        movzbl  20(%rsi), %eax
>        addb    %al, 20(%rdi)
>        movzbl  21(%rsi), %eax
>        addb    %al, 21(%rdi)
>        movzbl  22(%rsi), %eax
>        addb    %al, 22(%rdi)
>        movzbl  23(%rsi), %eax
>        addb    %al, 23(%rdi)
>        ret
>
>One would expect that the remaining 8-byte array would also get
>vectorized, resulting in one 16-byte operation and one 8-byte
>operation.

Try --param vect-epilogue-nomask=1 (or so).

Richard.

>Uros.
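
For illustration only, here is roughly what "one 16-byte operation and one 8-byte operation" would mean at the source level, written by hand with GCC's generic vector extensions. The names foo_by_hand, v16qu, v8qu and foo_variants_agree are made up for this sketch and do not come from the thread, and this is not a claim about what the vectorizer emits with or without the parameter above.

```c
#include <string.h>

/* Generic vector types (GCC/Clang extension): 16 and 8 unsigned bytes.
   Unsigned elements are used so that the per-byte add wraps modulo 256.  */
typedef unsigned char v16qu __attribute__ ((vector_size (16)));
typedef unsigned char v8qu __attribute__ ((vector_size (8)));

/* Scalar testcase from the quoted message.  */
void
foo (char *restrict r, char *restrict a)
{
  for (int i = 0; i < 24; i++)
    r[i] += a[i];
}

/* Hypothetical hand-written equivalent: bytes 0..15 are handled by one
   16-byte vector add and bytes 16..23 by one 8-byte vector add.
   memcpy is used for the loads and stores so that no alignment is
   assumed.  */
void
foo_by_hand (char *restrict r, char *restrict a)
{
  v16qu r16, a16;
  v8qu r8, a8;

  memcpy (&r16, r, sizeof r16);
  memcpy (&a16, a, sizeof a16);
  r16 += a16;
  memcpy (r, &r16, sizeof r16);

  memcpy (&r8, r + 16, sizeof r8);
  memcpy (&a8, a + 16, sizeof a8);
  r8 += a8;
  memcpy (r + 16, &r8, sizeof r8);
}

/* Returns 1 if both variants produce the same 24 bytes on sample data.  */
int
foo_variants_agree (void)
{
  char r1[24], r2[24], a[24];

  for (int i = 0; i < 24; i++)
    {
      r1[i] = r2[i] = (char) (i * 11);
      a[i] = (char) (200 - i);
    }
  foo (r1, a);
  foo_by_hand (r2, a);
  return memcmp (r1, r2, sizeof r1) == 0;
}
```

To try the suggested parameter on the original testcase, something like "gcc -O3 -mavx --param vect-epilogue-nomask=1 -S foo.c" would presumably do. The -O3 and -mavx flags and the file name are guesses on my part (the quoted assembly uses VEX-encoded instructions, so some AVX-enabling option was in effect), and the parameter name carries the same "(or so)" caveat as above.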