On Wed, Jun 26, 2019 at 10:17 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> Now that TARGET_MMX_WITH_SSE is implemented, the compiler should be
> able to auto-vectorize:

On a related note, the following slightly changed testcase:

void
foo (char *restrict r, char *restrict a)
{
  for (int i = 0; i < 24; i++)
    r[i] += a[i];
}

compiles to:

foo:
        vmovdqu (%rdi), %xmm1
        vpaddb  (%rsi), %xmm1, %xmm0
        movzbl  16(%rsi), %eax
        addb    %al, 16(%rdi)
        vmovups %xmm0, (%rdi)
        movzbl  17(%rsi), %eax
        addb    %al, 17(%rdi)
        movzbl  18(%rsi), %eax
        addb    %al, 18(%rdi)
        movzbl  19(%rsi), %eax
        addb    %al, 19(%rdi)
        movzbl  20(%rsi), %eax
        addb    %al, 20(%rdi)
        movzbl  21(%rsi), %eax
        addb    %al, 21(%rdi)
        movzbl  22(%rsi), %eax
        addb    %al, 22(%rdi)
        movzbl  23(%rsi), %eax
        addb    %al, 23(%rdi)
        ret

One would expect that the remaining 8 bytes of the array would also get
vectorized, resulting in one 16-byte operation and one 8-byte operation.

Uros.
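
For illustration only, the expected split (one 16-byte operation followed by one 8-byte operation) can be written out by hand with GCC's generic vector extensions. This is a rough sketch, not what the vectorizer produces: the name foo_expected and the typedefs v16qu and v8qu are made up for this example, and unsigned char elements are assumed so that the per-byte addition wraps the same way as the scalar loop.

```c
#include <string.h>

/* Hypothetical helper types: 16-byte and 8-byte vectors of bytes.  */
typedef unsigned char v16qu __attribute__ ((vector_size (16)));
typedef unsigned char v8qu __attribute__ ((vector_size (8)));

/* Hand-written equivalent of foo: bytes 0..15 are handled with one
   16-byte vector add, bytes 16..23 with one 8-byte vector add.  */
void
foo_expected (char *restrict r, char *restrict a)
{
  v16qu r16, a16;
  v8qu r8, a8;

  /* memcpy is used for the unaligned loads and stores.  */
  memcpy (&r16, r, sizeof r16);
  memcpy (&a16, a, sizeof a16);
  r16 += a16;
  memcpy (r, &r16, sizeof r16);

  /* Remaining 8 bytes.  */
  memcpy (&r8, r + 16, sizeof r8);
  memcpy (&a8, a + 16, sizeof a8);
  r8 += a8;
  memcpy (r + 16, &r8, sizeof r8);
}
```

Whether the 8-byte part ends up as a single SSE instruction depends on the target and options (for example, on TARGET_MMX_WITH_SSE being in effect); the sketch only shows the intended shape of the result.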