https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68482
Bug ID: 68482
Summary: No vectorization for x86-64
Product: gcc
Version: 5.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: lvqcl.mail at gmail dot com
Target Milestone: ---

GCC ver: 5.2.0 and 4.9.2
Arch: x86-64
Options: -S -O2 -ftree-vectorize -msse2

Code:

#include <stdint.h>

void test(int32_t* input, int32_t* out, unsigned x1, unsigned x2)
{
    unsigned i, j;
    unsigned end = x1;

    for(i = j = 0; i < 1000; i++) {
        int32_t sum = 0;
        end += x2;
        for( ; j < end; j++)
            sum += input[j];
        out[i] = sum;
    }
}

GCC is able to vectorize the loop for IA32 arch, but not x86-64.

The innermost loop for IA32:

L4:
        movdqu  (%ecx), %xmm1
        addl    $1, %ebx
        addl    $16, %ecx
        cmpl    %ebx, 4(%esp)
        paddd   %xmm1, %xmm0
        ja      L4

The innermost loop for x86-64:

.L3:
        movl    %eax, %r10d
        addl    $1, %eax
        addl    (%rcx,%r10,4), %edx
        cmpl    %eax, %r8d
        jne     .L3
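
A possible experiment for narrowing this down, not taken from the report: the x86-64 loop zero-extends the 32-bit index (movl %eax, %r10d) before every load, so it may be worth checking whether a pointer-width index changes the vectorizer's decision. The sketch below keeps the reported function unchanged and adds a variant, test_wide_index, whose name and approach are assumptions made here for illustration. Whether the variant actually gets vectorized is untested, and the two functions only agree while x1 + 1000*x2 does not wrap around 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Function from the report, unchanged: 32-bit unsigned index j. */
void test(int32_t* input, int32_t* out, unsigned x1, unsigned x2)
{
    unsigned i, j;
    unsigned end = x1;

    for (i = j = 0; i < 1000; i++) {
        int32_t sum = 0;
        end += x2;
        for ( ; j < end; j++)
            sum += input[j];
        out[i] = sum;
    }
}

/* Hypothetical variant (not from the report): the index and the bound
   are size_t, so no 32-bit wrap-around has to be modelled on x86-64.
   Results match test() only if x1 + 1000*x2 fits in 32 bits. */
void test_wide_index(const int32_t* input, int32_t* out,
                     unsigned x1, unsigned x2)
{
    size_t j = 0;
    size_t end = x1;
    unsigned i;

    for (i = 0; i < 1000; i++) {
        int32_t sum = 0;
        end += x2;
        for ( ; j < end; j++)
            sum += input[j];
        out[i] = sum;
    }
}
```

Compiling both functions with the options from the report (-S -O2 -ftree-vectorize -msse2) and comparing the inner loops would show whether the index width matters for this case.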