http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
--- Comment #16 from arturomdn at gmail dot com 2013-02-14 17:42:55 UTC ---
With the -ftree-vectorize -fno-tree-loop-if-convert flags, GCC generated this
for the loop in question (a compare-and-branch, with no conditional moves):
.L39:
    movq %rdi, %rdx
    addq (%rsi,%rax,8), %rcx
    imulq (%r9,%rax,8), %rdx
    addq %rcx, %rdx
    xorl %ecx, %ecx
    cmpq %r10, %rdx
    jbe .L38
    movq %rdx, %rcx
    andl $4294967295, %edx
    shrq $32, %rcx
.L38:
    addq $1, %rax
    cmpq %r8, %rax
    movq %rdx, -8(%rsi,%rax,8)
    jne .L39
And it executed fast:
./by-val-O3-flags
Took 6.74 seconds total.
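
For anyone reading along, here is a rough C sketch of what the loop above appears to compute. It is only my reading of the assembly, not the original test case. The function name, the parameter names, and the register mapping (%rdi = a, %r9 = b, %rsi = result, %rcx = carry, %r8 = n) are assumptions. So are the value of %r10 (taken to be 0xffffffff, since its setup is outside the loop shown) and the initial carry of 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reconstruction of the loop at .L39; all names are invented. */
static uint64_t addmul_sketch(uint64_t *result, const uint64_t *b,
                              uint64_t a, size_t n)
{
    const uint64_t limit = 0xffffffffu; /* assumed contents of %r10 */
    uint64_t carry = 0;                 /* assumed initial %rcx */

    for (size_t i = 0; i < n; i++) {
        /* movq/imulq/addq: tmp = a * b[i] + (result[i] + carry) */
        uint64_t tmp = a * b[i] + result[i] + carry;

        carry = 0;                      /* xorl %ecx, %ecx */
        if (tmp > limit) {              /* cmpq/jbe: skipped when tmp <= limit */
            carry = tmp >> 32;          /* shrq $32, %rcx */
            tmp &= 0xffffffffu;         /* andl $4294967295, %edx */
        }
        result[i] = tmp;                /* movq %rdx, -8(%rsi,%rax,8) */
    }
    return carry;
}
```

The if statement is the part that matters here: with -fno-tree-loop-if-convert it stays a real branch (jbe .L38) instead of being turned into conditional moves.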