http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54000
Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
            Summary|[4.6/4.7/4.8 Regression]    |[4.6/4.7/4.8
                   |Performance breakdown for   |Regression][IVOPTS]
                   |gcc-4.{6,7} vs. gcc-4.5     |Performance breakdown for
                   |using std::vector in matrix |gcc-4.{6,7} vs. gcc-4.5
                   |vector multiplication       |using std::vector in matrix
                   |                            |vector multiplication
      Known to fail|                            |4.7.1, 4.8.0

--- Comment #8 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-07 10:00:27 UTC ---
Thanks for the reduced testcase.  The innermost loops compare as follows:

4.5:

.L7:
        movsd   (%rbx,%rcx), %xmm0
        addq    $8, %rcx
        mulsd   0(%rbp,%rdx), %xmm0
        addq    $8, %rdx
        cmpq    $24, %rdx
        addsd   %xmm0, %xmm1
        movsd   %xmm1, (%rsi)
        jne     .L7

4.7:

.L13:
        movq    64(%rsp), %rdi
        movq    80(%rsp), %rdx
        addq    %rcx, %rdi
        addq    %r8, %rdx
        movsd   -8(%rax,%rdi), %xmm0
        mulsd   (%rsi,%rax), %xmm0
        addq    $8, %rax
        cmpq    $24, %rax
        addsd   (%rdx), %xmm0
        movsd   %xmm0, (%rdx)
        jne     .L13

so we seem to have a register allocation / spilling issue here as well
as a bad induction variable choice.

GCC 4.8 is not any better here.
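
For readers without the attachment at hand: the reduced testcase is not quoted in this comment, so the following is only a rough sketch of the kind of source loop that would be consistent with the listings. It assumes double elements (8-byte stride), a trip count of three (the compare against 24), and a result element that is updated in memory on every iteration. The function name `matvec_accumulate` and the flat row-major layout are hypothetical and not taken from the PR.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the reduced testcase, NOT the code attached to the PR.
// Computes y += A * x for an n-by-n matrix stored row-major in a flat std::vector.
void matvec_accumulate(const std::vector<double>& a,
                       const std::vector<double>& x,
                       std::vector<double>& y,
                       std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        // Innermost loop: one load from a, one load from x, a multiply,
        // and an accumulate into y[i]. Because y[i] is written through a
        // reference, the compiler may have to store it on every iteration.
        for (std::size_t j = 0; j < n; ++j) {
            y[i] += a[i * n + j] * x[j];
        }
    }
}
```

With n == 3, the inner loop runs over 24 bytes of doubles. In a loop of this shape, one would hope for the vector base pointers to stay in registers across iterations rather than being reloaded from the stack.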