gcc 4.9.2 generates slower code than clang 3.5 when dealing with complex numbers.
See bug 64410: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

For adding two arrays with complex numbers, clang's vectoriser is better able to exploit the layout of complex numbers.

Inner loop produced by gcc:

  .L52:
    movsd   (%r15,%rax), %xmm1
    movsd   8(%r15,%rax), %xmm0
    addsd   0(%rbp,%rax), %xmm1
    addsd   8(%rbp,%rax), %xmm0
    movsd   %xmm1, (%rbx,%rax)
    movsd   %xmm0, 8(%rbx,%rax)
    addq    $16, %rax
    cmpq    %rsi, %rax
    jne     .L52

Inner loop produced by clang:

  .LBB0_145:
    movupd  -16(%rbx), %xmm0
    movupd  -16(%rax), %xmm1
    addpd   %xmm0, %xmm1
    movupd  %xmm1, -16(%rdi)
    movupd  (%rbx), %xmm0
    movupd  (%rax), %xmm1
    addpd   %xmm0, %xmm1
    movupd  %xmm1, (%rdi)
    addq    $2, %rbp
    addq    $32, %rbx
    addq    $32, %rax
    addq    $32, %rdi
    addl    $-2, %ecx
    jne     .LBB0_145
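
For reference, the kind of loop being compared can be sketched as follows. This is a minimal illustration and not necessarily the test case attached to the bug report; the function name add_complex_arrays and the use of std::vector are assumptions made for the example. In the gcc listing the real and imaginary parts are added separately with scalar movsd/addsd instructions, while the clang listing loads each complex number as one packed pair (movupd) and adds both parts with a single addpd.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical kernel for illustration: element-wise sum of two arrays of
// complex numbers. Each std::complex<double> stores its real and imaginary
// parts contiguously (2 x 8 bytes), so one element fills a 128-bit register.
std::vector<std::complex<double>> add_complex_arrays(
    const std::vector<std::complex<double>>& a,
    const std::vector<std::complex<double>>& b)
{
    assert(a.size() == b.size());  // both inputs must have the same length

    std::vector<std::complex<double>> out(a.size());

    // The inner loop whose generated code differs between the two compilers.
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}
```

To inspect the generated code, a file containing this function can be compiled with -O3 -S under each compiler and the resulting assembly compared. The exact output will likely differ from the listings above depending on compiler version and flags.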