------- Comment #4 from ubizjak at gmail dot com  2008-08-02 13:00 -------
(In reply to comment #3)
> Operations in loops should now be vectorized. The original testcase is
> probably not worth vectorizing due to calling convention problems (_Complex T
> is not passed as a vector).
Not really. For some unknown reason, _Complex float is passed as a
two-element vector in an SSE register. This introduces a (double!) store
forwarding penalty, since we have to split the value into an SSE pair before
processing. This is the wrong ABI design, as shown by comparing the code
generated for the following example:

--cut here--
_Complex float
testf (_Complex float a, _Complex float b)
{
  return a + b;
}

_Complex double
testd (_Complex double a, _Complex double b)
{
  return a + b;
}
--cut here--

testf:
        movq    %xmm0, -8(%rsp)
        movq    %xmm1, -16(%rsp)
        movss   -8(%rsp), %xmm0
        movss   -4(%rsp), %xmm2
        addss   -16(%rsp), %xmm0
        addss   -12(%rsp), %xmm2
        movss   %xmm0, -24(%rsp)
        movss   %xmm2, -20(%rsp)
        movq    -24(%rsp), %xmm0
        ret

testd:
        addsd   %xmm3, %xmm1
        addsd   %xmm2, %xmm0
        ret

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485