[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

ubizjak at gmail dot com Sat, 02 Aug 2008 06:01:41 -0700


------- Comment #4 from ubizjak at gmail dot com  2008-08-02 13:00 -------
(In reply to comment #3)
> Operations in loops should now be vectorized.  The original testcase is
> probably not worth vectorizing due to calling convention problems (_Complex T
> is not passed as a vector).


Not really. For some unknown reason, _Complex float is passed as a two-element
vector in a single SSE register. This introduces a (double!) store-forwarding
penalty: the value has to be split through memory into a pair of scalars before
processing, and the result has to be reassembled the same way afterwards. This
is wrong ABI design, as shown by comparing the code generated for the following
example:

--cut here--
_Complex float testf (_Complex float a, _Complex float b)
{
  return a + b;
}

_Complex double testd (_Complex double a, _Complex double b)
{
  return a + b;
}
--cut here--

testf:
        movq    %xmm0, -8(%rsp)
        movq    %xmm1, -16(%rsp)
        movss   -8(%rsp), %xmm0
        movss   -4(%rsp), %xmm2
        addss   -16(%rsp), %xmm0
        addss   -12(%rsp), %xmm2
        movss   %xmm0, -24(%rsp)
        movss   %xmm2, -20(%rsp)
        movq    -24(%rsp), %xmm0
        ret

testd:
        addsd   %xmm3, %xmm1
        addsd   %xmm2, %xmm0
        ret


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485