Dear all,

=== problem description ===

Mixed T - complex<T> algebra is horribly inefficient in svn-trunk. This is
demonstrated by the attached testcase 'test.cpp'.

*** gcc-4.4.1:

$ g++ -O2 test.cpp && time echo 5000000000 | ./a.out
...
user    0m8.817s

$ g++ -DUSE_MY_CMULT -O2 test.cpp && time echo 5000000000 | ./a.out
...
user    0m8.825s

*** gcc 4.5.0 20100217 (revision 156834):

$ g++ -O2 test.cpp && time echo 5000000000 | ./a.out
...
user    0m28.574s  # Aaaaaaargh!!!

$ g++ -DUSE_MY_CMULT -O2 test.cpp && time echo 5000000000 | ./a.out
...
user    0m8.817s

=== analysis ===

The libstdc++ implementation of std::complex<T>::operator*=(T val) does a '/=
val' on its '__complex__ T' data representation (_GLIBCXX_USE_C99_COMPLEX is
enabled on my OpenSUSE GNU/Linux box). In present gcc, 'val' is promoted to a
(degenerate) complex type. Next, in 'expand_complex_multiplication'
(tree-complex.c) this leads to the emission of a libcall (to __muldc3), since
-fcx-limited-range is not in effect, so no reduction is allowed.

Indeed, when the straightforward custom operator*= is active in the testcase
above (-DUSE_MY_CMULT), we get back the performance that we wish: this version
simply multiplies the scalar components of the complex number separately.

Would a <complex> patch that implements scalar multiplication (and division) of
complex specialisations this way be accepted? The factor 3 performance loss is
unacceptable for any serious numerical work in C++.

=== Discussion, caveats, related issues ===

* such patch would not address the C99-issue of complex*scalar; but at least
math-oriented C++ users will be happy.

* If this change is implemented, PR28408 is a C(99)-only issue. All signs are
then correct, the inf/nan issues have apparently been previously fixed.

* From the C++0x working draft (26.4.6, item 7):

        template<class T> complex<T> operator*(const complex<T>& lhs, const
complex<T>& rhs);
        template<class T> complex<T> operator*(const complex<T>& lhs, const T&
rhs);
        template<class T> complex<T> operator*(const T& lhs, const complex<T>&
rhs);

        Returns: complex<T>(lhs) *= rhs.

Note the third overload: of course this should be implemented efficiently as
'return complex<T>(rhs) *= lhs', which is what libstdc++ does --- fortunately.
This is however *not* the same as the return value which is specified by the
standard (think signed zeros). Is this a libstdc++ bug, or an editorial
oversight in N3000=09-0190. In   the last case, is someone on this list in the
position to raise this issue?


-- 
           Summary: mixed complex<T> multiplication horribly inefficient
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jan at epgmod dot phys dot tue dot nl
  GCC host triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43108

Reply via email to