https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69564
--- Comment #29 from Jeffrey A. Law <law at redhat dot com> ---
So to bring this BZ back to the core questions (the scope seems to have widened through the years since this was originally reported): namely, is the use of LTO or C++ making things slower, particularly for scimark's LU factorization test?

From my experiments, the answer is a very clear yes.

I hacked up the test a bit to only run LU and to run a fixed number of iterations. That makes comparisons with something like callgrind much easier.

Use of C++ adds 2-3% in terms of instruction counts. LTO adds an additional 2-3% to the instruction counts. These are additive: C++ with LTO is about 5% higher than C without LTO.

The time (not surprisingly) is lost in LU_factor; the main culprit seems to be this pair of nested loops:

      int ii;
      for (ii = j + 1; ii < M; ii++)
        {
          double *Aii = A[ii];
          double *Aj = A[j];
          double AiiJ = Aii[j];    /* Here */
          int jj;
          for (jj = j + 1; jj < N; jj++)
            Aii[jj] -= AiiJ * Aj[jj];
        }

Callgrind calls out the marked line, which in reality probably means the preheader for the inner loop. For C w/o LTO it's ~12 million instructions. For C++ with LTO it's ~21 million instructions (remember, I'm just running LU and for a relatively small number of iterations).

It's a bit of a surprise as these loops are dead simple, but it appears we've got to be doing something dumb somewhere. Hopefully that narrows things down a bit.
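
For anyone who wants to reproduce the comparison without pulling apart the scimark harness, below is a minimal standalone sketch of the kind of fixed-work driver described above. It is not the actual hacked-up scimark source; the matrix dimensions, iteration count, and initialization values are placeholder assumptions. The only goal is to give callgrind a deterministic amount of work around the quoted elimination kernel, so the same file can be built as C or C++, with and without -flto, and the resulting instruction counts compared directly.

      /* Standalone driver sketch isolating the LU_factor elimination kernel.
         M, N and ITERS are placeholder values, not the scimark settings.
         The malloc casts are only there so this file also compiles as C++.  */
      #include <stdio.h>
      #include <stdlib.h>

      #define M 100
      #define N 100
      #define ITERS 200

      int
      main (void)
      {
        /* Row-pointer matrix matching the A[ii][jj] access pattern above.  */
        double **A = (double **) malloc (M * sizeof (double *));
        int i;
        for (i = 0; i < M; i++)
          A[i] = (double *) malloc (N * sizeof (double));

        int iter;
        for (iter = 0; iter < ITERS; iter++)
          {
            /* Reset the matrix so every iteration does identical work and
               the values stay small and finite.  */
            for (i = 0; i < M; i++)
              {
                int k;
                for (k = 0; k < N; k++)
                  A[i][k] = (i * N + k + 1) / (1000.0 * M * N);
              }

            /* Sweep the elimination over each column; the inner pair of
               loops is the one callgrind flags inside LU_factor.  */
            int j;
            for (j = 0; j < M - 1; j++)
              {
                int ii;
                for (ii = j + 1; ii < M; ii++)
                  {
                    double *Aii = A[ii];
                    double *Aj = A[j];
                    double AiiJ = Aii[j];    /* Here */
                    int jj;
                    for (jj = j + 1; jj < N; jj++)
                      Aii[jj] -= AiiJ * Aj[jj];
                  }
              }
          }

        /* Print a checksum so the loops cannot be optimized away.  */
        double sum = 0.0;
        for (i = 0; i < M; i++)
          sum += A[i][N - 1];
        printf ("checksum: %g\n", sum);

        for (i = 0; i < M; i++)
          free (A[i]);
        free (A);
        return 0;
      }

Building it four ways (gcc vs. g++, each with and without -flto, same -O options) and running each binary under valgrind --tool=callgrind gives per-line instruction counts that can be compared the same way as the numbers above.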