------- Comment #49 from steven at gcc dot gnu dot org  2009-12-28 15:40 -------
To make the test case work, I had to solve two errors by removing "static"
keywords:

ttest.cc:105: error: explicit template specialization cannot have a storage
class
ttest.cc:117: error: explicit template specialization cannot have a storage
class
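
For reference, a minimal sketch of the kind of change this error calls for. The
names (type_tag, check_dispatch) are hypothetical and are not taken from
ttest.cc; the only assumption is that the test case has a function template with
an explicit specialization that was marked "static":

```cpp
#include <cassert>

// Hypothetical primary template (illustrative only, not from ttest.cc).
template <typename T>
int type_tag(const T&) { return 0; }

// Rejected by newer GCC releases:
//   template <> static int type_tag<int>(const int&) { return 1; }
// Accepted: the same specialization without the storage-class specifier.
// The specialization takes its linkage from the primary template.
template <>
int type_tag<int>(const int&) { return 1; }

// Small self-check: the specialization is picked for int, the primary
// template for everything else.
void check_dispatch() {
    assert(type_tag(3.14) == 0);
    assert(type_tag(42) == 1);
}
```

Dropping "static" does not change which specialization is selected, so it
should not affect the timings below.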

With that fixed, I timed the compiled binaries for x86_64 and for i386.


Compiled for x86_64 (with "g++-4.5.0 -O3 ttest.cc -static -fpermissive -o
ttest45" etc.):

stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest36 ; done

real    0m4.238s
user    0m4.210s
sys     0m0.030s

real    0m4.209s
user    0m4.190s
sys     0m0.000s

real    0m4.193s
user    0m4.170s
sys     0m0.010s
stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest41 ; done

real    0m3.733s
user    0m3.720s
sys     0m0.010s

real    0m3.632s
user    0m3.620s
sys     0m0.000s

real    0m3.662s
user    0m3.630s
sys     0m0.010s
stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest42 ; done

real    0m3.292s
user    0m3.260s
sys     0m0.020s

real    0m3.338s
user    0m3.300s
sys     0m0.010s

real    0m3.264s
user    0m3.260s
sys     0m0.010s
stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest43 ; done

real    0m3.515s
user    0m3.500s
sys     0m0.020s

real    0m3.463s
user    0m3.420s
sys     0m0.000s

real    0m3.518s
user    0m3.490s
sys     0m0.000s
stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest44 ; done

real    0m3.467s
user    0m3.420s
sys     0m0.010s

real    0m3.378s
user    0m3.380s
sys     0m0.000s

real    0m3.434s
user    0m3.400s
sys     0m0.000s
stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest45 ; done

real    0m0.284s
user    0m0.280s
sys     0m0.000s

real    0m0.202s
user    0m0.180s
sys     0m0.000s

real    0m0.183s
user    0m0.180s
sys     0m0.000s




Compiled for i386 (with "g++-4.5.0 -O3 -m32 -march=pentium4 ttest.cc -static
-fpermissive -o ttest45" etc.):

stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest36 ; done

real    0m4.092s
user    0m4.080s
sys     0m0.010s

real    0m3.954s
user    0m3.940s
sys     0m0.020s

real    0m3.988s
user    0m3.970s
sys     0m0.010s
stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest42 ; done

real    0m5.818s
user    0m5.810s
sys     0m0.010s

real    0m5.828s
user    0m5.770s
sys     0m0.030s

real    0m5.813s
user    0m5.790s
sys     0m0.000s
stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest43 ; done

real    0m5.379s
user    0m5.360s
sys     0m0.010s

real    0m5.419s
user    0m5.370s
sys     0m0.030s

real    0m5.382s
user    0m5.360s
sys     0m0.010s
stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest44 ; done

real    0m4.430s
user    0m4.410s
sys     0m0.020s

real    0m4.433s
user    0m4.390s
sys     0m0.010s

real    0m4.389s
user    0m4.380s
sys     0m0.000s
stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest45 ; done

real    0m0.230s
user    0m0.220s
sys     0m0.010s

real    0m0.236s
user    0m0.220s
sys     0m0.000s

real    0m0.216s
user    0m0.210s
sys     0m0.000s


So GCC 4.4 with -m32 still has a ~10% performance regression compared to GCC
3.4, but GCC 4.5 appears to optimize the test case away (I am not sure that
the result is correct -- how to check for correctness?).
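
One possible way to check, sketched under assumptions: have the benchmark
reduce its results to a checksum, print it, and compare the value across
binaries built with different compilers. The kernel below (run_kernel) is a
hypothetical stand-in and not the code from ttest.cc; verify_checksum is
likewise an illustrative helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for the benchmark kernel: folds every iteration
// into a single value so that all of the work feeds the final result.
unsigned long run_kernel(std::size_t n) {
    unsigned long acc = 0;
    for (std::size_t i = 0; i < n; ++i)
        acc = acc * 31 + i;  // unsigned arithmetic wraps, which is well defined
    return acc;
}

// Prints the checksum (making the result observable) and compares it with
// a reference value recorded from a build that is trusted to be correct.
unsigned long verify_checksum(std::size_t n, unsigned long expected) {
    unsigned long got = run_kernel(n);
    std::printf("checksum: %lu\n", got);
    assert(got == expected);
    return got;
}
```

If the checksum from the GCC 4.5 binary matches the one from an older binary,
the speedup is more likely a legitimate optimization than a miscompilation.
Printing the value does not stop the compiler from folding the computation at
compile time; it only makes a wrong result visible.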

For -m64 (x86-64), all GCC 4 versions are faster than GCC 3.4. Leaving aside
GCC 4.5, GCC 4.2 gives the best performance.

Reconfirmed for 32-bit x86, then.


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2008-01-30 17:13:54         |2009-12-28 15:40:33
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863
