------- Comment #49 from steven at gcc dot gnu dot org 2009-12-28 15:40 -------
To make the test case work, I had to solve two errors by removing "static" keywords:
ttest.cc:105: error: explicit template specialization cannot have a storage class
ttest.cc:117: error: explicit template specialization cannot have a storage class

With that fixed, I timed the compiled binaries for x86_64 and for i386.

Compiled for x86_64 (with "g++-4.5.0 -O3 ttest.cc -static -fpermissive -o ttest45" etc.):

stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest36 ; done
real 0m4.238s   user 0m4.210s   sys 0m0.030s
real 0m4.209s   user 0m4.190s   sys 0m0.000s
real 0m4.193s   user 0m4.170s   sys 0m0.010s

stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest41 ; done
real 0m3.733s   user 0m3.720s   sys 0m0.010s
real 0m3.632s   user 0m3.620s   sys 0m0.000s
real 0m3.662s   user 0m3.630s   sys 0m0.010s

stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest42 ; done
real 0m3.292s   user 0m3.260s   sys 0m0.020s
real 0m3.338s   user 0m3.300s   sys 0m0.010s
real 0m3.264s   user 0m3.260s   sys 0m0.010s

stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest43 ; done
real 0m3.515s   user 0m3.500s   sys 0m0.020s
real 0m3.463s   user 0m3.420s   sys 0m0.000s
real 0m3.518s   user 0m3.490s   sys 0m0.000s

stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest44 ; done
real 0m3.467s   user 0m3.420s   sys 0m0.010s
real 0m3.378s   user 0m3.380s   sys 0m0.000s
real 0m3.434s   user 0m3.400s   sys 0m0.000s

stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest45 ; done
real 0m0.284s   user 0m0.280s   sys 0m0.000s
real 0m0.202s   user 0m0.180s   sys 0m0.000s
real 0m0.183s   user 0m0.180s   sys 0m0.000s

Compiled for i386 (with "g++-4.5.0 -O3 -m32 -march=pentium4 ttest.cc -static -fpermissive -o ttest45" etc.):

stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest36 ; done
real 0m4.092s   user 0m4.080s   sys 0m0.010s
real 0m3.954s   user 0m3.940s   sys 0m0.020s
real 0m3.988s   user 0m3.970s   sys 0m0.010s

stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest42 ; done
real 0m5.818s   user 0m5.810s   sys 0m0.010s
real 0m5.828s   user 0m5.770s   sys 0m0.030s
real 0m5.813s   user 0m5.790s   sys 0m0.000s

stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest43 ; done
real 0m5.379s   user 0m5.360s   sys 0m0.010s
real 0m5.419s   user 0m5.370s   sys 0m0.030s
real 0m5.382s   user 0m5.360s   sys 0m0.010s

stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest44 ; done
real 0m4.430s   user 0m4.410s   sys 0m0.020s
real 0m4.433s   user 0m4.390s   sys 0m0.010s
real 0m4.389s   user 0m4.380s   sys 0m0.000s

stev...@stevenb-laptop:~/t$ for f in 1 2 3 ; do time ./ttest45 ; done
real 0m0.230s   user 0m0.220s   sys 0m0.010s
real 0m0.236s   user 0m0.220s   sys 0m0.000s
real 0m0.216s   user 0m0.210s   sys 0m0.000s

So GCC 4.4 with -m32 still has a ~10% performance regression compared to GCC 3.4, but GCC 4.5 appears to optimize the test case away (but I am not sure that the result is correct -- how to check for correctness?).

For -m64 (x86-64), all GCC 4 versions are better than GCC 3.4, and GCC 4.2 gives the best performance.

Reconfirmed for 32-bit x86, then.

-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2008-01-30 17:13:54         |2009-12-28 15:40:33
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863
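
For readers unfamiliar with the "explicit template specialization cannot have a storage class" error quoted at the top of this comment, here is a minimal sketch of the kind of change involved. It is not taken from ttest.cc; the function name is_equal and its specialization are hypothetical and only illustrate the rule. The primary template may carry "static", but the explicit specialization must not repeat it.

```cpp
#include <cstring>

// Primary template: a storage-class specifier such as 'static' is allowed here.
template <typename T>
static bool is_equal(T a, T b) {
    return a == b;
}

// Explicit specialization (hypothetical example): no storage-class specifier.
// Writing 'template <> static bool is_equal<const char*>(...)' is what
// triggers the error; dropping 'static' is the fix described in the comment.
template <>
bool is_equal<const char*>(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;  // compare contents, not pointer values
}
```

With this in place, is_equal(1, 1) uses the primary template, while is_equal<const char*>("abc", buf) uses the specialization and compares string contents.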