Well, I tried a few different builds of xplor-nih tonight with the following optimization flags for the gcc and g++ compilers (testsuite times in seconds):

Flags                                               xplor     python    tcl
-O3 -ffast-math -mtune=970                          137.5454  128.7770  48.0390
-O3 -ffast-math -mtune=970 -fno-threadsafe-statics  137.0741  127.4653  48.0205
-O3 -ffast-math -mtune=970 -finline-limit=1200      135.4462  127.5790  48.3680
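To put the timings above on a common scale, here is a small sketch that converts each variant into a percent change relative to the plain "-O3 -ffast-math -mtune=970" build. The numbers are copied from the post. The names (SUITES, BASELINE, VARIANTS, percent_change, summarize) are just illustrative.

```python
SUITES = ("xplor", "python", "tcl")

# Testsuite wall-clock times in seconds, in SUITES order.
BASELINE = (137.5454, 128.7770, 48.0390)  # -O3 -ffast-math -mtune=970
VARIANTS = {
    "-fno-threadsafe-statics": (137.0741, 127.4653, 48.0205),
    "-finline-limit=1200": (135.4462, 127.5790, 48.3680),
}


def percent_change(base, new):
    """Relative change in runtime; negative means the build got faster."""
    return 100.0 * (new - base) / base


def summarize(baseline, variants):
    """Map each extra flag to its per-suite percent change vs. the baseline."""
    return {
        flag: tuple(percent_change(b, t) for b, t in zip(baseline, times))
        for flag, times in variants.items()
    }


if __name__ == "__main__":
    for flag, changes in summarize(BASELINE, VARIANTS).items():
        cells = ", ".join(f"{s} {c:+.2f}%" for s, c in zip(SUITES, changes))
        print(f"{flag}: {cells}")
```

This prints "-fno-threadsafe-statics: xplor -0.34%, python -1.02%, tcl -0.04%" and "-finline-limit=1200: xplor -1.53%, python -0.93%, tcl +0.68%". Every change is under 2%, which is small next to the roughly 7% gap to gcc 3.3.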
As you can see, the C/C++ code in xplor-nih (mostly C++) barely responds to -fno-threadsafe-statics or -finline-limit=1200 under gcc 4.2.0: every timing changes by less than 2%. The same build compiled with Apple's gcc 3.3 runs about 7% faster. Is there anything not enabled by -O3 that might help? I am rather confused about options such as -ftree-vectorize, -fipa-cp and the rest: which ones are already part of -O3 in gcc 4.2.0, which have to be enabled explicitly, and which are incompatible with each other. I would like to squeeze more performance out of the gcc 4.2.0 builds but don't know how to approach this systematically (short of resorting to -fprofile-use). Thanks in advance for any other advice.

Jack
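One systematic approach, sketched here under assumptions, is to sweep combinations of the candidate flags on top of the base set and time each build. The candidate list only uses options named in this post, with gcc's -ffast-math spelling. The sketch makes no claim about which of them gcc 4.2.0 already enables at -O3. The helpers flag_sets and time_command are hypothetical, and the rebuild step is left to whatever build procedure xplor-nih actually uses.

```python
import itertools
import subprocess
import time

BASE_FLAGS = "-O3 -ffast-math -mtune=970"
# Options mentioned in the post; whether -O3 already implies any of them
# in gcc 4.2.0 is deliberately not assumed here.
CANDIDATES = [
    "-ftree-vectorize",
    "-fipa-cp",
    "-fno-threadsafe-statics",
    "-finline-limit=1200",
]


def flag_sets(base, candidates, max_extra=2):
    """Yield the base flags plus every combination of up to max_extra candidates."""
    for k in range(max_extra + 1):
        for combo in itertools.combinations(candidates, k):
            yield " ".join((base,) + combo)


def time_command(argv, repeats=3):
    """Run argv several times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True)  # raises if the command fails
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Only list the configurations; rebuilding and timing are left to the user.
    for flags in flag_sets(BASE_FLAGS, CANDIDATES):
        print(flags)
```

For each printed string, one would rebuild with it as CFLAGS/CXXFLAGS and pass the testsuite command to time_command. Taking the best of several runs reduces noise, which matters when the differences are around 1%. Capping max_extra keeps the sweep small (11 builds here), because the number of full subsets doubles with every added flag.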