On Mon, Mar 19, 2012 at 02:00:18PM +0800, Andrew Lowe wrote
> I'm looking for the faster code, the run time to be faster - I compile
> the FEA & CFD code once but will be running many jobs.
>
> Andrew

Since you're asking on the Gentoo list, can I safely assume you use Gentoo? Gentoo gives *MUCH MORE* than 1% or 2% improvement, *IF YOU OPTIMIZE PROPERLY*. This does not mean going into "Gentoo-ricer" territory, but it does mean using the features of Gentoo to their fullest.

Gentoo allows you to build binaries with gcc that are tuned to *YOUR* cpu. The advantage is that it gets the maximum out of your cpu. The disadvantage is that a binary compiled for a newer cpu will probably not run on an older cpu in another machine. In your /etc/make.conf, I recommend...

  CFLAGS="-O2 -march=native -mfpmath=sse -fomit-frame-pointer -pipe"
  CXXFLAGS="${CFLAGS}"
  MAKEOPTS="-j1"

The "-march=native" setting tells gcc to build binaries that use all of the cpu's available features. The CXXFLAGS variable is specific to C++. Set it identical to CFLAGS, so that C++ code gets the same settings.

The MAKEOPTS="-j1" setting slows down the build process slightly, but it does *NOT* affect the final binary. It does avoid some occasional mysterious hard-to-reproduce errors that stop builds dead in their tracks. The first time you bang your head against the wall for a couple of hours trying to figure out the problem, you'll waste more time than you've saved with a higher numbered "j" value.

Note also that if you've done a recent fresh install, you should...

  * emerge system
  * emerge world
  * rebuild the kernel entirely and reboot

...in that order. The binaries from the install CD are generic i686 (32 bit) or amd64 (64 bit) with no optional machine instructions selected. They have to be this way to install properly on 8-year-old machines. But this doesn't take advantage of the faster instructions on newer machines.

The following is a true story that happened to me, not a friend-of-a-friend. I have a 4 1/2 year old Dell with an onboard Intel GPU. Right after the install, it could not keep up with 1080i video from my TV tuner box, or even the slowest speed for NHL GameCenter Live.
After emerging system+world and rebuilding, the same machine was able to view 1080i TV and run the low-bandwidth version of NHL GameCenter Live. That is a very significant difference.

Note that merely optimizing the program itself isn't enough. The binary is dynamically linked to various math libraries. The program reads data from and writes output to disk. And there are always kernel calls along the way. So optimizing every math library, the disk I/O code, and the kernel code all contributes to faster execution.

-- 
Walter Dnes <waltd...@waltdnes.org>