On Thu, Aug 4, 2011 at 4:05 PM, Mike Hommey <mhom...@mozilla.com> wrote:
> Hi,
>
> We (Mozilla) are trying to get the best out of the ARM toolchain for our
> Android build. I recently built an Android Native Development Kit with
> GCC 4.6.1 and binutils 2.21.53, instead of the GCC 4.4.3 and binutils
> 2.19 that come with the default NDK.
>
> LTO doesn't work at all; I'm getting an ICE that looks like the one from
> bug 41159.
>
> FDO, however, works, but sadly the resulting build is not only
> noticeably bigger, it's also slower on some tests (the SunSpider
> JavaScript benchmark). While we have seen improvements on other tests
> (most notably, the V8 benchmark is much faster) from switching to GCC
> 4.6 (that is, without FDO), FDO doesn't seem to bring anything to the
> table. It even seems to cause performance regressions.
>
> Note that we do our normal builds with -Os and use -O3 for FDO. As for
> architecture-specific flags, we use -march=armv7-a -mthumb
> -mfloat-abi=softfp -mfpu=vfp. I tried an -O2 build in the past with
> GCC 4.4, but it was both bigger and slower than the -Os builds.
>
> So it pretty much looks like the current aggressive optimizations hit
> the limitations of current hardware and produce builds that are slower
> than those optimized for size.
>
> Have there been significant changes to the ARM backend that would
> justify trying again with current GCC HEAD? Should I try the Linaro GCC
> branch instead? Is there anything we can do to help get better ARM
> performance?

-fprofile-use enables quite a few optimizations that are off even at -O3,
namely -funroll-loops, -fpeel-loops, -ftracer and -funswitch-loops. All of
them increase code size (hopefully only for hot code, though).

Did you try using FDO with -Os? FDO should optimize the hot parts of the
code much like -O3 does, but leave everything else optimized for size.
Using FDO with -O3 gives you the opposite: cold portions optimized for
size while the rest is optimized for speed.

Richard.

> Cheers,
>
> Mike