Hi,

We (Mozilla) are trying to get the best out of the ARM toolchain for our Android builds. I recently built an Android Native Development Kit (NDK) with GCC 4.6.1 and binutils 2.21.53, instead of the GCC 4.4.3 and binutils 2.19 that ship with the default NDK.
LTO doesn't work at all: I get an internal compiler error (ICE) that looks like the one from bug 41159. FDO, however, works. Sadly, the resulting build is not only considerably bigger but also slower on some tests (the SunSpider JavaScript benchmark). Switching to GCC 4.6 alone (that is, without FDO) did improve other tests (most notably, the V8 benchmark is much faster), but FDO doesn't seem to bring anything to the table. It even seems to cause performance regressions.

Note that we do our normal builds with -Os and use -O3 for FDO. As for architecture-specific flags, we use -march=armv7-a -mthumb -mfloat-abi=softfp -mfpu=vfp. In the past I tried an -O2 build with GCC 4.4, but it was both bigger and slower than the -Os builds. So it looks as though the more aggressive optimizations run into the limitations of current hardware and end up slower than builds optimized for size.

Have there been significant changes to the ARM backend that would justify trying again with current GCC HEAD? Should I try the Linaro GCC branch instead? Is there anything we can do to help get better ARM performance?

Cheers,
Mike