On Thu, Aug 4, 2011 at 4:05 PM, Mike Hommey <mhom...@mozilla.com> wrote:
> Hi,
>
> We (Mozilla) are trying to get the best out of the ARM toolchain for our
> Android build. I recently built an Android Native Development Kit (NDK)
> with GCC 4.6.1 and binutils 2.21.53, instead of the GCC 4.4.3 and
> binutils 2.19 that ship with the default NDK.
>
> LTO doesn't work at all; I'm getting an ICE that looks like the one from
> bug 41159.
>
> FDO, however, works, but sadly the resulting build is not only quite a
> bit bigger, it's also slower on some tests (the SunSpider JavaScript
> benchmark). While switching to GCC 4.6 alone (that is, without FDO) has
> brought improvements on other tests (most notably, the V8 benchmark is
> much faster), FDO doesn't seem to bring anything to the table. It even
> seems to cause performance regressions.
>
> Note that we do our normal builds with -Os and use -O3 for FDO. As for
> architecture-specific flags, we use -march=armv7-a -mthumb
> -mfloat-abi=softfp -mfpu=vfp. I attempted an -O2 build in the past with
> GCC 4.4, but it was both bigger and slower than the -Os builds.
>
> So it pretty much looks like the current aggressive optimizations run
> into the limitations of current hardware and end up slower than builds
> optimized for size.
>
> Have there been significant changes to the ARM backend that would
> justify trying again with current GCC HEAD? Should I maybe try the
> Linaro GCC branch? Is there anything we can do to help get better ARM
> performance?

-fprofile-use enables quite a few optimizations that are off even at -O3,
namely -funroll-loops, -fpeel-loops, -ftracer and -funswitch-loops.
All of these increase code size (hopefully only for hot code, though).

Did you try using FDO with -Os?  FDO should optimize the hot parts of the
code much like -O3 while leaving the rest optimized for size.  Using FDO
with -O3 gives you the opposite: cold portions are optimized for size
while everything else is optimized for speed.
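A minimal sketch of the suggested "FDO with -Os" workflow, written as a
Python helper that only builds the command lines (it does not run them).
The compiler name arm-linux-androideabi-gcc, the source file bench.c and
the training command are assumptions for illustration; the architecture
flags are the ones quoted above, and -fprofile-generate / -fprofile-use
are the standard GCC profiling options. The point is that all three
stages keep the same base level (-Os), so code the profile shows as cold
stays size-optimized. In a cross-compiled setup the training run happens
on the target, and the resulting .gcda files have to be made available to
the final compile.

```python
import shlex

# Assumption: NDK-style cross-compiler driver name; adjust to your toolchain.
CC = "arm-linux-androideabi-gcc"

# Architecture flags from the thread.
ARCH_FLAGS = ["-march=armv7-a", "-mthumb", "-mfloat-abi=softfp", "-mfpu=vfp"]


def fdo_build_steps(sources, output, base_opt="-Os", train_cmd=None):
    """Return the three FDO stages as argument lists.

    1. Build an instrumented binary with -fprofile-generate.
    2. Run a training workload (hypothetical; defaults to running the
       binary itself) so it writes .gcda profile files.
    3. Rebuild with -fprofile-use at the same base optimization level.
    """
    common = [CC, base_opt, *ARCH_FLAGS]
    instrumented = [*common, "-fprofile-generate", "-o", output, *sources]
    train = list(train_cmd) if train_cmd else ["./" + output]
    optimized = [*common, "-fprofile-use", "-o", output, *sources]
    return [instrumented, train, optimized]


if __name__ == "__main__":
    # Print each stage as a shell command line.
    for step in fdo_build_steps(["bench.c"], "bench"):
        print(shlex.join(step))
```

Passing base_opt="-O3" reproduces the setup described in the original
message, which makes it easy to build both variants and compare size and
benchmark results.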

Richard.

> Cheers,
>
> Mike
>
