On Mon, Aug 13, 2012 at 8:27 AM, <paul_kon...@dell.com> wrote:

> I'm not sure what LTO is supposed to do -- the documentation is not exactly
> clear. But I assumed it should make things faster and/or smaller.
>
> So I tried using it on an application -- a processor emulator, CPU intensive
> code, a lot of 64 bit integer arithmetic.
>
> Using a compile/assembler run on the emulated system as a benchmark, I
> compared the code on x86_64-linux, gcc 4.7.0, -O2 plain, -O2 -fprofile-use
> (after having done -fprofile-generate), and -O2 -fprofile-use -flto (using a
> separate set of profile data files from -fprofile-generate -flto).
>
> Results: profiling speeds things up about 8%, but LTO is 50% (!) slower than
> without.
>
> Any suggestions of what to look at for this?

LTO lets the compiler see all the code at once, enabling optimizations like
inlining function calls across different source files. Like any optimization,
there are cases where it will cause code to slow down rather than speed up.
A 50% slowdown is certainly unusual, and suggests some systematic error.

Figuring out what has gone wrong is like optimizing any program. Get a
profile for your program, e.g., using -pg. Build the program with and
without -flto, run it, and look at the resulting profiles. A 50% slowdown
should be fairly obvious. I would guess that GCC has made a poor inlining
decision, but the profile should show the problem for sure.

Ian
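
The comparison step described above (two -pg builds, with and without -flto, each run once and passed through gprof) could be automated roughly as follows. This is only a sketch: it assumes the gprof flat profile for each build was saved as text, and that the first three columns are "% time", "cumulative seconds" and "self seconds" with the function name last. The names parse_flat_profile and biggest_regressions are made up for illustration, and the parsing is only meant for plain C function names.

```python
def _is_number(token):
    """True if token parses as a float (used to skip numeric columns)."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_flat_profile(text):
    """Map function name -> self seconds, from gprof flat-profile text.

    Assumption: each data line starts with three numeric columns
    (% time, cumulative seconds, self seconds) and ends with the name.
    Header and explanatory lines are skipped because they do not
    start with three numbers.
    """
    self_seconds = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not all(_is_number(f) for f in fields[:3]):
            continue
        rest = fields[3:]
        # Drop the optional calls, self ms/call and total ms/call columns,
        # always leaving at least one token for the name.
        skipped = 0
        while skipped < 3 and len(rest) > 1 and _is_number(rest[0]):
            rest = rest[1:]
            skipped += 1
        self_seconds[" ".join(rest)] = float(fields[2])
    return self_seconds


def biggest_regressions(baseline, candidate, top=5):
    """Return (name, delta in self seconds) pairs, largest increase first.

    A function missing from one profile counts as 0.0 seconds there.
    """
    names = set(baseline) | set(candidate)
    deltas = [(name, candidate.get(name, 0.0) - baseline.get(name, 0.0))
              for name in names]
    deltas.sort(key=lambda item: (-item[1], item[0]))
    return deltas[:top]
```

To use it, read the two saved profiles (hypothetical file names, e.g. plain.txt and lto.txt), pass each through parse_flat_profile, and call biggest_regressions(plain, lto). Keep in mind that a function inlined in the -flto build no longer has its own entry, so its time shows up in the caller; a caller that grows a lot while a callee vanishes is a hint that an inlining decision is worth a closer look.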