On Mon, Mar 31, 2025 at 1:20 PM Julian Waters via Gcc <gcc@gcc.gnu.org> wrote:
>
> Hi all,
>
> I've been trying to chase down an issue that's been driving me insane
> for a while now. It has to do with the flatten attribute being
> combined with LTO. I've heard that flatten and LTO are a match made in
> hell (Someone else's words, not mine), but from what I observe,
> several methods marked as flatten on Linux compile to an acceptable
> size with ok amount of inlining, but on Windows however... The exact
> same methods marked as flatten have their callees inlined so
> aggressively that they reach sizes of 5MB per method! Something seems
> to be different between how inlining works on the 2 platforms, what
> are the differences (If any) between Linux and Windows when it comes
> to inlining, particularly involving the flatten attribute? Is there a
> list of differences that is easily accessible somewhere, or
> alternatively is there somewhere in the gcc source where the
> heuristics are defined that I can decipher?
>
> Here's one such example of the differences between Linux and Windows
> (Both were compiled with the same optimization settings, -O3 and
> -flto=auto):
>
> Linux:
> 00000000010b12d0 0000000000006289 t
> G1ParScanThreadState::trim_queue_to_threshold(unsigned int)
>
> Windows:
> 0000000296f9b0c0 0000000000642d40 T
> G1ParScanThreadState::trim_queue_to_threshold(unsigned int) [clone
> .constprop.0]
> 0000000295125480 0000000000630080 T
> G1ParScanThreadState::trim_queue_to_threshold(unsigned int)
>
>
> Thanks in advance for the help, and for humouring my question

The main difference is that LTO on Linux can use the linker plugin to
derive information about how TUs are combined, while on Windows we're
using the "collect2 path", which is quite unmaintained and gives
imprecise information. This alone can result in quite different
inlining. You can simulate that on Linux with -fno-use-linker-plugin
(only for experimenting; don't use this unless necessary).

Richard.

>
> best regards,
> Julian