On 12/08/2017 12:00 AM, Richard Biener wrote:
On December 8, 2017 4:26:05 AM GMT+01:00, Martin Sebor <mse...@gmail.com> wrote:
On 12/06/2017 11:45 PM, Richard Biener wrote:
On December 7, 2017 2:15:53 AM GMT+01:00, Martin Sebor
<mse...@gmail.com> wrote:
On 12/06/2017 12:11 PM, Richard Biener wrote:
On December 6, 2017 6:38:11 PM GMT+01:00, Martin Sebor
<mse...@gmail.com> wrote:
While testing a libstdc++ patch that relies on inlining to
expose a GCC limitation I noticed that the same member function
of a class template is inlined into one function in the test but
not into the other, even though it is inlined into each when each
is compiled separately (i.e., in a file of its own).

I wasn't aware that inlining decisions made in one function could
affect those in another, or in the whole file.  Is that expected?
And if yes, what's the rationale?

Here's a simplified test case.  When compiled with -O2 or -O3
and either just -DFOO or just -DBAR, the call to vector::resize()
and all the functions called from it, including (crucially)
vector::_M_default_append, are inlined.  But when compiled with
-DFOO -DBAR _M_default_append is not inlined.  With a somewhat
more involved test case I've also seen the first call inlined
but not the second, which was also surprising to me.
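A sketch of such a test case might look like the following (this is
my illustrative reconstruction, not the exact file; the function
names foo/bar are made up):

```cpp
// Two nearly identical functions, each calling vector::resize().
// Compile with -O2 -DFOO, -O2 -DBAR, or -O2 -DFOO -DBAR and compare
// the inlining of _M_default_append (e.g. with -fdump-ipa-inline).
#include <cstddef>
#include <vector>

// Define both macros by default so the file also compiles standalone.
#if !defined FOO && !defined BAR
#  define FOO
#  define BAR
#endif

#ifdef FOO
std::size_t foo (std::size_t n)
{
  std::vector<int> v;
  v.resize (n);   // resize and its callees inline when compiled alone
  return v.size ();
}
#endif

#ifdef BAR
std::size_t bar (std::size_t n)
{
  std::vector<int> v;
  v.resize (n);   // may remain an out-of-line call with -DFOO -DBAR
  return v.size ();
}
#endif
```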

There are unit and function growth limits that can be hit.

I see, thank you for reminding me.

Nothing brings the implications into sharp focus like two virtually
identical functions optimized differently as a result of exceeding
some size limit.  It would make perfect sense to me if I were using
-Os but I can't help but wonder how useful this heuristic is at -O3.

Well. The inlining process is basically inlining functions sorted by
priority until the limits are hit (or nothing is profitable anymore).
Without such limit we'd blow size through the roof.
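For reference, those limits are exposed as params, so one way to
check whether a growth limit is what blocks a particular inlining is
to dump the inliner's decisions and then relax the limit (the file
name is illustrative; the param names are from GCC's documented
--param options):

```shell
# Dump the IPA inliner's decisions for each function into a .ipa-inline
# dump file next to the source.
g++ -O2 -DFOO -DBAR -fdump-ipa-inline test.C

# Relax the unit-growth limit; inline-unit-growth is the percentage by
# which inlining may grow the whole translation unit.
g++ -O2 -DFOO -DBAR --param inline-unit-growth=100 test.C
```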

I understand why it's done.  What I'm wondering is if the logic
that controls it or the selected per-translation unit limits do,
in fact, yield optimal results at all optimization levels, and

Well - the limits are set so as to bound code size growth. In that sense
they are 'optimal', provided the configured growth percentage is what you want...

if they do, what it means for users and how they structure their
source code.

It obviously surprised me to have the compiler optimize a simple,
trivial function on its own one way only to then disable the same
optimization when another equivalent function was added to the
file.  It rendered my test ineffective and I only found out by
accident.  I suspect others would find this effect surprising as
well, and not in a good way.  Whether or not the inlining algorithm
is tuned to deliver optimal results at all optimization levels, its
effects seem worth pointing out in the manual.  What advice should
we give to users when it comes to
inlining?  I'm thinking of something that might go in section
An Inline Function is As Fast As a Macro:
https://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/Inline.html
(unless there's a better place for it).

Do not make tiny translation units. There are several knobs to work around the 
fact that if the compiler only sees one TU estimating growth of the whole 
program is hard. This is why we do so much better with LTO.
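As a concrete illustration of the LTO point (file names are
illustrative): with -flto the inliner runs at link time over the
whole program, so its growth estimates are based on the real program
size rather than on a single, possibly tiny, TU.

```shell
# Compile each TU to bytecode-carrying objects, then let the link step
# perform whole-program inlining.
g++ -O2 -flto -c a.C
g++ -O2 -flto -c b.C
g++ -O2 -flto a.o b.o -o prog
```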

Okay, that's actually the opposite of what I was thinking at
first (one function per TU) but it makes sense for C++ where
large translation units are the norm.  It also makes sense
for C projects already structured to define one function per
TU.  Where it breaks down is in projects that do something
in between.  Let me see if I can put a sentence or two
together to add that to the inlining page and also mention
LTO.

It might also make sense to mention this on the GCC testing
Wiki as a pitfall when writing test cases because those are
almost invariably small.

Martin
