Hi Nick,

On Thu, Feb 25, 2021 at 12:31:33PM -0800, Nick Desaulniers wrote:
> So LLVM is telling us bar() was inlined into foo(); (baz() can't be
> because it wasn't defined in this TU). You can use this to "watch"
> the compiler make decisions about inlining.

thanks for taking the time to write all this - it is very interesting
and reminds me that I simply won't have time in this life of mine to
learn about compiler inlining - that's a whole other universe. :-)

I hope you can use that text in a blog post too - it is an interesting
read.

> (full thread:
> https://lore.kernel.org/lkml/20210225112247.2240389-1-a...@kernel.org/)

> I suspect in this specific case, "Interprocedural Sparse Conditional
> Constant Propagation" sees the calls to the same fn with different
> constants, propagates those down creating two specialized versions of
> the callee (so they are distinct functions now), inlines those into
> get_smp_config()/early_get_smp_config(), then there's too many callers
> of those in a single TU where inlining would cause excessive code
> bloat.

Well, there's exactly one caller of get_smp_config() - that's
setup_arch(). early_get_smp_config() also gets called exactly once, in
amd_numa_init().

Now, with my simplistic approach, I can replace the lines at those call
sites by hand with the

	x86_init.mpparse.get_smp_config(<arg>);

call. So those become exactly one function call. I still don't see how
that can be done any differently, frankly.

But apparently the cost model has decided that this is not inlineable.
Maybe because that function ptr is assigned at boot time and that
somehow gets the cost model to give it a very high (or low) value. Or
maybe because the wrappers are calling through a variable - the
x86_init thing - which is in a different section and that confuses the
inliner. Or whatever - totally speculating here.

And this brings me to my point - you can't expect people to do all that
crazy dance of compiler introspection and understand cost models and
compiler optimization just to fix stuff like that.

Now, imagine we "fix" this to clang-13's inliner's satisfaction.
Now imagine too that gcc Version Next changes their inliner and that
inliner says that that "fix" is wrong, for whatever reason, bottom up,
top down, whatever. Do you feel the annoyance all around?

And since, as you say, there are no silver bullets here, I think for
cases like that we'll need a "I know what I'm doing Mr. Compiler, TYVM,
even if your cost model says otherwise" facility. And in this case I
still think __always_inline is correct.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
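
For reference, the wrappers discussed above can be sketched roughly like this in plain userspace C. This is a simplified illustration only, not the actual kernel code: struct mpparse_ops, the mpparse variable, default_get_smp_config(), last_early and the sketch_always_inline macro are all hypothetical stand-ins, the latter mimicking what the kernel's __always_inline is assumed to expand to with gcc and clang.

```c
/*
 * Hypothetical stand-in for the kernel's __always_inline; assumed to
 * behave like "inline" plus the always_inline function attribute.
 */
#define sketch_always_inline inline __attribute__((__always_inline__))

/*
 * Simplified, hypothetical model of the x86_init.mpparse ops: a table
 * holding a function pointer which can be reassigned at boot time.
 */
struct mpparse_ops {
	void (*get_smp_config)(unsigned int early);
};

/* Records the last argument seen so that the effect is observable. */
static unsigned int last_early = ~0u;

static void default_get_smp_config(unsigned int early)
{
	last_early = early;
}

static struct mpparse_ops mpparse = {
	.get_smp_config = default_get_smp_config,
};

/*
 * The two wrappers: each one is a single indirect call with a constant
 * argument. The attribute asks the compiler to inline them into the
 * caller regardless of what its cost model would otherwise decide.
 */
static sketch_always_inline void get_smp_config(void)
{
	mpparse.get_smp_config(0);
}

static sketch_always_inline void early_get_smp_config(void)
{
	mpparse.get_smp_config(1);
}
```

Calling early_get_smp_config() here leaves last_early at 1 and calling get_smp_config() leaves it at 0, which is the same as writing the indirect call by hand at the call site.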