at 2:45 AM, Ingo Molnar <mi...@kernel.org> wrote:

> 
> * Nadav Amit <na...@vmware.com> wrote:
> 
>>> Another, separate question I wanted to ask: how do we ensure that the 
>>> kernel stays fixed?
>>> I.e. is there some tooling we can use to actually measure whether there's 
>>> bad inlining decisions 
>>> done, to detect all these bad patterns that cause bad GCC code generation?
>> 
>> Good question. First, I’ll indicate that this patch-set does not handle all
>> the issues. There is still the issue of conditional use of
>> __builtin_constant_p().
>> 
>> One indication for bad inlining decisions is the inlined functions have
>> multiple (non-inlined) instances in the binary and are short. I don’t
>> have an automatic solution, but you can try, for example to run:
>> 
>> nm --print-size ./vmlinux | grep ' t ' | cut -d' ' -f2- | sort | uniq -c | \
>>      grep -v '^      1' | sort -n -r | head -n 5
>> 
>> There are however many false positives. After these patches, for example, I
>> get:
>> 
>>     11 000000000000012f t jhash
>>      7 0000000000000017 t dst_output
>>      6 0000000000000011 t kzalloc
>>      5 000000000000002f t acpi_os_allocate_zeroed
>>      5 0000000000000029 t acpi_os_allocate
>> 
>> 
>> jhash() should not have been inlined in my mind, and should have a
>> non-inlined implementation. dst_output() is used as a function pointer.
>> kzalloc() and the next two suffer from the __builtin_constant_p() problem I
>> described in the past.
> 
> Ok, that's useful info.
> 
> The histogram suggests that with all your patches applied the kernel is now 
> in a pretty good 
> state in terms of inlining decisions, right?

It was just an example that I ran on the kernel I built right now (with a
custom config). Please don’t regard these results as anything indicative.

> Are you using defconfig or a reasonable distro-config for your tests?

I think it is best to take the kernel and run localyesconfig for testing.

Taking Ubuntu 18.04 and doing the same gives the following results:

     12 000000000000012f t jhash
      7 0000000000000017 t dst_output
      7 0000000000000014 t init_once
      5 00000000000000d8 t jhash2
      5 000000000000004e t put_page
      5 000000000000002f t acpi_os_allocate_zeroed
      5 0000000000000029 t acpi_os_allocate
      5 0000000000000028 t umask_show
      5 0000000000000011 t kzalloc
      4 0000000000000053 t trace_xhci_dbg_quirks

Not awful, but not great.

It is a bit hard to fix the __builtin_constant_p() problem without having
some negative side-effects.

Reminder: __builtin_constant_p() is evaluated after the inlining decision
are done. You can use __builtin_choose_expr() instead of an “if”s and
instead of ternary operators when evaluating __builtin_constant_p() to solve
the problem. However, this causes the compiler not to know sometimes that a
value is constant because __builtin_choose_expr () evaluation happens too
early. This __builtin_choose_expr() problem is the reason for put_page() and
kzalloc() are not being inlined. Clang, again, does not suffer from this
problem.

Anyhow, it may be a good practice to try to get rid of the rest. For
example, dst_discard() has four instances because it is always given as a
function pointer. So it should not have been inlined.

Regards,
Nadav

Reply via email to