at 2:45 AM, Ingo Molnar <mi...@kernel.org> wrote: > > * Nadav Amit <na...@vmware.com> wrote: > >>> Another, separate question I wanted to ask: how do we ensure that the >>> kernel stays fixed? >>> I.e. is there some tooling we can use to actually measure whether there's >>> bad inlining decisions >>> done, to detect all these bad patterns that cause bad GCC code generation? >> >> Good question. First, I’ll indicate that this patch-set does not handle all >> the issues. There is still the issue of conditional use of >> __builtin_constant_p(). >> >> One indication for bad inlining decisions is the inlined functions have >> multiple (non-inlined) instances in the binary and are short. I don’t >> have an automatic solution, but you can try, for example to run: >> >> nm --print-size ./vmlinux | grep ' t ' | cut -d' ' -f2- | sort | uniq -c | \ >> grep -v '^ 1' | sort -n -r | head -n 5 >> >> There are however many false positives. After these patches, for example, I >> get: >> >> 11 000000000000012f t jhash >> 7 0000000000000017 t dst_output >> 6 0000000000000011 t kzalloc >> 5 000000000000002f t acpi_os_allocate_zeroed >> 5 0000000000000029 t acpi_os_allocate >> >> >> jhash() should not have been inlined in my mind, and should have a >> non-inlined implementation. dst_output() is used as a function pointer. >> kzalloc() and the next two suffer from the __builtin_constant_p() problem I >> described in the past. > > Ok, that's useful info. > > The histogram suggests that with all your patches applied the kernel is now > in a pretty good > state in terms of inlining decisions, right?
It was just an example that I ran on the kernel I built right now (with a custom config). Please don’t regard these results as anything indicative. > Are you using defconfig or a reasonable distro-config for your tests? I think it is best to take the kernel and run localyesconfig for testing. Taking Ubuntu 18.04 and doing the same gives the following results: 12 000000000000012f t jhash 7 0000000000000017 t dst_output 7 0000000000000014 t init_once 5 00000000000000d8 t jhash2 5 000000000000004e t put_page 5 000000000000002f t acpi_os_allocate_zeroed 5 0000000000000029 t acpi_os_allocate 5 0000000000000028 t umask_show 5 0000000000000011 t kzalloc 4 0000000000000053 t trace_xhci_dbg_quirks Not awful, but not great. It is a bit hard to fix the __builtin_constant_p() problem without having some negative side-effects. Reminder: __builtin_constant_p() is evaluated after the inlining decision are done. You can use __builtin_choose_expr() instead of an “if”s and instead of ternary operators when evaluating __builtin_constant_p() to solve the problem. However, this causes the compiler not to know sometimes that a value is constant because __builtin_choose_expr () evaluation happens too early. This __builtin_choose_expr() problem is the reason for put_page() and kzalloc() are not being inlined. Clang, again, does not suffer from this problem. Anyhow, it may be a good practice to try to get rid of the rest. For example, dst_discard() has four instances because it is always given as a function pointer. So it should not have been inlined. Regards, Nadav