This is a revised version of optpolines (formerly named retpolines) for dynamic indirect branch promotion in order to reduce retpoline overheads [1].
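
For readers new to the idea, the control flow of indirect branch promotion can be modelled in plain C roughly as below. This is only a sketch under stated assumptions, not code from the series: struct call_site, call_promoted() and the first-target "learning" policy are hypothetical names and choices made for illustration. Plain C also cannot express the key part, which is that the compared address and the direct call target are patched into the instruction stream at runtime.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_t)(int);

/*
 * Hypothetical per-call-site state. A real implementation would patch the
 * likely target into the code itself; here it is kept in data so the
 * control flow can be shown in portable C.
 */
struct call_site {
	handler_t likely;	/* promoted target, NULL until one is learned */
	unsigned long hits;	/* calls that took the promoted (fast) path */
	unsigned long misses;	/* calls that took the fallback path */
};

/* Two example targets (illustrative only). */
static int add_one(int x) { return x + 1; }
static int twice(int x) { return 2 * x; }

/*
 * Model of a promoted indirect call: compare the requested target with the
 * likely one and take the fast path on a match. Otherwise fall back to the
 * ordinary indirect call, which stands in for the retpoline.
 */
static int call_promoted(struct call_site *site, handler_t target, int arg)
{
	assert(target != NULL);

	if (target == site->likely) {
		site->hits++;
		/* Stands in for a patched direct call to the likely target. */
		return site->likely(arg);
	}

	site->misses++;
	/* Naive learning policy (assumption): keep the first target seen. */
	if (site->likely == NULL)
		site->likely = target;

	/* Fallback: plain indirect call. */
	return target(arg);
}
```

With a zero-initialized struct call_site, the first call through add_one is a miss that records the target, later calls through add_one take the fast path, and a call through twice falls back without replacing the learned target.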

This version addresses some of the concerns that were raised before.
Accordingly, the code was slightly simplified and patching is now done
using the regular int3/breakpoint mechanism.

Outline optpolines for multiple targets were added. I do not think the
way I implemented it is the correct one. In my original (private)
version, if there are more targets than the outline block can hold, the
outline block is completely removed. However, I think this is
more-or-less how Josh wanted it to be.

The code modifications are now done using a gcc-plugin. This makes it
easy to ignore code from init and other code sections. I think it should
also allow us to add opt-in/opt-out support for each branch, for example
by marking function pointers using address-space attributes.

All of these changes required some optimizations to go away to keep the
code simple. I still have not run the benchmarks again.

So I might not have addressed all the open issues, but it is rather hard
to finish the implementation since some still-open high-level decisions
affect the way in which optimizations should be done.

Specifically:

- Is it going to be the only indirect branch promotion mechanism? If so,
  it probably should also provide an interface similar to Josh's
  "static-calls" with annotations.

- Should it also be used when retpolines are disabled (in the config)?
  This does complicate the implementation a bit (RFC v1 supported it).

- Is it going to be opt-in or opt-out? If it is an opt-out mechanism,
  memory and performance optimizations need to be more aggressive.

- Do we use periodic learning or not? Josh suggested reconfiguring the
  branches whenever a new target is found. However, I do not know at
  this time how to do learning efficiently, without making learning much
  more expensive.

[1] https://lore.kernel.org/patchwork/cover/1001332/

Nadav Amit (6):
  x86: introduce kernel restartable sequence
  objtool: ignore instructions
  x86: patch indirect branch promotion
  x86: interface for accessing indirect branch locations
  x86: learning and patching indirect branch targets
  x86: outline optpoline

 arch/x86/Kconfig                             |    4 +
 arch/x86/entry/entry_64.S                    |   16 +-
 arch/x86/include/asm/nospec-branch.h         |   83 ++
 arch/x86/include/asm/sections.h              |    2 +
 arch/x86/kernel/Makefile                     |    1 +
 arch/x86/kernel/asm-offsets.c                |    9 +
 arch/x86/kernel/nospec-branch.c              | 1293 ++++++++++++++++++
 arch/x86/kernel/traps.c                      |    7 +
 arch/x86/kernel/vmlinux.lds.S                |    7 +
 arch/x86/lib/retpoline.S                     |   83 ++
 include/linux/cpuhotplug.h                   |    1 +
 include/linux/module.h                       |    9 +
 kernel/module.c                              |    8 +
 scripts/Makefile.gcc-plugins                 |    3 +
 scripts/gcc-plugins/x86_call_markup_plugin.c |  329 +++++
 tools/objtool/check.c                        |   21 +-
 16 files changed, 1872 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/kernel/nospec-branch.c
 create mode 100644 scripts/gcc-plugins/x86_call_markup_plugin.c

-- 
2.17.1