GCC estimates the cost of an inline assembly block, in both time and
space, from the number of statements it contains, counted by newlines
and semicolons. Kernel code distorts this estimate because it emits
bookkeeping data into alternative sections from within the same block.
As a result, the compiler may make poor inlining and branch
optimization decisions.
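
As a rough illustration of the counting described above, the sketch
below approximates the estimate by counting newlines and semicolons in
an asm template string. It is a simplification for discussion, not
GCC's actual code; estimate_asm_statements(), OLD_LOCK_PREFIX,
NEW_LOCK_PREFIX, old_locked_incl and new_locked_incl are hypothetical
names, and "incl %0" is an arbitrary example instruction.

```c
#include <stddef.h>

/*
 * Hypothetical helper: approximate the number of statements the
 * compiler would attribute to an asm template, assuming one statement
 * plus one more per newline or semicolon. An empty template counts as
 * zero statements.
 */
int estimate_asm_statements(const char *templ)
{
	int count;

	if (templ == NULL || *templ == '\0')
		return 0;

	count = 1;
	for (; *templ != '\0'; templ++) {
		if (*templ == '\n' || *templ == ';')
			count++;
	}
	return count;
}

/* LOCK_PREFIX template text as it is defined before this patch. */
#define OLD_LOCK_PREFIX \
	".pushsection .smp_locks,\"a\"\n" \
	".balign 4\n" \
	".long 671f - .\n" \
	".popsection\n" \
	"671:" \
	"\n\tlock; "

/* LOCK_PREFIX template text as it is defined after this patch. */
#define NEW_LOCK_PREFIX "__LOCK_PREFIX "

/* The same locked instruction written with each variant. */
const char old_locked_incl[] = OLD_LOCK_PREFIX "incl %0";
const char new_locked_incl[] = NEW_LOCK_PREFIX "incl %0";
```

Under this approximation, estimate_asm_statements(old_locked_incl)
returns 7 and estimate_asm_statements(new_locked_incl) returns 1, even
though both forms place a single locked instruction in the text
section.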

The solution is to define an assembly macro and invoke it from the
inline assembly block. GCC then treats the inline assembly block as a
single instruction.

This patch handles the LOCK prefix, allowing more aggressive inlining.

    text     data     bss      dec     hex filename
18127205 10068388 2936832 31132425 1db0b09 ./vmlinux before
18131468 10068488 2936832 31136788 1db1c14 ./vmlinux after (+4363)

Static text symbols:
Before: 39860
After:  39788 (-72)

Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: x...@kernel.org
Cc: Josh Poimboeuf <jpoim...@redhat.com>
Signed-off-by: Nadav Amit <na...@vmware.com>
---
 arch/x86/include/asm/alternative.h | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 4cd6a3b71824..daa68ad51665 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -31,14 +31,26 @@
  */
 
 #ifdef CONFIG_SMP
-#define LOCK_PREFIX_HERE \
-		".pushsection .smp_locks,\"a\"\n"	\
-		".balign 4\n"				\
-		".long 671f - .\n" /* offset */		\
-		".popsection\n"				\
-		"671:"
-
-#define LOCK_PREFIX LOCK_PREFIX_HERE "\n\tlock; "
+
+asm ("\n"
+	".macro __LOCK_PREFIX_HERE\n\t"
+	".pushsection .smp_locks,\"a\"\n\t"
+	".balign 4\n\t"
+	".long 671f - .\n\t" /* offset */
+	".popsection\n"
+	"671:\n"
+	".endm");
+
+#define LOCK_PREFIX_HERE "__LOCK_PREFIX_HERE"
+
+asm ("\n"
+	".macro __LOCK_PREFIX ins:vararg\n\t"
+	"__LOCK_PREFIX_HERE\n\t"
+	"lock; \\ins\n"
+	".endm");
+
+#define LOCK_PREFIX "__LOCK_PREFIX "
+
 #else /* ! CONFIG_SMP */
 #define LOCK_PREFIX_HERE ""
-- 
2.17.0