The first two patches are optimizations that I'm surprised we didn't already have. I noticed them when I was looking at the generated asm.
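
For readers unfamiliar with this kind of optimization, here is a hedged user-space sketch of the idea behind the two uaccess patches. It is not the code from the patches. The failure branches of a range check are wrapped in unlikely() so the compiler can lay out the success path as the straight-line fall-through. likely()/unlikely() are defined here with the GCC/Clang __builtin_expect builtin, following the usual kernel pattern. range_not_ok() and checked_copy() are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

/* Branch-prediction hints in the usual kernel style (GCC/Clang builtin). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical range check: returns true if [addr, addr + size) is not
 * entirely below 'limit', or if addr + size wraps around. Both "bad"
 * outcomes are hinted as unlikely.
 */
static inline bool range_not_ok(unsigned long addr, unsigned long size,
                                unsigned long limit)
{
	unsigned long end = addr + size;

	if (unlikely(end < addr))	/* the addition overflowed */
		return true;
	return unlikely(end > limit);
}

/*
 * Hypothetical caller: the error branch is marked unlikely, mirroring the
 * assumption that faults are rare. Returns 0 on success, -1 on a bad range.
 */
static int checked_copy(void *dst, const char *base, unsigned long off,
                        unsigned long len, unsigned long limit)
{
	if (unlikely(range_not_ok(off, len, limit)))
		return -1;
	memcpy(dst, base + off, len);
	return 0;
}
```

The hints do not change behavior, only code layout. That is why this kind of patch is cheap to apply once you notice the fault path sitting in the middle of the generated asm.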

The next two patches are tests and some old stuff. There's a test that validates the vDSO AT_SYSINFO annotations. There's also a test that exercises some assumptions that signal handling and ptracers make about syscalls that currently do *not* hold on 64-bit AMD using 32-bit AT_SYSINFO.

The next three patches are NT cleanups and a lockdep cleanup.

It may pay to apply the beginning of the series (at most through "x86/entry/64/compat: After SYSENTER, move STI after the NT fixup") without waiting for everyone to wrap their heads around the rest.

The rest is basically a rewrite of syscalls for all cases except 64-bit native. With these patches applied, there is a single 32-bit vDSO and it uses SYSCALL, SYSENTER, and INT80 almost interchangeably via alternatives. The semantics of SYSENTER and SYSCALL are defined as:

 1. If SYSCALL, ESP = ECX
 2. ECX = *ESP
 3. IP = INT80 landing pad
 4. Opportunistic SYSRET/SYSEXIT is enabled on return

The vDSO is rearranged so that these semantics work. Anything that backs IP up by 2 ends up pointing at a bona fide int $0x80 instruction with the expected regs.

In the process, the vDSO CFI annotations (which are actually used) get rewritten using normal CFI directives.

Opportunistic SYSRET/SYSEXIT only happens on return when CS and SS are as expected, IP points to the INT80 landing pad, and flags are in good shape. (There is no longer any assumption that full fast-path 32-bit syscalls don't muck with the registers that matter for fast exits -- I played with maintaining an optimization like that with poor results. I may try again if it saves a few cycles.)

Other than that, the system call entries are simplified to the bare minimum prologue and a call to a C function. Amusingly, SYSENTER and SYSCALL32 use the same C function.

To make that work, I had to remove all the 32-bit syscall stubs except the clone argument hack.
This is because, for C code to call through the system call table, the system call table entries need to be real function pointers with C-compatible ABIs.

There is nothing at all anymore that requires that x86_32 syscalls be asmlinkage. That could be removed in a subsequent patch.

The upshot appears to be a ~16 cycle performance hit on 32-bit fast path syscalls. (On my system, my little prctl test takes 172 cycles before and 188 cycles with these patches applied.)

The slow path is probably faster under most circumstances and, if the exit slow path gets hit, it'll be much faster because (as we already do in the 64-bit native case) we can still use SYSEXIT/SYSRET.

The patchset is structured as a removal of the old fast syscall code, then the change that makes syscalls into real functions, then a clean re-implementation of fast syscalls.

If we want some of those 16 cycles back, we could consider open-coding a new C fast path.

Changes from v1:
 - The unwind_vdso_32 test now warns on broken Debian installations instead of failing. The problem is now fully understood, will be fixed by Debian and possibly also fixed by upstream glibc.
 - execve was rather broken in v1.
 - It's quite a bit faster now (the optimizations at the end are mostly new).
 - int80 on 64-bit no longer clobbers extra regs (thanks Denys!).
 - The uaccess stuff is new.
 - Lots of other things that I forgot, I'm sure.
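
The SYSENTER/SYSCALL semantics and the return conditions for opportunistic SYSRET/SYSEXIT described earlier can be modeled in a few lines of C. This is a hedged user-space sketch, not kernel code. struct fake_regs, fast_entry_fixup(), can_fast_return(), ip_minus_2_is_int80() and the FAKE_* selector values are all hypothetical. The user stack is modeled as a plain word array. "Flags in good shape" is assumed here to mean that TF, RF and VM are clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified register state (not struct pt_regs). */
struct fake_regs {
	uint32_t ip, sp, cx;
	uint32_t cs, ss, flags;
};

/* Illustrative selector values; EFLAGS bits: TF = bit 8, RF = bit 16, VM = bit 17. */
#define FAKE_USER32_CS 0x23u
#define FAKE_USER_DS   0x2bu
#define EFL_TF 0x00000100u
#define EFL_RF 0x00010000u
#define EFL_VM 0x00020000u

enum fast_entry { ENTRY_SYSENTER, ENTRY_SYSCALL };

/*
 * Steps 1-3 of the fast-entry semantics:
 *   1. If SYSCALL, ESP = ECX
 *   2. ECX = *ESP  (user stack modeled as words indexed by sp / 4)
 *   3. IP = INT80 landing pad
 * Afterwards the registers look as if the vDSO had executed int $0x80.
 */
static void fast_entry_fixup(struct fake_regs *r, enum fast_entry kind,
                             const uint32_t *user_stack, uint32_t landing_pad)
{
	if (kind == ENTRY_SYSCALL)
		r->sp = r->cx;
	r->cx = user_stack[r->sp / 4];
	r->ip = landing_pad;
}

/*
 * Step 4: a fast return is allowed only if CS/SS are the expected user
 * selectors, IP is still the landing pad, and TF/RF/VM are all clear.
 * Otherwise the return goes through the slower IRET path.
 */
static bool can_fast_return(const struct fake_regs *r, uint32_t landing_pad)
{
	return r->cs == FAKE_USER32_CS && r->ss == FAKE_USER_DS &&
	       r->ip == landing_pad &&
	       (r->flags & (EFL_TF | EFL_RF | EFL_VM)) == 0;
}

/*
 * int $0x80 encodes as the two bytes CD 80. If it sits immediately before
 * the landing pad, backing IP up by 2 re-executes a real INT80.
 */
static bool ip_minus_2_is_int80(const uint8_t *code, uint32_t landing_pad)
{
	return landing_pad >= 2 && code[landing_pad - 2] == 0xCD &&
	       code[landing_pad - 1] == 0x80;
}
```

The fixup rewrites every entry into the INT80 register layout. Because of that, restart logic (IP -= 2) and ptracers only ever see one kind of syscall, and the choice of fast exit is made purely from the saved state at return time.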

Andy Lutomirski (36):
  x86/uaccess: Tell the compiler that uaccess is unlikely to fault
  x86/uaccess: __chk_range_not_ok is unlikely to return true
  selftests/x86: Add a test for vDSO unwinding
  selftests/x86: Add a test for syscall restart and arg modification
  x86/entry/64/compat: Fix SYSENTER's NT flag before user memory access
  x86/entry: Move lockdep_sys_exit to prepare_exit_to_usermode
  x86/entry/64/compat: After SYSENTER, move STI after the NT fixup
  x86/vdso: Remove runtime 32-bit vDSO selection
  x86/asm: Re-add manual CFI infrastructure
  x86/vdso: Define BUILD_VDSO while building and emit .eh_frame in asm
  x86/vdso: Replace hex int80 CFI annotations with gas directives
  x86/elf/64: Clear more registers in elf_common_init
  x86/vdso/32: Save extra registers in the INT80 vsyscall path
  x86/entry/64/compat: Disable SYSENTER and SYSCALL32 entries
  x86/entry/64/compat: Remove audit optimizations
  x86/entry/64/compat: Remove most of the fast system call machinery
  x86/entry/64/compat: Set up full pt_regs for all compat syscalls
  x86/entry/syscalls: Move syscall table declarations into asm/syscalls.h
  x86/syscalls: Give sys_call_ptr_t a useful type
  x86/entry: Add do_syscall_32, a C function to do 32-bit syscalls
  x86/entry/64/compat: Migrate the body of the syscall entry to C
  x86/entry: Add C code for fast system call entries
  x86/vdso/compat: Wire up SYSENTER and SYSCSALL for compat userspace
  x86/entry/compat: Implement opportunistic SYSRETL for compat syscalls
  x86/entry/32: Open-code return tracking from fork and kthreads
  x86/entry/32: Switch INT80 to the new C syscall path
  x86/entry/32: Re-implement SYSENTER using the new C path
  x86/asm: Remove thread_info.sysenter_return
  x86/entry: Remove unnecessary IRQ twiddling in fast 32-bit syscalls
  x86/entry: Make irqs_disabled checks in exit code depend on lockdep
  x86/entry: Force inlining of 32-bit syscall code
  x86/entry: Micro-optimize compat fast syscall arg fetch
  x86/entry: Hide two syscall entry assertions behind CONFIG_DEBUG_ENTRY
  x86/entry: Use pt_regs_to_thread_info() in syscall entry tracing
  x86/entry: Split and inline prepare_exit_to_usermode
  x86/entry: Split and inline syscall_return_slowpath

 arch/x86/Makefile | 10 +-
 arch/x86/entry/common.c | 255 ++++++++--
 arch/x86/entry/entry_32.S | 184 +++----
 arch/x86/entry/entry_64.S | 9 +-
 arch/x86/entry/entry_64_compat.S | 541 +++++---------------
 arch/x86/entry/syscall_32.c | 9 +-
 arch/x86/entry/syscall_64.c | 4 +-
 arch/x86/entry/syscalls/syscall_32.tbl | 12 +-
 arch/x86/entry/vdso/Makefile | 39 +-
 arch/x86/entry/vdso/vdso2c.c | 2 +-
 arch/x86/entry/vdso/vdso32-setup.c | 28 +-
 arch/x86/entry/vdso/vdso32/int80.S | 56 ---
 arch/x86/entry/vdso/vdso32/syscall.S | 75 ---
 arch/x86/entry/vdso/vdso32/sysenter.S | 116 -----
 arch/x86/entry/vdso/vdso32/system_call.S | 57 +++
 arch/x86/entry/vdso/vma.c | 13 +-
 arch/x86/ia32/ia32_signal.c | 4 +-
 arch/x86/include/asm/dwarf2.h | 177 +++++++
 arch/x86/include/asm/elf.h | 10 +-
 arch/x86/include/asm/syscall.h | 14 +-
 arch/x86/include/asm/thread_info.h | 1 -
 arch/x86/include/asm/uaccess.h | 14 +-
 arch/x86/include/asm/vdso.h | 10 +-
 arch/x86/kernel/asm-offsets.c | 3 -
 arch/x86/kernel/signal.c | 4 +-
 arch/x86/um/sys_call_table_32.c | 7 +-
 arch/x86/um/sys_call_table_64.c | 7 +-
 arch/x86/xen/setup.c | 13 +-
 tools/testing/selftests/x86/Makefile | 5 +-
 tools/testing/selftests/x86/ptrace_syscall.c | 294 +++++++++++
 .../testing/selftests/x86/raw_syscall_helper_32.S | 46 ++
 tools/testing/selftests/x86/unwind_vdso.c | 209 ++++++++
 32 files changed, 1258 insertions(+), 970 deletions(-)
 delete mode 100644 arch/x86/entry/vdso/vdso32/int80.S
 delete mode 100644 arch/x86/entry/vdso/vdso32/syscall.S
 delete mode 100644 arch/x86/entry/vdso/vdso32/sysenter.S
 create mode 100644 arch/x86/entry/vdso/vdso32/system_call.S
 create mode 100644 arch/x86/include/asm/dwarf2.h
 create mode 100644 tools/testing/selftests/x86/ptrace_syscall.c
 create mode 100644 tools/testing/selftests/x86/raw_syscall_helper_32.S
 create mode 100644 tools/testing/selftests/x86/unwind_vdso.c

-- 
2.4.3