The SYSCALL prologue starts with SWAPGS immediately followed by a gs-prefixed instruction. I think this causes a pipeline stall.
If we instead do:

    mov %rsp, rsp_scratch(%rip)
    mov sp0(%rip), %rsp
    swapgs
    ...
    pushq rsp_scratch(%rip)

then we avoid the stall and save about three cycles. Horrible, horrible code to do this lives here:

https://git.kernel.org/cgit/linux/kernel/git/luto/devel.git/log/?h=x86/faster_syscalls

Caveat emptor: it also disables SMP, since rsp_scratch and sp0 are single RIP-relative globals rather than per-CPU variables. For three cycles, I don't think this is worth trying to clean up.

--Andy