On Fri, 24 Nov 2017, Ingo Molnar wrote:

> From: Andy Lutomirski <l...@kernel.org>
> 
> Handling SYSCALL is tricky: the SYSCALL handler is entered with every
> single register (except FLAGS), including RSP, live.  It somehow needs
> to set RSP to point to a valid stack, which means it needs to save the
> user RSP somewhere and find its own stack pointer.  The canonical way
> to do this is with SWAPGS, which lets us access percpu data using the
> %gs prefix.
> 
> With KAISER-like pagetable switching, this is problematic.  Without a
> scratch register, switching CR3 is impossible, so %gs-based percpu
> memory would need to be mapped in the user pagetables.  Doing that
> without information leaks is difficult or impossible.
> 
> Instead, use a different sneaky trick.  Map a copy of the first part
> of the SYSCALL asm at a different address for each CPU.  Now RIP
> varies depending on the CPU, so we can use RIP-relative memory access
> to access percpu memory.  By putting the relevant information (one
> scratch slot and the stack address) at a constant offset relative to
> RIP, we can make SYSCALL work without relying on %gs.

Smart!
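
For anyone following along, here is a rough userspace model of the idea
as I read it. It is only a sketch, not the actual entry code: the struct
layout, NR_CPUS, ENTRY_IP_OFFSET and all function names are made up for
illustration, and the real thing is a few instructions of asm using
RIP-relative addressing rather than C pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 4            /* hypothetical CPU count for the model */
#define ENTRY_IP_OFFSET 16   /* hypothetical position of "RIP" inside the stub */

/* One copy per CPU: the stub text, then its data at a fixed offset. */
struct trampoline {
	unsigned char text[64];  /* stand-in for the copied SYSCALL entry asm */
	uint64_t scratch;        /* the one scratch slot (holds the user RSP) */
	uint64_t kernel_sp;      /* this CPU's kernel stack address */
};

static struct trampoline trampolines[NR_CPUS];

/* The address a given CPU would be executing at inside its own copy. */
static unsigned char *entry_ip(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return (unsigned char *)&trampolines[cpu] + ENTRY_IP_OFFSET;
}

/* Model of a RIP-relative access: a constant displacement from ip. */
static uint64_t *rip_relative(unsigned char *ip, size_t field_offset)
{
	return (uint64_t *)(ip - ENTRY_IP_OFFSET + field_offset);
}

/*
 * Stash the user RSP in the scratch slot and return the stack to switch
 * to.  Only ip is consulted: no CPU number and no %gs-style lookup.
 */
static uint64_t model_syscall_entry(unsigned char *ip, uint64_t user_sp)
{
	*rip_relative(ip, offsetof(struct trampoline, scratch)) = user_sp;
	return *rip_relative(ip, offsetof(struct trampoline, kernel_sp));
}
```

The point of the model is that the displacement from the instruction
pointer to the data is the same in every copy, so identical code placed
at different addresses reaches different per-CPU data.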

> A nice thing about this approach is that we can easily switch it on
> and off if we want pagetable switching to be configurable.
> 
> The compat variant of SYSCALL doesn't have this problem in the first
> place -- there are plenty of scratch registers, since we don't care
> about preserving r8-r15.  This patch therefore doesn't touch SYSCALL32
> at all.
> 
> XXX: Whenever we settle how KAISER gets turned on and off, we should do
> the same to this.
> 
> Signed-off-by: Andy Lutomirski <l...@kernel.org>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>
