AMD docs say that SYSRET32 loads the %ss selector with a value from an MSR, but the *cached descriptor* of %ss is not modified. (Intel CPUs reset the descriptor to a fixed, valid state.)

It was observed to cause Wine crashes. The conjectured sequence of events is as follows:

1. A Wine process enters the kernel via the syscall insn.
2. Context switch to any other task.
3. An interrupt or exception happens, and the CPU loads %ss with 0
   (this happens according to both Intel and AMD docs).
   The %ss cached descriptor is set to the "invalid" state.
4. Context switch back to Wine.
5. sysret to 32-bit userspace. The %ss selector has the correct value,
   but its cached descriptor is still invalid.
6. The very first userspace POP insn after this causes exception 12 (#SS).

Fix this by checking the %ss selector value. If it is not __KERNEL_DS
(and it really can only be __KERNEL_DS or zero), load it with __KERNEL_DS.

We also use SYSRET32 for SYSENTER-based syscalls, but that codepath is
only used by Intel CPUs, which don't have this quirk.

Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
Reported-by: Brian Gerst <brge...@gmail.com>
CC: Brian Gerst <brge...@gmail.com>
CC: Linus Torvalds <torva...@linux-foundation.org>
CC: Steven Rostedt <rost...@goodmis.org>
CC: Ingo Molnar <mi...@kernel.org>
CC: Borislav Petkov <b...@alien8.de>
CC: "H. Peter Anvin" <h...@zytor.com>
CC: Andy Lutomirski <l...@amacapital.net>
CC: Oleg Nesterov <o...@redhat.com>
CC: Frederic Weisbecker <fweis...@gmail.com>
CC: Alexei Starovoitov <a...@plumgrid.com>
CC: Will Drewry <w...@chromium.org>
CC: Kees Cook <keesc...@chromium.org>
CC: x...@kernel.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/ia32/ia32entry.S | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 0c302d0..9537dcb 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -408,6 +408,18 @@ cstar_dispatch:
 sysretl_from_sys_call:
 	andl $~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
 	RESTORE_RSI_RDI_RDX
+	/*
+	 * On AMD, SYSRET32 loads %ss selector, but does not modify its
+	 * cached descriptor; and in kernel, %ss can be loaded with 0,
+	 * setting cached descriptor to "invalid". This has no effect on
+	 * 64-bit mode, but on return to 32-bit mode, it makes stack ops fail.
+	 * Fix %ss only if it's wrong: read from %ss takes ~2 cycles,
+	 * write to %ss is ~40 cycles.
+	 */
+	movl %ss, %ecx
+	cmpl $__KERNEL_DS, %ecx
+	jne reload_ss
+ss_is_good:
 	movl RIP(%rsp),%ecx
 	CFI_REGISTER rip,rcx
 	movl EFLAGS(%rsp),%r11d
@@ -426,6 +438,10 @@ sysretl_from_sys_call:
 	 * does not exist, it merely sets eflags.IF=1).
 	 */
 	USERGS_SYSRET32
+reload_ss:
+	movl $__KERNEL_DS, %ecx
+	movl %ecx, %ss
+	jmp ss_is_good
 
 #ifdef CONFIG_AUDITSYSCALL
 cstar_auditsys:
-- 
1.8.1.4