+ Linus.

On Fri, Mar 27, 2015 at 03:25:47PM +0100, Denys Vlasenko wrote:
> Hi,
>
> While running some tests I noticed that EFLAGS
> is not saved across syscalls if I use 32-bit
> userspace, use SYSENTER, and paravirt is active.
>
> Looking at the code, it's actually clear why that happens.
>
> /*
>  * SYSENTER loads ss, rsp, cs, and rip from previously programmed MSRs.
>  * IF and VM in rflags are cleared (IOW: interrupts are off).
>  * SYSENTER does not save anything on the stack,
>  * and does not save old rip (!!!) and rflags.
>  */
> ENTRY(ia32_sysenter_target)
>         SWAPGS_UNSAFE_STACK     <============================
>         movq    PER_CPU_VAR(cpu_tss + TSS_sp0), %rsp
>         ENABLE_INTERRUPTS(CLBR_NONE)
>
>         movl    %ebp, %ebp
>         movl    %eax, %eax
>         movl    ASM_THREAD_INFO(TI_sysenter_return, %rsp, 0), %r10d
>
>         /* Construct struct pt_regs on stack */
>         pushq_cfi       $__USER32_DS    /* pt_regs->ss */
>         pushq_cfi       %rbp            /* pt_regs->sp */
>         CFI_REL_OFFSET  rsp,0
>         pushfq_cfi                      /* pt_regs->flags */
>
> The SWAPGS_UNSAFE_STACK, as it involves paravirt callbacks,
> will change EFLAGS, and it *can't* save/restore them -
> there is no place to save it, since neither stack nor
> PER_CPU() is usable at that point.
>
> Interestingly, *no one ever complained*!
>
> Apparently, users *don't* depend on arithmetic flags
> to survive over syscall. They are also okay with DF flag
> being cleared.
>
> Let's go flag-by-flag.
>
> ID - probably no one depends on it

It is used as a toggle to detect CPUID support. Can a SYSENTER happen
while something toggles it? Probably...

> VIP,VIF,VM - v86 stuff, not supported in 64bit
> AC - someone probably does use this
> RF - should be cleared to 0
> NT - iret via task gate, not supported in 64bit
> IOPL - usually 00, sys_iopl() can change it
> DF - according to C ABI, should be 0
> IF - should be preserved (but almost always 1)
> TF - should be preserved
> arith flags - probably no one cares
>
> IOW, the bits to be preserved are only AC, IOPL, TF, and _maybe_
> IF.
>
> AC and IOPL are preserved even with this paravirt quirk
> because paravirt hooks do not mangle them.
>
> TF preservation and proper restoration is handled by
> do_debug + syscall_trace_enter_phase2 + iret
> combo.
>
> We unconditionally set IF. This is only a problem for applications
> which use sys_iopl(3), disable IRQs in userspace and perform
> syscalls. The set of such apps is probably empty.
> (This "bug" exists even for the non-paravirt case).
>
> So, formally, we have a bug: we do not preserve IF,
> DF and arith flags.
>
> I'm proposing to use this opportunity to amend syscall ABI
> to say that arith flags are not preserved across syscalls,
> and DF can be cleared to 0 by syscalls (but can't be set to 1).
> Evidently, it has been broken for some time for some virtualized
> setups and users are okay.
>
> I'm not sure what to do with the "bug" of forcing IF=1.
> Fix it? Or also declare that syscalls can set IF=1?
> Do you think this is legitimate userspace code?
>
>     sys_iopl(3);
>     cli;
>     syscall();
>     /* expects irqs still disabled */
>
> --
> vda

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
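The syscall ABI amendment proposed in the quoted mail boils down to a rule over EFLAGS bits: AC, IOPL and TF must come back unchanged, DF may be cleared but never set, and the arithmetic flags may be clobbered. Below is a minimal C sketch of that rule written as a hypothetical checker. The bit positions are the architectural EFLAGS layout. The macro names, the helper `syscall_flags_change_ok()`, and the choice to leave IF unconstrained (the mail leaves IF as an open question) are assumptions made for illustration. This is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural EFLAGS bit positions (names chosen for illustration). */
#define X86_EFLAGS_CF   (1u << 0)
#define X86_EFLAGS_PF   (1u << 2)
#define X86_EFLAGS_AF   (1u << 4)
#define X86_EFLAGS_ZF   (1u << 6)
#define X86_EFLAGS_SF   (1u << 7)
#define X86_EFLAGS_TF   (1u << 8)
#define X86_EFLAGS_IF   (1u << 9)
#define X86_EFLAGS_DF   (1u << 10)
#define X86_EFLAGS_OF   (1u << 11)
#define X86_EFLAGS_IOPL (3u << 12)  /* two-bit field, bits 12-13 */
#define X86_EFLAGS_AC   (1u << 18)

/* Arithmetic flags: under the proposal, a syscall may clobber these. */
#define ARITH_FLAGS (X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_AF | \
                     X86_EFLAGS_ZF | X86_EFLAGS_SF | X86_EFLAGS_OF)

/* Bits that must survive a syscall unchanged under the proposal.
 * IF is deliberately left out (assumption): whether syscalls may
 * force IF=1 is still open in the mail. */
#define PRESERVED_FLAGS (X86_EFLAGS_AC | X86_EFLAGS_IOPL | X86_EFLAGS_TF)

static_assert(X86_EFLAGS_IOPL == 0x3000u, "IOPL occupies bits 12-13");
static_assert((PRESERVED_FLAGS & ARITH_FLAGS) == 0,
              "preserved and clobberable sets must not overlap");

/* Hypothetical helper: returns true if 'after' (the flags userspace sees
 * once the syscall returns) is an allowed outcome for 'before'. */
bool syscall_flags_change_ok(uint32_t before, uint32_t after)
{
    /* AC, IOPL and TF must come back exactly as they were. */
    if ((before & PRESERVED_FLAGS) != (after & PRESERVED_FLAGS))
        return false;

    /* DF may be cleared to 0 by the kernel, but never set to 1. */
    if (!(before & X86_EFLAGS_DF) && (after & X86_EFLAGS_DF))
        return false;

    /* Arithmetic flags, IF and all remaining bits are not checked. */
    return true;
}
```

For example, a return path that clears CF and DF (`before = IF|CF|DF`, `after = IF`) passes this check. A path that drops TF, changes IOPL, or sets DF fails it.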